On 1/6/22 5:02 PM, Assaf Gordon wrote:
> Hello,
>
> On 2022-01-06 7:35 a.m., Pádraig Brady wrote:
>> Thanks for taking the time to consolidate options/functionality
>> across different implementations. This is important for users.
>> Some notes below...
>>
>> On 05/01/2022 16:23, Rob Landley wrote:
>>> Around 5 years ago toybox added the -D, -F, and -O options to cut:
>>>
>>> -D Don't sort/collate selections or match -fF lines without
>>> delimiter
>>> -F Select fields separated by DELIM regex
>>> -O Output delimiter (default one space for -F, input delim for -f)
>>>
>>
>> As I see it, the main functionalities added here:
>> - reordering of selected fields
>> - adjusted suppression of lines without matching fields
>> - regex delimiter support
>>
>> I see regex support as less important, but still useful.
>>
>
>
> Attached is a suggestion for initial implementation of "cut -FDO".
> It's split into smaller steps to ease review.
>
> The main issue is that the current "cut_fields" and "cut_bytes" are
> highly optimized for speed, so I left them as-is and created a secondary
> set of 'cut' functions - slower but with additional options.

There was a whole special case for -d$'\n' in busybox, to cut by line, that I
haven't found any documentation for, and it looks like it was copied from
coreutils:

  $ echo -e 'one\ntwo\nthree\nfour\nfive' | cut -d$'\n' -f 2-3
  two
  three

So I'm guessing there's already more than one codepath. :)

> If this is acceptable, I'll go on to clean up the patches, add more
> tests and write documentation.
>
> There are likely some edge-cases regarding regex matching that need to
> be decided upon (e.g. BRE or ERE, what about BOL/EOL anchors, groups, etc.).

Toybox uses ERE by default, because the option was introduced post-y2k:

https://github.com/landley/toybox/blob/0.8.6/toys/posix/cut.c#L217

And it ignores BOL/EOL anchors:

https://github.com/landley/toybox/blob/0.8.6/toys/posix/cut.c#L140

because I don't see how BOL/EOL applies to delimiters _between_ elements? (Any
delimiter before the first element or after the last element would mean another
empty element at the edge?)

Busybox inherited both behaviors.

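
To make the "empty element at the edge" point concrete, here is a minimal
sketch in Python. It is not the toybox or busybox code: split_fields and
cut_fields are hypothetical helper names, and Python's re syntax is not POSIX
ERE, so this only illustrates the splitting semantics. It assumes the delimiter
pattern has no capture groups and never matches the empty string.

```python
import re


def split_fields(line, delim_regex):
    """Split line on every match of delim_regex.

    A delimiter match at the very start or end of the line yields an
    empty field at that edge, so anchors have nothing extra to express.
    """
    return re.split(delim_regex, line)


def cut_fields(line, delim_regex, wanted, out_delim=" "):
    """Return the 1-based fields listed in `wanted`, in the order given.

    The order is kept as requested (no sorting), and the output
    delimiter defaults to one space. Out-of-range fields are skipped.
    """
    fields = split_fields(line, delim_regex)
    picked = [fields[i - 1] for i in wanted if 1 <= i <= len(fields)]
    return out_delim.join(picked)


if __name__ == "__main__":
    # Leading and trailing delimiters produce empty edge fields.
    print(split_fields(",a b,", r"[ ,]+"))
    # Fields 3 and 2, in that order, joined by one space.
    print(cut_fields(",a b,", r"[ ,]+", [3, 2]))
```

With the input ",a b," and the delimiter [ ,]+, field 1 and field 4 are the
empty strings at the edges, and asking for fields 3 then 2 gives them back in
that order rather than sorted.
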
Thanks,
Rob