On 25/01/2022 07:55, Assaf Gordon wrote:
Hello,
Here's an updated patch for "cut -DF".
Since it's a new code path, it opens the possibility of finally
supporting multibyte characters with "cut -c".
Comments are very welcome,
- assaf
[PATCH 01/18] cut: set-fields: add no-sort options
[PATCH 02/18] cut: initial -D implementation, currently only with
[PATCH 03/18] tests: add 'cut -D' tests
[PATCH 04/18] cut: extract 'cut -D -f' to a separate function
[PATCH 05/18] cut: implement -D with -b
[PATCH 06/18] tests: add 'cut -D -b' tests
[PATCH 07/18] cut: add -O short-option for --output-delimiter
[PATCH 08/18] cut: implement -F
[PATCH 09/18] tests: add 'cut -F' tests
[PATCH 10/18] cut: extract cut-fields into separate functions
[PATCH 11/18] cut: implement multibyte -c/--characters
[PATCH 12/18] cut: change -F regex syntax to BRE
[PATCH 13/18] cut: change -D long-option equivalent
[PATCH 14/18] doc: mention 'cut -D' in NEWS
[PATCH 15/18] doc: mention 'cut -F' in NEWS
[PATCH 16/18] doc: mention 'cut -O' in NEWS
[PATCH 17/18] doc: mention multibyte 'cut -c' in NEWS
[PATCH 18/18] doc: expand 'cut' section
Looking great!
Some initial notes...
0008
I was surprised that -D implies -s
(as there is no way to undo that without another option).
I'm even more surprised that -F implies -s.
That's fair enough for compat with existing implementations,
but please double check how those implementations actually behave with -F.
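For reference, this is what -s (--only-delimited) already does with plain -f,
which is the behavior that -D and -F would now imply. A minimal sketch using
only existing options; the sample input is made up for illustration:

```shell
# Two input lines: one contains the ':' delimiter, one does not.
input=$(printf 'a:b\nnodelim')

# Without -s, a line with no delimiter is passed through whole.
without_s=$(printf '%s\n' "$input" | cut -d: -f1)

# With -s, lines with no delimiter are dropped.
with_s=$(printf '%s\n' "$input" | cut -d: -f1 -s)

printf 'without -s:\n%s\n' "$without_s"
printf 'with -s:\n%s\n' "$with_s"
```

So with -s implied, undelimited lines silently disappear from the output.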
0011
Nice, this follows your expr multi-byte work in v8.27-47-ga9f2be5bf.
How are encoding errors handled?
0012
Interesting that you used [ \t][ \t]* to match a run of whitespace,
while I would have used [ \t]\{1,\}. Testing with grep indicates
there is no perf difference, at least.
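The two spellings should match exactly the same runs. A quick sketch to check
that, assuming a POSIX sed (used here instead of grep only to make the matched
runs visible). The tab is generated with printf, because a backslash is
literal inside a POSIX bracket expression; the sample string is made up:

```shell
# Literal tab character for use inside the bracket expressions.
tab=$(printf '\t')

# The two BRE spellings of "one or more blanks".
star_form="[ ${tab}][ ${tab}]*"
interval_form="[ ${tab}]\{1,\}"

# Sample with runs of spaces, tabs, and a mix of both.
sample=$(printf 'a  b\tc \t d')

# Replace every matched run with '|' using each pattern.
split_star=$(printf '%s\n' "$sample" | sed "s/${star_form}/|/g")
split_interval=$(printf '%s\n' "$sample" | sed "s/${interval_form}/|/g")

printf '%s\n%s\n' "$split_star" "$split_interval"
```

Both commands should print a|b|c|d.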
0018
s/and remove/and removes/
A minor point: please change
printf "blah" | ...
to:
printf 'blah' | ...
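The reason for preferring single quotes is that the shell then leaves the
format string untouched, whereas inside double quotes it still processes $,
backquotes and some backslash sequences before printf sees them. A small
illustration (the strings are made up, not taken from the patch):

```shell
# Double quotes: the shell reduces \\ to \ first, so printf receives a\tb
# and expands \t to a tab character.
double=$(printf "a\\tb")

# Single quotes: printf receives a\\tb verbatim, so \\ becomes one literal
# backslash and the output is the four characters a, \, t, b.
single=$(printf 'a\\tb')

printf '%s\n' "$double"
printf '%s\n' "$single"
```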
thanks!
Pádraig