On 25/01/2022 07:55, Assaf Gordon wrote:
Hello,

Here's an updated patch for "cut -DF".
Since it's a new code path, it opens the possibility of finally
supporting multibyte characters with "cut -c".


Comments are very welcome,
   - assaf

   [PATCH 01/18] cut: set-fields: add no-sort options
   [PATCH 02/18] cut: initial -D implementation, currently only with
   [PATCH 03/18] tests: add 'cut -D' tests
   [PATCH 04/18] cut: extract 'cut -D -f' to a separate function
   [PATCH 05/18] cut: implement -D with -b
   [PATCH 06/18] tests: add 'cut -D -b' tests
   [PATCH 07/18] cut: add -O short-option for --output-delimiter
   [PATCH 08/18] cut: implement -F
   [PATCH 09/18] tests: add 'cut -F' tests
   [PATCH 10/18] cut: extract cut-fields into separate functions
   [PATCH 11/18] cut: implement multibyte -c/--characters
   [PATCH 12/18] cut: change -F regex syntax to BRE
   [PATCH 13/18] cut: change -D long-option equivalent
   [PATCH 14/18] doc: mention 'cut -D' in NEWS
   [PATCH 15/18] doc: mention 'cut -F' in NEWS
   [PATCH 16/18] doc: mention 'cut -O' in NEWS
   [PATCH 17/18] doc: mention multibyte 'cut -c' in NEWS
   [PATCH 18/18] doc: expand 'cut' section

Looking great!
Some initial notes...

0008
  I was surprised that -D implies -s
  (as there is no way to undo that without another option).
  I'm even more surprised that -F implies -s.
  Fair enough for compat with existing implementations,
  but please double check the -F behavior on existing implementations.
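
  For reference, a minimal sketch of what -s changes in the existing
  'cut -f' behavior (the sample input is made up for illustration; -D and
  -F are not used here since they only exist in the proposed patch):

```shell
#!/bin/sh
# Two input lines: one contains the ':' delimiter, one does not.
input=$(printf 'a:b\nnodelim')

# Without -s, a line lacking the delimiter is printed unchanged.
without_s=$(printf '%s\n' "$input" | cut -d: -f1)

# With -s (--only-delimited), such lines are suppressed.
with_s=$(printf '%s\n' "$input" | cut -s -d: -f1)

printf 'without -s:\n%s\n' "$without_s"
printf 'with -s:\n%s\n' "$with_s"
```

  If -D and -F imply -s, the second form is what users would get with no
  option to restore the first.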

0011
  nice, following your expr multi-byte work in v8.27-47-ga9f2be5bf
  How are encoding errors handled?

0012
  Interesting that you used [ \t][ \t]* to match a run of whitespace,
  while I would have used [ \t]\{1,\}. A quick test with grep indicates
  there is no performance difference, at least.
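
  A small sketch of the equivalence of the two BRE spellings, using sed
  rather than the patched cut. The sample input and the '|' replacement
  are made up for illustration. A literal tab is placed in the bracket
  expression because POSIX tools treat a backslash inside [...] as an
  ordinary character, so '[ \t]' passed to grep or sed would not match a
  tab:

```shell
#!/bin/sh
# Literal tab character for use inside the bracket expressions.
tab=$(printf '\t')

# Fields separated by mixed runs of spaces and tabs.
input=$(printf 'a  b\t\tc \t d')

# "One blank, then zero or more blanks".
out_star=$(printf '%s\n' "$input" | sed "s/[ ${tab}][ ${tab}]*/|/g")

# "One or more blanks", written as a BRE interval expression.
out_interval=$(printf '%s\n' "$input" | sed "s/[ ${tab}]\{1,\}/|/g")

printf '%s\n%s\n' "$out_star" "$out_interval"
[ "$out_star" = "$out_interval" ] && echo 'same result'
```

  Both substitutions collapse each run of blanks into a single '|', so the
  choice between them is a matter of style.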

0018
 s/and remove/and removes/

 minor point to change:
   printf "blah" | ...
 to:
   printf 'blah' | ...
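
 The reason for preferring single quotes is not spelled out above; one
 likely motivation is that the shell leaves a single-quoted format string
 untouched, whereas double quotes let the shell process backslashes and
 '$' before printf sees them. A sketch with a made-up format string:

```shell
#!/bin/sh
tab=$(printf '\t')

# Double quotes: the shell turns \\ into \, so printf receives a\tb
# and expands \t to a tab.
dq=$(printf "a\\tb")

# Single quotes: printf receives a\\tb verbatim and prints a literal
# backslash followed by "tb".
sq=$(printf 'a\\tb')

printf 'double-quoted: %s\n' "$dq"
printf 'single-quoted: %s\n' "$sq"
```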

thanks!
Pádraig
