David Krider <[EMAIL PROTECTED]> wrote:
> On Wed, 2004-05-19 at 21:57, David Krider wrote:
>
>> > cat dump.txt|cut -c 1-69|sort -u|cut --output-delimiter=\| -c
>> 1-17,19-32,34-52,54-
>> 10165301 |M CPC |TABAR, FR CORVETTE |4105128
>> 4656102 RI|HRYSLER |TABAR, FR 01 PT |656102
>> 4694750 |HRYSLER |TABAR, FR NS |VX-2513-H
>> 52088126 |HRYSLER |OIL FR TJ |2088119AB
>> 52088127 |HRYSLER |OIL FR TJ |2088119AB
>> 52088128 |HRYSLER |OIL FR TJ |2088119AB
>> 52088129 |HRYSLER |OIL FR TJ |2088119AB
>> F65A-5B326-CA |ORD |-BAR, FR |MTB-001
>
> I joined the list only to ask about this situation. No one responded, so
> I'll try one more time. Is the changed behavior in cut (with an output
> delimiter) a bug or a feature?
It's a bug.  Thanks for the report.
I've just checked in the fix below.

2004-06-02  Jim Meyering  <[EMAIL PROTECTED]>

	Fix a bug in how the --output-delimiter=D option works with
	abutting byte or character ranges.  Reported by David Krider in
	http://lists.gnu.org/archive/html/bug-coreutils/2004-05/msg00132.html

	* src/cut.c (print_kth): Remove special case for open-ended range.
	(set_fields): Record the range start index for an interval even
	when it abuts another interval on its low side.  Also record the
	range start index of the longest right-open-interval.
	* tests/cut/Test.pm: Add tests of --output-delimiter=S with abutting
	and overlapping byte ranges.
	* doc/coreutils.texi (cut invocation): Clarify what
	--output-delimiter=STR does with byte/character ranges.

Index: cut.c
===================================================================
RCS file: /fetish/cu/src/cut.c,v
retrieving revision 1.111
diff -u -p -r1.111 cut.c
--- cut.c	17 May 2004 13:16:53 -0000	1.111
+++ cut.c	2 Jun 2004 11:40:27 -0000
@@ -266,14 +266,8 @@ is_range_start_index (size_t i)
 static bool
 print_kth (size_t k, bool *range_start)
 {
-  if (0 < eol_range_start && eol_range_start <= k)
-    {
-      if (range_start)
-        *range_start = (k == eol_range_start);
-      return true;
-    }
-
-  if (k <= max_range_endpoint && is_printable_field (k))
+  if ((0 < eol_range_start && eol_range_start <= k)
+      || (k <= max_range_endpoint && is_printable_field (k)))
     {
       if (range_start)
         *range_start = is_range_start_index (k);
@@ -473,25 +467,35 @@ set_fields (const char *fieldstr)
 
   if (output_delimiter_specified)
     {
-      /* Record the range-start indices.  */
-      for (i = 0; i < n_rp; i++)
+      /* Record the range-start indices, i.e., record each start
+         index that is not part of any other (lo..hi] range.  */
+      for (i = 0; i <= n_rp; i++)
         {
           size_t j;
-          for (j = rp[i].lo; j <= rp[i].hi; j++)
+          size_t rsi = (i < n_rp ? rp[i].lo : eol_range_start);
+
+          for (j = 0; j < n_rp; j++)
             {
-              if (0 < j && is_printable_field (j)
-                  && !is_printable_field (j - 1))
+              if (rp[j].lo < rsi && rsi <= rp[j].hi)
                 {
-                  /* Record the fact that `j' is a range-start index.  */
-                  void *ent_from_table = hash_insert (range_start_ht,
-                                                      (void*) j);
-                  if (ent_from_table == NULL)
-                    {
-                      /* Insertion failed due to lack of memory.  */
-                      xalloc_die ();
-                    }
-                  assert ((size_t) ent_from_table == j);
+                  rsi = 0;
+                  break;
+                }
+            }
+
+          if (eol_range_start && eol_range_start < rsi)
+            rsi = 0;
+
+          if (rsi)
+            {
+              /* Record the fact that `rsi' is a range-start index.  */
+              void *ent_from_table = hash_insert (range_start_ht, (void*) rsi);
+              if (ent_from_table == NULL)
+                {
+                  /* Insertion failed due to lack of memory.  */
+                  xalloc_die ();
                 }
+              assert ((size_t) ent_from_table == rsi);
             }
         }
     }
Index: Test.pm
===================================================================
RCS file: /fetish/cu/tests/cut/Test.pm,v
retrieving revision 1.13
diff -u -p -r1.13 Test.pm
--- Test.pm	23 Jul 2003 07:01:19 -0000	1.13
+++ Test.pm	2 Jun 2004 11:41:04 -0000
@@ -85,6 +85,13 @@ my @tv = (
 ['out-delim5', '-c2-3,4- --output-d=:', "abcdefg\n", "bc:defg\n", 0],
 # This test would fail for cut from coreutils-5.0.1 and earlier.
 ['out-delim6', '-c2,1-3 --output-d=:', "abc\n", "abc\n", 0],
+#
+['od-abut', '-b1-2,3-4 --output-d=:', "abcd\n", "ab:cd\n", 0],
+['od-overlap', '-b1-2,2 --output-d=:', "abc\n", "ab\n", 0],
+['od-overlap2', '-b1-2,2- --output-d=:', "abc\n", "abc\n", 0],
+['od-overlap3', '-b1-3,2- --output-d=:', "abcd\n", "abcd\n", 0],
+['od-overlap4', '-b1-3,2-3 --output-d=:', "abcd\n", "abc\n", 0],
+['od-overlap5', '-b1-3,1-4 --output-d=:', "abcde\n", "abcd\n", 0],
 );

Index: coreutils.texi
===================================================================
RCS file: /fetish/cu/doc/coreutils.texi,v
retrieving revision 1.184
diff -u -p -r1.184 coreutils.texi
--- coreutils.texi	2 Jun 2004 08:35:02 -0000	1.184
+++ coreutils.texi	2 Jun 2004 21:22:47 -0000
@@ -4428,7 +4428,8 @@ With @option{-f}, output fields are sepa
 The default with @option{-f} is to use the input delimiter.
 When using @option{-b} or @option{-c} to select ranges of byte or
 character offsets (as opposed to ranges of fields),
-output @var{output_delim_string} between ranges of selected bytes.
+output @var{output_delim_string} between non-overlapping
+ranges of selected bytes.
 
 @end table
 

_______________________________________________
Bug-coreutils mailing list
[EMAIL PROTECTED]
http://lists.gnu.org/mailman/listinfo/bug-coreutils