For reference (I don't know whether this is the same bug or just a related one), here's the original bug I ran into:
-- >8 --
$ rm xx*; seq 999 | csplit - /10/-5 30 /10/-5 {2} %10% 110
8
70
195
3
3
28
3560
$ head -n99999 xx*
==> xx00 <==
1
2
3
4
==> xx01 <==
5
6
7
8
9
10
...
29
==> xx02 <==
30
...
94
==> xx03 <==
95
==> xx04 <==
96
==> xx05 <==
103
104
105
106
107
108
109
==> xx06 <==
110
...
999
-- >8 --
Where have 100..102 gone?
Compare with s:%:/:g applied:
-- >8 --
$ rm xx*; seq 999 | csplit - /10/-5 30 /10/-5 {2} /10/ 110
8
70
195
3
3
21
28
3560
$ head -n99999 xx*
==> xx00 <==
1
2
3
4
==> xx01 <==
5
6
7
8
9
10
...
29
==> xx02 <==
30
...
94
==> xx03 <==
95
==> xx04 <==
96
==> xx05 <==
97
98
99
100
101
102
==> xx06 <==
103
104
105
106
107
108
109
==> xx07 <==
110
...
999
-- >8 --
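The numbers csplit prints are the byte sizes of the files it creates, so they can be checked against line ranges directly. A quick sanity check, assuming seq and wc are available (range_bytes is a made-up helper name, not part of any tool):

```shell
# range_bytes FIRST LAST: byte count of "seq FIRST LAST" output
# (digits plus one newline per line). tr strips the padding some wc's emit.
range_bytes() {
    seq "$1" "$2" | wc -c | tr -d ' '
}

range_bytes 103 109   # 28
range_bytes 97 102    # 21
```

So the 28-byte file is 103..109 in both runs, and the 21-byte xx05 of the /10/ run is 97..102.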
And compare my diagram:
/10/-5  30    /10/-5 {2}     %10%   110               expr
              0      1   2                            rep
1-4     5-29  30-94  95  96  97-99  100-109  110-999  line
00      01    02     03  04  05     06       07       file
When the "10" regex is %-wrapped, file 05 is not allocated, as expected.
POSIX leaves unspecified what happens when applying an expression would
produce a zero-sized file, which is why it's legal for coreutils
csplit to always eject a line; for comparison, NetBSD and other csplits
create 2 empty files for the /10/-5 {2} expression, for obvious reasons.
Consider therefore the same diagram but vertical
(annotation signifying file start):
-- >8 --
1 xx00
2
3
4
5 xx01
6
7
8
9
10
...
29
30 xx02
...
94
95 xx03
96 xx04
97 xx05
98
99
100 xx06
101
102
103
104
105
106
107
108
109
110 xx07
...
999
-- >8 --
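The boundaries in the diagram can be recomputed without csplit. A rough sketch, assuming the regex search simply finds the first matching line from the current line onward (for this input it makes no difference whether the current line itself is eligible); next_split is a hypothetical helper, not how csplit is implemented:

```shell
# next_split START REGEX OFFSET: number of the first line of "seq 999"
# at or after START that matches REGEX, with OFFSET added.
# The whole input is read (no early exit) so seq never gets a SIGPIPE.
next_split() {
    seq 999 | awk -v start="$1" -v re="$2" -v off="$3" \
        '!found && NR >= start && $0 ~ re { print NR + off; found = 1 }'
}

next_split 1 10 -5    # 5
next_split 30 10 -5   # 95
next_split 95 10 -5   # 95
next_split 96 10 -5   # 95
next_split 97 10 0    # 100
```

From 95 and 96 the target is still 95, which is not ahead of the current line, hence the ejected one-line files 95 and 96; from 97, a plain 10 match should resume at 100.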
Since in the csplit language, for a constant input, all expressions sans
%expr% can be reduced to line number expressions¹,
here's the equivalent invocation:
-- >8 --
$ rm xx*; seq 999 | csplit - 5 30 95 96 97 %10% 110
8
70
195
3
3
40
3560
$ head -n99999 xx*
==> xx00 <==
1
2
3
4
==> xx01 <==
5
6
7
8
9
10
...
29
==> xx02 <==
30
...
94
==> xx03 <==
95
==> xx04 <==
96
==> xx05 <==
100
101
102
103
104
105
106
107
108
109
==> xx06 <==
110
...
999
-- >8 --
So, again, this appears to be lookbehind rearing its filthy head.
Best,
наб
¹ I don't think this is strictly true for the strict POSIX dialect
without a forward-progress hatch, like NetBSD, but it is true for the
coreutils dialect where a regex expression always ejects a line.
Whatever, you get the point.

