On 23/01/2021 04:58, Paul Hirst wrote:
split --number K/N appears to lose data, with the sum of the sizes of
the output files being smaller than the original input file by 131072 bytes.
$ split --version
split (GNU coreutils) 8.30
...
$ head -c 1000000 < /dev/urandom > test.dat
$ split --number=1/4 test.dat > t1
$ split --number=2/4 test.dat > t2
$ split --number=3/4 test.dat > t3
$ split --number=4/4 test.dat > t4
$ ls -l
-rw-r--r-- 1 user user 250000 Jan 22 18:36 t1
-rw-r--r-- 1 user user 250000 Jan 22 18:36 t2
-rw-r--r-- 1 user user 250000 Jan 22 18:36 t3
-rw-r--r-- 1 user user 118928 Jan 22 18:36 t4
-rw-r--r-- 1 user user 1000000 Jan 22 18:33 test.dat
Surely this should not be the case?
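For what it's worth, the shortfall on the last chunk is exactly 128 KiB, which (per the buffering threshold mentioned in the reply below) matches split's input buffer size:

```shell
# Last chunk should be 250000 bytes but is 118928; the difference
# is exactly 128 KiB, the size of split's input buffer.
echo $((250000 - 118928))   # → 131072
echo $((128 * 1024))        # → 131072
```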
Ugh. This functionality was broken for all files > 128KiB,
due to adjustments for handling /dev/zero.
$ truncate -s 1000000 test.dat
$ split --number=4/4 test.dat | wc -c
118928
The following patch fixes it here.
I need to do some more testing before committing.
thanks!
diff --git a/src/split.c b/src/split.c
index 0660da13f..6aa8d50e9 100644
--- a/src/split.c
+++ b/src/split.c
@@ -1001,7 +1001,7 @@ bytes_chunk_extract (uintmax_t k, uintmax_t n, char *buf,
size_t bufsize,
}
else
{
- if (lseek (STDIN_FILENO, start, SEEK_CUR) < 0)
+ if (lseek (STDIN_FILENO, start, SEEK_SET) < 0)
die (EXIT_FAILURE, errno, "%s", quotef (infile));
initial_read = SIZE_MAX;
}