I got a bit of time for the review last night... This was your last interface change for this:
  -b, --bytes=SIZE        put SIZE bytes per output file\n\
+ -b, --bytes=/N          generate N output files\n\
+ -b, --bytes=K/N         print Kth of N chunks of file\n\
  -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file\n\
  -d, --numeric-suffixes  use numeric suffixes instead of alphabetic\n\
  -l, --lines=NUMBER      put NUMBER lines per output file\n\
+ -l, --lines=/N          generate N eol delineated output files\n\
+ -l, --lines=K/N         print Kth of N eol delineated chunks\n\
+ -n, --number=N          same as --bytes=/N\n\
+ -n, --number=K/N        same as --bytes=K/N\n\
+ -r, --round-robin=N     generate N eol delineated output files using\n\
+                         round-robin style distribution.\n\
+ -r. --round-robin=K/N   print Kth of N eol delineated chunk as -rN would\n\
+                         have generated.\n\
+ -t, --term=CHAR         specify CHAR as eol. This will also convert\n\
+                         -b to its line delineated equivalent (-C if\n\
+                         splitting normally, -l if splitting by\n\
+                         chunks). C escape sequences are accepted.\n\

Thinking more about it, I think adding 2 modes of operation to the already
slightly complicated -bCl options is too confusing. Since this is a separate
mode of operation (one would be specifying a particular number of files for a
different reason than a particular size), it would be better as a separate
option. So I changed -n to operate as follows. This is more general if we want
to add new split methods in the future, and it is also compatible with the
existing BSD -n without needing a redundant option.

  -n N        split into N files based on size of input
  -n K/N      output K of N to stdout
  -n l/N      split into N files while maintaining lines
  -n l/K/N    output K of N to stdout while maintaining lines
  -n r/N      like `l' but use round robin distribution instead of size
  -n r/K/N    likewise but only output K of N to stdout

Other changes I made in the attached version are:

  Removed the -t option, as that's a separate issue.
  Removed the erroneous 'c' from the getopt() parameters.
  Used K/N in the code rather than M/N to match the user instructions.
  Added a suffix length setter/checker based on N, so that we fail
  immediately if the wrong -a is specified, or auto set it if -a is
  not specified.
  Flagged 0/N as an error, rather than treating it like /N.
  Changed r/K/N to buffer using stdio, for much better performance (see below).
  Fixed up the errno passed to some error() calls.
  Normalized all "write error" messages so that all of these commands output
  a single translated error message, of the form
  "split: write error: No space left on device":

    split -n 1/10 $(which split) >/dev/full
    stdbuf -o0 split -n 1/10 $(which split) >/dev/full
    seq 10 | split -n r/1/10 >/dev/full
    seq 10 | stdbuf -o0 split -n r/1/10 >/dev/full

Re the performance of the round-robin implementation: using stdio helps a LOT,
as can be seen with:

-------------------------------------------------------
$ time yes | head -n10000000 | ./split-fwrite -n r/1/1 | wc -l
10000000

real    0m1.568s
user    0m1.486s
sys     0m0.072s

$ time yes | head -n10000000 | ./split-write -n r/1/1 | wc -l
10000000

real    0m50.988s
user    0m7.548s
sys     0m43.250s
-------------------------------------------------------

I still need to look at the round-robin implementation when outputting to
files rather than stdout. I may default to using stdio, but give an option
to flush each line.
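To make the stdio point concrete, here is a minimal, self-contained sketch
(not code from the patch; the USE_RAW_WRITE switch and the fixed "y\n" line
are illustrative assumptions) contrasting one write() per output line with
buffered fwrite() to stdout:

-------------------------------------------------------
/* Sketch only: emitting one short line per write() costs one syscall per
   line, while fwrite() to a stdio stream batches lines into buffer-sized
   writes, which is where the r/K/N speedup comes from.  */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main (void)
{
  const char *line = "y\n";
  size_t len = strlen (line);

  for (long i = 0; i < 10000000; i++)
    {
#ifdef USE_RAW_WRITE
      if (write (STDOUT_FILENO, line, len) != (ssize_t) len)  /* ~10M syscalls */
        return 1;
#else
      if (fwrite (line, len, 1, stdout) != 1)  /* buffered; far fewer syscalls */
        return 1;
#endif
    }
  return fflush (stdout) == EOF ? 1 : 0;
}
-------------------------------------------------------

Built with and without -DUSE_RAW_WRITE, it should show roughly the same
syscall-count difference reflected in the timings above.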
I'm testing with this currently, which is performing badly when just doing
write():

-------------------------------------------------------
#create fifos
yes | head -n4 | ../split -n r/4 fifo
for f in x*; do rm $f && mkfifo $f; done

#consumer
(for f in x*; do md5sum $f& done) > md5sum.out

#producer
seq 100000 | split -n r/4
-------------------------------------------------------

BTW, other modes perform well with write():

-------------------------------------------------------
$ yes | head -n10000000 > 10m.txt

$ time ./split -n l/1/1 <10m.txt | wc -l
10000000

real    0m0.201s
user    0m0.145s
sys     0m0.043s

$ time ./split -n 1/1 <10m.txt | wc -l
10000000

real    0m0.199s
user    0m0.154s
sys     0m0.041s

$ time ./split -n 1 <10m.txt

real    0m0.088s
user    0m0.000s
sys     0m0.081s
-------------------------------------------------------

Here is the stuff I intend to do before checking in:

  s/pread()/dd::skip()/ or at least add pread to bootstrap.conf
  fix the info docs for the reworked interface
  try to refactor the duplicated code

cheers,
Pádraig.
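P.S. For context on the pread() item in the TODO list above, here is a rough
sketch (an assumption about one possible fallback, not the gnulib module and
not anything in the attached patch) of why a plain lseek()+read() emulation
is not a drop-in replacement:

-------------------------------------------------------
/* Hypothetical pread()-like helper built from lseek()+read().  Unlike the
   real pread() it moves the file offset, and lseek() fails outright
   (ESPIPE) when stdin is a pipe, hence the need for a proper gnulib pread
   module or dd-style skipping instead.  */
#include <sys/types.h>
#include <unistd.h>

ssize_t
pread_fallback (int fd, void *buf, size_t count, off_t offset)
{
  if (lseek (fd, offset, SEEK_SET) < 0)
    return -1;
  return read (fd, buf, count);
}
-------------------------------------------------------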
>From 8a7fe06170ad8bb3050c4b6a43c9e51eb0ec22a7 Mon Sep 17 00:00:00 2001 From: Chen Guo <cheng...@yahoo.com> Date: Fri, 8 Jan 2010 03:42:27 -0800 Subject: [PATCH] split: add --number to generate a particular number of files * doc/coreutils.texi: update documentation of split. * src/split.c (usage, long_options, main): New options --number. (set_suffix_length): New function to auto increase suffix length to handle a specified number of files. (bytes_split): add max_files argument. This allows for trivial implementaton for byte chunking, similar to BSD. (lines_chunk_split): new function. Split file into chunks of lines. (bytes_chunk_extract): new function. Extract a chunk of file. (lines_chunk_extract): new function. Extract a chunk of lines. (of_info): new struct. Used by new functions lines_rr and ofd_check to keep track of file descriptors associated with output files. (ofd_check): new function. Shuffle file descriptors in case output files out number available file descriptors. (lines_rr): new function. Split file into chunks in round-robin fashion. (lines_rr_extract): new function. Extract a chunk of file, as if chunks were created in round-robin fashion. (chunk_parse): new function. Parses /N and K/N syntax. * tests/Makefile.am: add new tests. * misc/split-bchunk: new test for byte delineated chunking. * misc/split-fail: add failure scenarios for new options. * misc/split-l: change typo ln --version to split --version. * misc/split-lchunk: new test for line delineated chunking. * misc/split-rchunk: new test for round-robin chunking. --- doc/coreutils.texi | 48 ++++- src/split.c | 459 ++++++++++++++++++++++++++++++++++++++++++++++- tests/Makefile.am | 3 + tests/misc/split-bchunk | 46 +++++ tests/misc/split-fail | 3 +- tests/misc/split-l | 2 +- tests/misc/split-lchunk | 56 ++++++ tests/misc/split-rchunk | 53 ++++++ 8 files changed, 649 insertions(+), 21 deletions(-) create mode 100755 tests/misc/split-bchunk create mode 100755 tests/misc/split-lchunk create mode 100755 tests/misc/split-rchunk diff --git a/doc/coreutils.texi b/doc/coreutils.texi index e3e95f5..41b02be 100644 --- a/doc/coreutils.texi +++ b/doc/coreutils.texi @@ -104,7 +104,7 @@ * shuf: (coreutils)shuf invocation. Shuffling text files. * sleep: (coreutils)sleep invocation. Delay for a specified time. * sort: (coreutils)sort invocation. Sort text files. -* split: (coreutils)split invocation. Split into fixed-size pieces. +* split: (coreutils)split invocation. Split into pieces. * stat: (coreutils)stat invocation. Report file(system) status. * stdbuf: (coreutils)stdbuf invocation. Modify stdio buffering. * stty: (coreutils)stty invocation. Print/change terminal settings. @@ -2623,7 +2623,7 @@ These commands output pieces of the input. @menu * head invocation:: Output the first part of files. * tail invocation:: Output the last part of files. -* split invocation:: Split a file into fixed-size pieces. +* split invocation:: Split a file into pieces. * csplit invocation:: Split a file into context-determined pieces. @end menu @@ -2919,15 +2919,15 @@ mean either @samp{tail ./+4} or @samp{tail -n +4}. @node split invocation -...@section @command{split}: Split a file into fixed-size pieces +...@section @command{split}: Split a file into pieces. @pindex split @cindex splitting a file into pieces @cindex pieces, splitting a file into -...@command{split} creates output files containing consecutive sections of -...@var{input} (standard input if none is given or @var{input} is -...@samp{-}). 
Synopsis: +...@command{split} creates output files containing consecutive or interleaved +sections of @var{input} (standard input if none is given or @var{input} +is @samp{-}). Synopsis: @example split [...@var{option}] [...@var{input} [...@var{prefix}]] @@ -2940,10 +2940,9 @@ left over for the last section), into each output file. The output files' names consist of @var{prefix} (@samp{x} by default) followed by a group of characters (@samp{aa}, @samp{ab}, @dots{} by default), such that concatenating the output files in traditional -sorted order by file name produces -the original input file. If the output file names are exhausted, -...@command{split} reports an error without deleting the output files -that it did create. +sorted order by file name produces the original input file (except +...@option{-r}). If the output file names are exhausted, @command{split} +reports an error without deleting the output files that it did create. The program accepts the following options. Also see @ref{Common options}. @@ -2959,6 +2958,13 @@ For compatibility @command{split} also supports an obsolete option syntax @optio...@var{lines}}. New scripts should use @option{-l @var{lines}} instead. +...@item -l [...@var{k}]/@var{chunks} +...@item --line...@var{k}]/@var{chunks} +If @var{k} is zero or omitted, divide @var{input} into @var{chunks} +roughly equal-sized line delineated chunks. + +If @var{k} is present and nonzero, print @var{k}th of such chunks. + @item -b @var{size} @itemx --byt...@var{size} @opindex -b @@ -2966,6 +2972,13 @@ option syntax @optio...@var{lines}}. New scripts should use @option{-l Put @var{size} bytes of @var{input} into each output file. @multiplierSuffixes{size} +...@item -b [...@var{k}]/@var{chunks} +...@itemx --byte...@var{k}]/@var{chunks} +If @var{k} is zero or omitted, divide @var{input} into @var{chunks} +equal-sized chunks. + +If @var{k} is present and nonzero, print @var{k}th of such chunks. + @item -C @var{size} @itemx --line-byt...@var{size} @opindex -C @@ -2975,6 +2988,21 @@ possible without exceeding @var{size} bytes. Individual lines longer than @var{size} bytes are broken into multiple files. @var{size} has the same format as for the @option{--bytes} option. +...@item -n [...@var{k}]/]...@var{chunks} +...@itemx --number [...@var{k}]/]...@var{chunks} +...@opindex -n +...@opindex --number +Same as @option{--byte...@var{k}]/@var{chunks}}, for BSD compatibility. + +...@item -r [...@var{k}]/]...@var{chunks} +...@itemx --round-robin [...@var{k}]/]...@var{chunks} +...@opindex -r +...@opindex --round-robin +If @var{k} is zero or omitted, distribute @var{input} lines round-robin +style into @var{chunks} output files. + +If @var{k} is present and nonzero, print @var{k}th of such chunks. + @item -a @var{length} @itemx --suffix-leng...@var{length} @opindex -a diff --git a/src/split.c b/src/split.c index 5bd9ebb..83c127a 100644 --- a/src/split.c +++ b/src/split.c @@ -44,8 +44,6 @@ proper_name_utf8 ("Torbjorn Granlund", "Torbj\303\266rn Granlund"), \ proper_name ("Richard M. Stallman") -#define DEFAULT_SUFFIX_LENGTH 2 - /* Base name of output files. */ static char const *outbase; @@ -57,7 +55,7 @@ static char *outfile; static char *outfile_mid; /* Length of OUTFILE's suffix. */ -static size_t suffix_length = DEFAULT_SUFFIX_LENGTH; +static size_t suffix_length; /* Alphabet of characters to use in suffix. 
*/ static char const *suffix_alphabet = "abcdefghijklmnopqrstuvwxyz"; @@ -84,6 +82,7 @@ static struct option const longopts[] = {"bytes", required_argument, NULL, 'b'}, {"lines", required_argument, NULL, 'l'}, {"line-bytes", required_argument, NULL, 'C'}, + {"number", required_argument, NULL, 'n'}, {"suffix-length", required_argument, NULL, 'a'}, {"numeric-suffixes", no_argument, NULL, 'd'}, {"verbose", no_argument, NULL, VERBOSE_OPTION}, @@ -92,6 +91,32 @@ static struct option const longopts[] = {NULL, 0, NULL, 0} }; +static void +set_suffix_length (size_t n_units) +{ +#define DEFAULT_SUFFIX_LENGTH 2 + + size_t suffix_needed = 0; + size_t alphabet_len = strlen (suffix_alphabet); + bool alphabet_slop = (n_units % alphabet_len) != 0; + while (n_units /= alphabet_len) + suffix_needed++; + suffix_needed += alphabet_slop; + + if (suffix_length) /* set by user */ + { + if (suffix_length < suffix_needed) + { + error (EXIT_FAILURE, 0, + _("the suffix length needs to be at least %zu"), + suffix_needed); + } + return; + } + else + suffix_length = MAX (DEFAULT_SUFFIX_LENGTH, suffix_needed); +} + void usage (int status) { @@ -119,6 +144,7 @@ Mandatory arguments to long options are mandatory for short options too.\n\ -C, --line-bytes=SIZE put at most SIZE bytes of lines per output file\n\ -d, --numeric-suffixes use numeric suffixes instead of alphabetic\n\ -l, --lines=NUMBER put NUMBER lines per output file\n\ + -n, --number=CHUNKS generate CHUNKS output files. See below\n\ "), DEFAULT_SUFFIX_LENGTH); fputs (_("\ --verbose print a diagnostic just before each\n\ @@ -127,6 +153,15 @@ Mandatory arguments to long options are mandatory for short options too.\n\ fputs (HELP_OPTION_DESCRIPTION, stdout); fputs (VERSION_OPTION_DESCRIPTION, stdout); emit_size_note (); +fputs (_("\n\ +CHUNKS may be:\n\ +N split into N files based on size of input\n\ +K/N output K of N to stdout\n\ +l/N split into N files while maintaining lines\n\ +l/K/N output K of N to stdout while maintaining lines\n\ +r/N like `l' but use round robin distribution instead of size\n\ +r/K/N likewise but only output K of N to stdout\n\ +"), stdout); emit_ancillary_info (); } exit (status); @@ -218,13 +253,14 @@ cwrite (bool new_file_flag, const char *bp, size_t bytes) Use buffer BUF, whose size is BUFSIZE. */ static void -bytes_split (uintmax_t n_bytes, char *buf, size_t bufsize) +bytes_split (uintmax_t n_bytes, char *buf, size_t bufsize, uintmax_t max_files) { size_t n_read; bool new_file_flag = true; size_t to_read; uintmax_t to_write = n_bytes; char *bp_out; + uintmax_t opened = 1; do { @@ -251,7 +287,7 @@ bytes_split (uintmax_t n_bytes, char *buf, size_t bufsize) cwrite (new_file_flag, bp_out, w); bp_out += w; to_read -= w; - new_file_flag = true; + new_file_flag = !max_files || (opened++ < max_files); to_write = n_bytes; } } @@ -362,6 +398,329 @@ line_bytes_split (size_t n_bytes) free (buf); } +/* Split into NUMBER chunks of lines. */ + +static void +lines_chunk_split (size_t number, char *buf, size_t bufsize, size_t file_size) +{ + size_t n_read; + size_t chunk_no = 1; + off_t chunk_end = file_size / number - 1; + off_t offset = 0; + bool new_file_flag = true; + char *bp, *bp_out, *eob; + + while (offset < file_size) + { + n_read = full_read (STDIN_FILENO, buf, bufsize); + if (n_read == SAFE_READ_ERROR) + error (EXIT_FAILURE, errno, "%s", infile); + bp = buf; + eob = buf + n_read; + + while (1) + { + /* Begin looking for '\n' at last byte of chunk. */ + bp_out = (offset < chunk_end) ? 
bp + chunk_end - offset : bp; + if (bp_out > eob) + bp_out = eob; + bp_out = memchr (bp_out, '\n', eob - bp_out); + if (!bp_out) + { + /* Buffer exhausted. */ + cwrite (new_file_flag, bp, eob - bp); + new_file_flag = false; + offset += eob - bp; + break; + } + else + bp_out++; + + cwrite (new_file_flag, bp, bp_out - bp); + chunk_end = (++chunk_no < number) ? + chunk_end + file_size / number : file_size; + new_file_flag = true; + offset += bp_out - bp; + bp = bp_out; + /* A line could have been so long that it skipped + entire chunks. */ + while (chunk_end < offset) + { + chunk_end += file_size / number; + chunk_no++; + /* Create blank file: this ensures NUMBER files are + created. */ + cwrite (true, bp, 0); + } + } + } +} + +/* Extract Nth of TOTAL chunks. */ + +static void +bytes_chunk_extract (size_t n, size_t total, char *buf, size_t bufsize, + size_t file_size) +{ + off_t start = (n == 0) ? 0 : (n - 1) * (file_size / total); + off_t end = (n == total) ? file_size : n * (file_size / total); + ssize_t n_read; + size_t n_write; + + while (1) + { + n_read = pread (STDIN_FILENO, buf, bufsize, start); + if (n_read < 0) + error (EXIT_FAILURE, errno, "%s", infile); + n_write = (start + n_read <= end) ? n_read : end - start; + if (full_write (STDOUT_FILENO, buf, n_write) != n_write) + error (EXIT_FAILURE, errno, "%s", _("write error")); + start += n_read; + if (end <= start) + return; + } +} + +/* Extract lines whose first byte is in the Nth of TOTAL chunks. */ + +static void +lines_chunk_extract (size_t n, size_t total, char *buf, size_t bufsize, + size_t file_size) +{ + ssize_t n_read; + bool end_of_chunk = false; + bool skip = true; + char *bp = buf, *bp_out = buf, *eob; + off_t start; + off_t end; + + /* For n != 1, start reading 1 byte before nth chunk of file. This is to + detect if the first byte of chunk is the first byte of a line. */ + if (n == 1) + { + start = 0; + skip = false; + } + else + start = (n - 1) * (file_size / total) - 1; + end = (n == total) ? file_size - 1 : n * (file_size / total) - 1; + + do + { + n_read = pread (STDIN_FILENO, buf, bufsize, start); + if (n_read < 0) + error (EXIT_FAILURE, errno, "%s", infile); + bp = buf; + bp_out = buf + n_read; + eob = bp_out; + + /* Find starting point. */ + if (skip) + { + bp = memchr (buf, '\n', n_read); + if (bp && bp - buf < end - start) + { + bp++; + skip = false; + } + else if (!bp && start + n_read < end) + { + start += n_read; + continue; + } + else + return; + } + + /* Find ending point. */ + if (end < start + n_read && end == file_size - 1) + end_of_chunk = true; + else if (start + n_read >= end) + { + bp_out = (buf + end - start < buf) ? buf : buf + end - start; + bp_out = memchr (bp_out, '\n', eob - bp_out); + if (bp_out) + { + bp_out++; + end_of_chunk = true; + } + else + bp_out = eob; + } + + if (write (STDOUT_FILENO, bp, bp_out - bp) != bp_out - bp) + error (EXIT_FAILURE, errno, _("write error")); + start += n_read; + } + while (!end_of_chunk); +} + + + +typedef struct of_info +{ + char *of_name; + int ofd; +} of_t; + +/* Rotates file descriptors when we're writing to more output files than we + have available file descriptors. */ + +static void +ofd_check (of_t * ofiles, size_t i, size_t n) +{ + if (0 < ofiles[i].ofd) + return; + else + { + int fd; + int j = i - 1; + + /* Another process could have opened a file in between the calls to + close and open, so we should keep trying until open succeeds or + we've closed all of our files. */ + while (1) + { + /* Attempt to open file. 
*/ + fd = open (ofiles[i].of_name, + O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, + (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP + | S_IROTH | S_IWOTH)); + if (-1 < fd) + break; + /* Find an open file to close. */ + while (ofiles[j].ofd < 0) + { + if (--j == 0) + j = n - 1; + /* No more open files to close, exit with failure. */ + if (j == i) + error (EXIT_FAILURE, EMFILE, "%s", ofiles[i].of_name); + } + close (ofiles[j].ofd); + } + ofiles[i].ofd = fd; + } +} + +/* Divide file into N chunks in round robin fashion. */ + +static void +lines_rr (size_t n, char *buf, size_t bufsize) +{ + of_t *ofiles = xnmalloc (n, sizeof *ofiles); + char *bp, *bp_out, *eob; + size_t n_read; + bool eof = false; + bool nextfile = false; + size_t i; + + /* Generate output file names. */ + for (i = 0; i < n; i++) + { + next_file_name (); + ofiles[i].of_name = xstrdup (outfile); + ofiles[i].ofd = -1; + } + i = 0; + + do + { + n_read = full_read (STDIN_FILENO, buf, bufsize); + if (n_read == SAFE_READ_ERROR) + error (EXIT_FAILURE, errno, "%s", infile); + if (n_read < bufsize) + { + if (n_read == 0) + break; + eof = true; + } + bp = buf; + eob = buf + n_read; + + + while (bp != eob) + { + /* Find end of line. */ + bp_out = memchr (bp, '\n', eob - bp); + if (bp_out) + { + bp_out++; + nextfile = true; + } + else + bp_out = eob; + + /* Secure file descriptor. */ + ofd_check (ofiles, i, n); + + if (full_write (ofiles[i].ofd, bp, bp_out - bp) != bp_out - bp) + error (EXIT_FAILURE, errno, "%s", ofiles[i].of_name); + if (nextfile && ++i == n) + i = 0; + bp = bp_out; + nextfile = false; + } + } + while (!eof); + + /* Close any open file descriptors. */ + for (i = 0; i < n; i++) + if (-1 < ofiles[i].ofd) + close (ofiles[i].ofd); +} + +/* Extract Nth of TOT round robin distributed chunks of lines */ + +static void +lines_rr_extract (uintmax_t n, uintmax_t tot, char *buf, size_t bufsize) +{ + int line_no = 1; + char *bp, *bp_out, *eob; + size_t n_read; + bool eof = false; + bool inc = false; + + do + { + n_read = full_read (STDIN_FILENO, buf, bufsize); + if (n_read == SAFE_READ_ERROR) + error (EXIT_FAILURE, errno, "%s", infile); + if (n_read != bufsize) + { + if (n_read == 0) + break; + eof = true; + } + bp = buf; + eob = buf + n_read; + + while (bp != eob) + { + /* Find end of line. */ + bp_out = memchr (bp, '\n', eob - bp); + if (bp_out) + { + bp_out++; + inc = true; + } + else + bp_out = eob; + + if (line_no == n && fwrite (bp, bp_out - bp, 1, stdout) != 1) + { + clearerr (stdout); /* So close_stdout() doesn't also print. */ + error (EXIT_FAILURE, errno, _("write error")); + } + if (inc) + line_no = (line_no == tot) ? 1 : line_no + 1; + bp = bp_out; + inc = false; + } + } + while (!eof); +} + #define FAIL_ONLY_ONE_WAY() \ do \ { \ @@ -370,21 +729,47 @@ line_bytes_split (size_t n_bytes) } \ while (0) +/* Parse K/N syntax of chunk options. */ + +static void +chunk_parse (uintmax_t *k_units, uintmax_t *n_units, char *slash) +{ + *slash = '\0'; + if (slash != optarg /* a leading number is specified. 
*/ + && (xstrtoumax (optarg, NULL, 10, k_units, "") != LONGINT_OK + || *k_units == 0 || SIZE_MAX < *k_units)) + { + error (0, 0, _("%s: invalid chunk number"), optarg); + usage (EXIT_FAILURE); + } + if (xstrtoumax (++slash, NULL, 10, n_units, "") != LONGINT_OK + || *n_units == 0 || *n_units < *k_units || SIZE_MAX < *n_units) + { + error (0, 0, _("%s: invalid number of chunks"), slash); + usage (EXIT_FAILURE); + } +} + + int main (int argc, char **argv) { struct stat stat_buf; enum { - type_undef, type_bytes, type_byteslines, type_lines, type_digits + type_undef, type_bytes, type_byteslines, type_lines, type_digits, + type_chunk_bytes, type_chunk_lines, type_rr } split_type = type_undef; size_t in_blk_size; /* optimal block size of input file device */ char *buf; /* file i/o buffer */ size_t page_size = getpagesize (); + uintmax_t k_units = 0; uintmax_t n_units; static char const multipliers[] = "bEGKkMmPTYZ0"; int c; int digits_optind = 0; + size_t file_size; + char *slash; initialize_main (&argc, &argv); set_program_name (argv[0]); @@ -404,7 +789,7 @@ main (int argc, char **argv) /* This is the argv-index of the option we will read next. */ int this_optind = optind ? optind : 1; - c = getopt_long (argc, argv, "0123456789C:a:b:dl:", longopts, NULL); + c = getopt_long (argc, argv, "0123456789C:a:b:dl:n:", longopts, NULL); if (c == -1) break; @@ -459,6 +844,34 @@ main (int argc, char **argv) } break; + case 'n': + if (split_type != type_undef) + FAIL_ONLY_ONE_WAY (); + /* skip any whitespace */ + while (isspace (to_uchar (*optarg))) + optarg++; + if (strncmp (optarg, "r/", 2) == 0) + { + split_type = type_rr; + optarg += 2; + } + else if (strncmp (optarg, "l/", 2) == 0) + { + split_type = type_chunk_lines; + optarg += 2; + } + else + split_type = type_chunk_bytes; + if ((slash = strchr (optarg, '/'))) + chunk_parse (&k_units, &n_units, slash); + else if (xstrtoumax (optarg, NULL, 10, &n_units, "") != LONGINT_OK + || n_units == 0 || SIZE_MAX < n_units) + { + error (0, 0, _("%s: invalid number of chunks"), optarg); + usage (EXIT_FAILURE); + } + break; + case '0': case '1': case '2': @@ -514,10 +927,12 @@ main (int argc, char **argv) if (n_units == 0) { - error (0, 0, _("invalid number of lines: 0")); + error (0, 0, _("%s: invalid number of lines"), "0"); usage (EXIT_FAILURE); } + set_suffix_length (n_units); + /* Get out the filename arguments. 
*/ if (optind < argc) @@ -550,6 +965,11 @@ main (int argc, char **argv) if (fstat (STDIN_FILENO, &stat_buf) != 0) error (EXIT_FAILURE, errno, "%s", infile); in_blk_size = io_blksize (stat_buf); + file_size = stat_buf.st_size; + + if (split_type == type_chunk_bytes || split_type == type_chunk_lines) + if (file_size < n_units) + error (EXIT_FAILURE, 0, _("number of chunks exceed file size")); buf = ptr_align (xmalloc (in_blk_size + 1 + page_size - 1), page_size); @@ -561,13 +981,34 @@ main (int argc, char **argv) break; case type_bytes: - bytes_split (n_units, buf, in_blk_size); + bytes_split (n_units, buf, in_blk_size, 0); break; case type_byteslines: line_bytes_split (n_units); break; + case type_chunk_bytes: + if (k_units == 0) + bytes_split (file_size / n_units, buf, in_blk_size, n_units); + else + bytes_chunk_extract (k_units, n_units, buf, in_blk_size, file_size); + break; + + case type_chunk_lines: + if (k_units == 0) + lines_chunk_split (n_units, buf, in_blk_size, file_size); + else + lines_chunk_extract (k_units, n_units, buf, in_blk_size, file_size); + break; + + case type_rr: + if (k_units == 0) + lines_rr (n_units, buf, in_blk_size); + else + lines_rr_extract (k_units, n_units, buf, in_blk_size); + break; + default: abort (); } diff --git a/tests/Makefile.am b/tests/Makefile.am index 85503cc..c65f9dd 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -228,8 +228,11 @@ TESTS = \ misc/sort-rand \ misc/sort-version \ misc/split-a \ + misc/split-bchunk \ misc/split-fail \ misc/split-l \ + misc/split-lchunk \ + misc/split-rchunk \ misc/stat-fmt \ misc/stat-hyphen \ misc/stat-printf \ diff --git a/tests/misc/split-bchunk b/tests/misc/split-bchunk new file mode 100755 index 0000000..17d1f7e --- /dev/null +++ b/tests/misc/split-bchunk @@ -0,0 +1,46 @@ +#!/bin/sh +# show that splitting into 3 byte delineated chunks works. + +# Copyright (C) 2010 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. + +if test "$VERBOSE" = yes; then + set -x + split --version +fi +. $srcdir/test-lib.sh + +printf '1\n2\n3\n4\n5\n' > in || framework_failure + +split -n 3 in > out || fail=1 +split -n 1/3 in > b1 || fail=1 +split -n 2/3 in > b2 || fail=1 +split -n 3/3 in > b3 || fail=1 +echo -n -e 1'\n'2 > exp-1 +echo -e '\n'3 > exp-2 +echo -e 4'\n'5 > exp-3 + +compare xaa exp-1 || fail=1 +compare xab exp-2 || fail=1 +compare xac exp-3 || fail=1 +compare b1 exp-1 || fail=1 +compare b2 exp-2 || fail=1 +compare b3 exp-3 || fail=1 +test -f xad && fail=1 + +# Splitting into more chunks than file size should fail. 
+split -n20 in 2> /dev/null && fail=1 + +Exit $fail diff --git a/tests/misc/split-fail b/tests/misc/split-fail index e36c86d..981673b 100755 --- a/tests/misc/split-fail +++ b/tests/misc/split-fail @@ -29,8 +29,10 @@ touch in || framework_failure split -a 0 in 2> /dev/null || fail=1 split -b 0 in 2> /dev/null && fail=1 +split -b /0 in 2> /dev/null && fail=1 split -C 0 in 2> /dev/null && fail=1 split -l 0 in 2> /dev/null && fail=1 +split -l /0 in 2> /dev/null && fail=1 # Make sure -C doesn't create empty files. rm -f x?? || fail=1 @@ -64,5 +66,4 @@ split: line count option -99*... is too large EOF compare out exp || fail=1 - Exit $fail diff --git a/tests/misc/split-l b/tests/misc/split-l index fb07a27..850d5b5 100755 --- a/tests/misc/split-l +++ b/tests/misc/split-l @@ -18,7 +18,7 @@ if test "$VERBOSE" = yes; then set -x - ln --version + split --version fi . $srcdir/test-lib.sh diff --git a/tests/misc/split-lchunk b/tests/misc/split-lchunk new file mode 100755 index 0000000..f672d3b --- /dev/null +++ b/tests/misc/split-lchunk @@ -0,0 +1,56 @@ +#!/bin/sh +# show that splitting into 3 newline delineated chunks works. + +# Copyright (C) 2010 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. + +if test "$VERBOSE" = yes; then + set -x + ln --version +fi + +. $srcdir/test-lib.sh + +printf '1\n2\n3\n4\n5\n' > in || framework_failure + +split -n l/3 in > out || fail=1 +split -n l/1/3 in > l1 || fail=1 +split -n l/2/3 in > l2 || fail=1 +split -n l/3/3 in > l3 || fail=1 + +cat <<\EOF > exp-1 +1 +2 +EOF +cat <<\EOF > exp-2 +3 +EOF +cat <<\EOF > exp-3 +4 +5 +EOF + +compare xaa exp-1 || fail=1 +compare xab exp-2 || fail=1 +compare xac exp-3 || fail=1 +compare l1 exp-1 || fail=1 +compare l2 exp-2 || fail=1 +compare l3 exp-3 || fail=1 +test -f xad && fail=1 + +# Splitting into more chunks than file size should fail. +split -n l/20 in 2> /dev/null && fail=1 + +Exit $fail diff --git a/tests/misc/split-rchunk b/tests/misc/split-rchunk new file mode 100755 index 0000000..98e2f36 --- /dev/null +++ b/tests/misc/split-rchunk @@ -0,0 +1,53 @@ +#!/bin/sh +# show that splitting into 3 round-robin chunks works. + +# Copyright (C) 2010 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. + +if test "$VERBOSE" = yes; then + set -x + ln --version +fi + +. 
$srcdir/test-lib.sh + +printf '1\n2\n3\n4\n5\n' > in || framework_failure + +split -n r/3 in > out || fail=1 +split -n r/1/3 in > r1 || fail=1 +split -n r/2/3 in > r2 || fail=1 +split -n r/3/3 in > r3 || fail=1 + +cat <<\EOF > exp-1 +1 +4 +EOF +cat <<\EOF > exp-2 +2 +5 +EOF +cat <<\EOF > exp-3 +3 +EOF + +compare xaa exp-1 || fail=1 +compare xab exp-2 || fail=1 +compare xac exp-3 || fail=1 +compare r1 exp-1 || fail=1 +compare r2 exp-2 || fail=1 +compare r3 exp-3 || fail=1 +test -f xad && fail=1 + +Exit $fail -- 1.6.2.5