Hello Pádraig,

Pádraig Brady wrote, On 03/24/2013 11:45 PM:
>>>>> On 03/06/2013 11:50 PM, Assaf Gordon wrote:
>>>>>> Attached is a suggestion to implement reservoir-sampling in shuf:
>>>>>> When the expected output of lines is known, it will not load the entire
>>>>>> file into memory - allowing shuffling very large inputs.
>
> I've attached 9 patches to adjust things a bit.
>
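For anyone following along, here is a minimal sketch of the reservoir-sampling idea (Algorithm R) written in shell/awk. It is not the C code from the shuf patch. The function name reservoir_sample and its optional seed argument are hypothetical. Memory use is proportional to K, the number of requested output lines, not to the input size. The first K lines fill the reservoir, and line number N > K then replaces a random slot with probability K/N. A final Fisher-Yates pass shuffles the reservoir so the output order is random as well.

```shell
# Hypothetical helper: print K lines chosen uniformly at random from stdin,
# keeping at most K lines in memory (Algorithm R). Optional 2nd arg: seed.
reservoir_sample() {
  awk -v k="$1" -v seed="${2:-}" '
    BEGIN { if (seed != "") srand(seed); else srand() }
    {
      if (NR <= k) {
        r[NR] = $0                   # fill the reservoir with the first k lines
      } else {
        j = int(rand() * NR) + 1     # uniform index in 1..NR
        if (j <= k) r[j] = $0        # keep this line with probability k/NR
      }
    }
    END {
      n = (NR < k) ? NR : k          # fewer than k input lines: output them all
      for (i = n; i > 1; i--) {      # Fisher-Yates shuffle of the reservoir
        j = int(rand() * i) + 1
        t = r[i]; r[i] = r[j]; r[j] = t
      }
      for (i = 1; i <= n; i++) print r[i]
    }'
}
```

For example, `seq 1000000 | reservoir_sample 10` prints 10 distinct numbers while holding only 10 lines in memory, which is the property that lets shuf handle very large inputs when -n is given.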
Looks great, thank you very much.

One minor improvement: the comment in the test file is wrong. In the early stages of the patch I thought I could use a fixed random source and pre-calculate the expected output. Attached is a fix.

-gordon
From d01dd496c517e20ac92fcbbb6b34045303b1b514 Mon Sep 17 00:00:00 2001
From: Assaf Gordon <[email protected]>
Date: Mon, 25 Mar 2013 12:25:50 -0400
Subject: [PATCH] maint: adjust shuf reservoir sampling comments

* tests/misc/shuf-reservoir.sh: re-word comments.
---
 tests/misc/shuf-reservoir.sh | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tests/misc/shuf-reservoir.sh b/tests/misc/shuf-reservoir.sh
index b695afc..6ba6e6e 100755
--- a/tests/misc/shuf-reservoir.sh
+++ b/tests/misc/shuf-reservoir.sh
@@ -26,7 +26,7 @@ require_valgrind_
 getlimits_
 
 # Run "shuf" with specific number of input lines and output lines
-# The output must match the expected (pre-calculated) output.
+# Check the output for expected number of lines.
 run_shuf_n()
 {
   INPUT_LINES="$1"
-- 
1.7.7.4
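Because the output is random, the test can only check properties that hold for every run, such as the number of output lines. Below is a rough sketch of such a check. It is a hypothetical helper named check_shuf_lines, not the body of run_shuf_n, which is not shown in the patch. The expected count is min(INPUT, OUTPUT), since `shuf -n` never prints more lines than the input contains.

```shell
# Hypothetical helper (not the run_shuf_n from the patch): feed INPUT
# numbered lines to "shuf -n OUTPUT" and check only the line count.
check_shuf_lines() {
  input_lines=$1
  output_lines=$2
  # shuf -n prints at most as many lines as the input has
  if [ "$input_lines" -lt "$output_lines" ]; then
    expected=$input_lines
  else
    expected=$output_lines
  fi
  actual=$(seq "$input_lines" | shuf -n "$output_lines" | wc -l | tr -d ' ')
  [ "$actual" -eq "$expected" ]
}
```

For example, `check_shuf_lines 100000 10` succeeds when shuf emits exactly 10 lines, regardless of which lines are chosen.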
