Hello Pádraig,

Pádraig Brady wrote, On 03/24/2013 11:45 PM:
>>>>> On 03/06/2013 11:50 PM, Assaf Gordon wrote:
>>>>>> Attached is a suggestion to implement reservoir-sampling in shuf:
>>>>>> When the expected output of lines is known, it will not load the entire 
>>>>>> file into memory - allowing shuffling very large inputs.
> 
> I've attached 9 patches to adjust things a bit.
> 

Looks great, thank you very much.

One minor fix: the comment in the test file is out of date (in the early
stages of the patch I thought I could use a fixed random source and
pre-calculate the expected output).
A fix is attached.
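
For illustration, the reworded comment amounts to a check along these lines.
This is only a minimal sketch, not the code in shuf-reservoir.sh: the names
count_shuf_lines and expected_lines are hypothetical, and it assumes that
seq and shuf are available on PATH.

```shell
# Hypothetical helper (not run_shuf_n from the patch): count the lines
# that "shuf -n" emits for a given number of input lines.
count_shuf_lines()
{
  input_lines="$1"
  output_lines="$2"
  seq "$input_lines" | shuf -n "$output_lines" | wc -l | tr -d ' '
}

# Hypothetical helper: shuf -n emits at most the requested number of
# lines, so the expected count is the smaller of the two values.
expected_lines()
{
  if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}
```

Because the shuffled order is random, only the line count is compared,
e.g. `test "$(count_shuf_lines 100 10)" = "$(expected_lines 100 10)"`.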

-gordon
From d01dd496c517e20ac92fcbbb6b34045303b1b514 Mon Sep 17 00:00:00 2001
From: Assaf Gordon <[email protected]>
Date: Mon, 25 Mar 2013 12:25:50 -0400
Subject: [PATCH] maint: adjust shuf resevoir sampling comments

* tests/misc/shuf-reservoir.sh: re-word comments.
---
 tests/misc/shuf-reservoir.sh |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tests/misc/shuf-reservoir.sh b/tests/misc/shuf-reservoir.sh
index b695afc..6ba6e6e 100755
--- a/tests/misc/shuf-reservoir.sh
+++ b/tests/misc/shuf-reservoir.sh
@@ -26,7 +26,7 @@ require_valgrind_
 getlimits_
 
 # Run "shuf" with specific number of input lines and output lines
-# The output must match the expected (pre-calculated) output.
+# Check the output for expected number of lines.
 run_shuf_n()
 {
   INPUT_LINES="$1"
-- 
1.7.7.4
