Hello,
On 2018-10-12 10:28 a.m., 積丹尼 Dan Jacobson wrote:
> OK, but you need to mention some examples of why someone would want to
> "sort" something "randomly".
>
Attached is a patch to add examples of shuf/sort -R
to the coreutils documentation.
(It doesn't deal with "why", that is left to the users to decide when
they need it, but it shows clear examples of how to use it).
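
The grouping difference that the examples in the patch illustrate can also be
checked from a shell. This is only a minimal sketch, assuming GNU coreutils
(shuf, sort -R) is on PATH; the helper names sample and count_runs are
hypothetical and are not part of the patch:

```shell
#!/bin/sh
# Hypothetical helpers for illustration; assumes GNU coreutils on PATH.

# The same 8-line input used in the documentation examples.
sample() { printf '%s\n' A A A B B C D D; }

# Count runs of identical adjacent lines in the output of the given command.
count_runs() { sample | "$@" | uniq | wc -l; }

# sort -R randomizes by key, so identical lines always end up adjacent:
# the 8 input lines form exactly 4 runs (A, B, C, D) on every invocation.
sort_runs=$(count_runs sort -R)

# shuf ignores line content, so the number of runs varies between
# invocations (anywhere from 4 to 8 for this input).
shuf_runs=$(count_runs shuf)

# Both commands output a permutation of the input: re-sorting recovers it.
shuf_sorted=$(sample | shuf | sort | tr -d '\n')
sort_sorted=$(sample | sort -R | sort | tr -d '\n')

echo "sort -R runs: $sort_runs, shuf runs: $shuf_runs"
```

Running it a few times should show the sort -R count staying at 4 while the
shuf count changes from run to run.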
regards,
- assaf
From a8ae1f29a96b47b9a9c2b26875bd41bfa124e83b Mon Sep 17 00:00:00 2001
From: Assaf Gordon <[email protected]>
Date: Sun, 30 Dec 2018 12:21:31 -0700
Subject: [PATCH] doc: add examples of shuf/sort -R
Requested by Dan Jacobson <[email protected]> in
https://bugs.gnu.org/33025 .
* doc/coreutils.texi (randomizing files): New section.
---
doc/coreutils.texi | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 148 insertions(+)
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 8d303cd56..e05b34ab1 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -276,6 +276,7 @@ Operating on sorted files
* comm invocation:: Compare two sorted files line by line
* ptx invocation:: Produce a permuted index of file contents
* tsort invocation:: Topological sort
+* randomizing files:: Producing random output
@command{ptx}: Produce permuted indexes
@@ -4192,6 +4193,7 @@ These commands work with (or produce) sorted files.
* comm invocation:: Compare two sorted files line by line.
* ptx invocation:: Produce a permuted index of file contents.
* tsort invocation:: Topological sort.
+* randomizing files:: Producing random output.
@end menu
@@ -6018,6 +6020,152 @@ Anyhow, that's where tsort came from. To solve an old problem with
the way the linker handled archive files, which has since been solved
in different ways.
+@node randomizing files
+@section Producing random output
+
+The @command{shuf} and @command{sort -R/--random-sort} commands read input
+(sorted or not) and output its lines in a randomized order.
+@command{shuf} shuffles all input lines equally, regardless of their content.
+@command{sort -R} shuffles the @emph{keys} of the input lines, so
+lines with identical sort keys are grouped together:
+
+@multitable @columnfractions .5 .5
+@item
+@example
+$ printf '%s\n' A A A B B C D D | shuf
+A
+C
+D
+D
+A
+B
+A
+B
+@end example
+@tab
+@example
+$ printf '%s\n' A A A B B C D D | sort -R
+C
+D
+D
+A
+A
+A
+B
+B
+@end example
+@end multitable
+
+@command{shuf -n @var{count}} outputs at most @var{count} lines (i.e.,
+a sub-sample). @command{sort --random-sort --unique} outputs one line
+of each group in a random order:
+
+@multitable @columnfractions .5 .5
+@item
+@example
+$ printf '%s\n' A A A B B C D D | shuf -n5
+B
+D
+A
+D
+B
+@end example
+@tab
+@example
+$ printf '%s\n' A A A B B C D D | sort -R -u
+C
+A
+B
+D
+@end example
+@end multitable
+
+@command{sort} operates on keys. Random and non-random keys can be combined
+to achieve desired results. In the following examples, the input file @file{in}
+contains these lines:
+
+@example
+$ cat in
+A 5
+A 3
+A 7
+B 6
+B 4
+C 4
+D 9
+D 8
+@end example
+
+@command{sort -R} without explicit keys uses the entire line as the key.
+The result may be surprising: @samp{A 5} and @samp{A 3} are different
+keys, so lines starting with the same letter are not grouped together:
+
+@example
+$ sort -R in
+A 7
+C 4
+A 3
+D 8
+B 6
+B 4
+A 5
+D 9
+@end example
+
+Specifying an explicit key to sort randomly puts the keyed
+column (letters) in random order (with identical keys grouped together),
+and sorts the other column (digits) alphabetically (the default
+last-resort sort):
+
+@example
+$ sort -k1,1R in
+C 4
+A 3
+A 5
+A 7
+B 4
+B 6
+D 8
+D 9
+@end example
+
+
+In the following example, the first column (letters) is sorted in
+reverse alphabetical order, and the second column (digits) is sorted
+randomly:
+
+@example
+$ sort -k1,1r -k2,2R in
+D 8
+D 9
+C 4
+B 6
+B 4
+A 7
+A 3
+A 5
+@end example
+
+
+To randomize a single column and keep the input order of all other
+columns, use the @option{-s/--stable} option. In the following example
+the letters are grouped in random order, while the digits stay
+in the same order as in the input file (i.e., the digits in group @samp{A}
+are always @samp{5}, @samp{3}, @samp{7}, exactly as in the input file):
+
+@example
+$ sort -k1,1R -s in
+D 9
+D 8
+B 6
+B 4
+A 5
+A 3
+A 7
+C 4
+@end example
+
+
@node Operating on fields
@chapter Operating on fields
--
2.11.0