Hello,
This patch enhances numfmt to support the same field range specifiers
that cut supports.
That is:
--field N (a single field)
--field N-M (a field range N through M)
--field N- (field N and all the fields after it)
--field -M (the first field up to and including M)
Multiple range specifiers can be combined with commas:
--field N1,M1,N2-M2,N3 ... etc
I've also added support for '*' to indicate all fields, as well as an
explicit --all-fields option:
--field \* OR --all-fields
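For anyone who wants to see the grammar above in isolation, here is a
minimal, self-contained sketch of how such specifiers could be parsed.
It is not the code in the attached patch (which stores fields in a gnulib
gl_list). The names struct field_set, parse_field_spec and field_selected
are hypothetical and used only for illustration, and rejecting a
decreasing range such as 3-1 is an assumption borrowed from cut.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical storage (not the patch's gl_list): a small array of closed
   ranges, where hi == LONG_MAX marks an open-ended "N-" or "*".  */
#define MAX_RANGES 32

struct field_range { long lo, hi; };

struct field_set
{
  struct field_range r[MAX_RANGES];
  size_t n;
};

/* Parse one positive decimal number at *P and advance *P past it.
   Return false if there are no digits or the value is not positive.  */
static bool
parse_num (const char **p, long *out)
{
  char *end;
  if (**p < '0' || **p > '9')
    return false;
  errno = 0;
  long v = strtol (*p, &end, 10);
  if (errno != 0 || v <= 0)
    return false;
  *p = end;
  *out = v;
  return true;
}

/* Parse SPEC ("N", "N-M", "N-", "-M" or "*", separated by commas) into FS.
   Return true on success, false on a malformed specifier.  */
static bool
parse_field_spec (const char *spec, struct field_set *fs)
{
  assert (spec != NULL && fs != NULL);
  fs->n = 0;
  const char *p = spec;

  for (;;)
    {
      long lo = 1, hi = LONG_MAX;

      if (fs->n == MAX_RANGES)
        return false;

      if (*p == '*')
        p++;                      /* all fields: 1 .. LONG_MAX */
      else if (*p == '-')
        {
          p++;                    /* "-M": first field through M */
          if (!parse_num (&p, &hi))
            return false;
        }
      else
        {
          if (!parse_num (&p, &lo))
            return false;
          hi = lo;                /* plain "N" unless a '-' follows */
          if (*p == '-')
            {
              p++;
              hi = LONG_MAX;      /* "N-" unless an M follows */
              if (*p != ',' && *p != '\0' && !parse_num (&p, &hi))
                return false;
              if (hi < lo)
                return false;     /* decreasing range (assumed invalid) */
            }
        }

      fs->r[fs->n].lo = lo;
      fs->r[fs->n].hi = hi;
      fs->n++;

      if (*p == '\0')
        return true;
      if (*p != ',')
        return false;
      p++;
    }
}

/* Return true if 1-based FIELD falls inside any parsed range.  */
static bool
field_selected (const struct field_set *fs, long field)
{
  for (size_t i = 0; i < fs->n; i++)
    if (fs->r[i].lo <= field && field <= fs->r[i].hi)
      return true;
  return false;
}
```

A caller would parse the option argument once with parse_field_spec and
then ask field_selected for each field number while walking a line.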
So instead of doing this:
df | numfmt --header --field 2 --to=si \
| numfmt --header --field 3 --to=si \
| numfmt --header --field 4 --to=si
You can now do this:
df | numfmt --header --field 2-4 --to=si
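To illustrate the single pass that replaces the three-stage pipeline, the
following sketch walks a line once and converts only the fields in a
given range. The names convert_fields and append_si are hypothetical, and
the SI formatting is deliberately simplified (a single 'K' suffix and
printf rounding) rather than numfmt's real conversion logic.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Append a simplified SI rendering of V to the string in OUT, never
   writing more than OUTSZ bytes in total.  Assumption: one decimal and a
   'K' suffix only; real numfmt supports more suffixes and rounding modes.  */
static void
append_si (char *out, size_t outsz, long v)
{
  size_t len = strlen (out);
  if (v >= 1000)
    snprintf (out + len, outsz - len, "%.1fK", v / 1000.0);
  else
    snprintf (out + len, outsz - len, "%ld", v);
}

/* Copy LINE to OUT, converting whitespace-separated fields LO..HI
   (1-based, inclusive).  Fields that are not plain integers are copied
   unchanged, which is a simplification of numfmt's error handling.  */
static void
convert_fields (const char *line, long lo, long hi, char *out, size_t outsz)
{
  assert (outsz > 0);
  out[0] = '\0';
  long field = 0;
  const char *p = line;

  while (*p)
    {
      /* Copy the run of whitespace that precedes the field.  */
      const char *start = p;
      while (*p && isspace ((unsigned char) *p))
        p++;
      size_t len = strlen (out);
      snprintf (out + len, outsz - len, "%.*s", (int) (p - start), start);
      if (!*p)
        break;

      /* Find the end of the field itself.  */
      start = p;
      while (*p && !isspace ((unsigned char) *p))
        p++;
      field++;

      /* Convert only if the field is selected and is entirely a number.  */
      char *end;
      long v = strtol (start, &end, 10);
      if (lo <= field && field <= hi && end == p)
        append_si (out, outsz, v);
      else
        {
          len = strlen (out);
          snprintf (out + len, outsz - len, "%.*s", (int) (p - start), start);
        }
    }
}
```

With the input "1000 2000 3000 4000 5000" and the range 2-4, this yields
"1000 2.0K 3.0K 4.0K 5000", the same result the field-range-2 test
expects from numfmt.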
There was a TODO about changing the default to process all fields,
and I originally made that change. On reflection I wasn't sure it was
the right thing to do, so I've left the default as field 1 for now
(and added the --all-fields option and '*' specifier instead).
I've added a complete set of unit tests for the new functionality, and
I've done my best to follow the guidelines in HACKING, so please let
me know if I missed anything.
In addition to the attached patch you can also fetch the associated
feature branch from:
git fetch [email protected]:calid/coreutils.git numfmt-field-ranges:numfmt-field-ranges
Please note that a couple of new gnulib modules are added in
bootstrap.conf, so when building from the git repo you will need to run
./bootstrap again. Annoyingly, the new gnulib files generate warnings
that cause the default make to fail, so I had to build with make
CFLAGS=-Wno-error. I guess this is a bug in gnulib?
Thanks!
From c249fab996710a501c68d2913b3805ad4ff5acf7 Mon Sep 17 00:00:00 2001
From: Dylan Cali <[email protected]>
Date: Fri, 5 Sep 2014 04:42:02 -0500
Subject: [PATCH 1/4] tests: add field range tests
* tests/misc/numfmt.pl: Add tests for 'cut style' field range
specifiers:
N single field
N-M from N to M inclusive
N- from N to last field inclusive
-M from first field to M inclusive
* all fields
Also add tests for multiple field ranges separated by commas, and for
indicating all fields via an explicit --all-fields option instead of
'*'.
---
tests/misc/numfmt.pl | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/tests/misc/numfmt.pl b/tests/misc/numfmt.pl
index dfb4b2e..e0a819c 100755
--- a/tests/misc/numfmt.pl
+++ b/tests/misc/numfmt.pl
@@ -222,6 +222,34 @@ my @Tests =
{ERR=>"$prog: input line is too short, no numbers found " .
"to convert in field 3\n"}],
+ # Multiple fields
+ ['field-range-1', '--field 2,4 --to=si "1000 2000 3000 4000 5000"',
+ {OUT=>"1000 2.0K 3000 4.0K 5000"}],
+
+ ['field-range-2', '--field 2-4 --to=si "1000 2000 3000 4000 5000"',
+ {OUT=>"1000 2.0K 3.0K 4.0K 5000"}],
+
+ ['field-range-3', '--field 1,2,3-5 --to=si "1000 2000 3000 4000 5000"',
+ {OUT=>"1.0K 2.0K 3.0K 4.0K 5.0K"}],
+
+ ['field-range-4', '--field 1-5 --to=si "1000 2000 3000 4000 5000"',
+ {OUT=>"1.0K 2.0K 3.0K 4.0K 5.0K"}],
+
+ ['field-range-5', '--field 1-3,5 --to=si "1000 2000 3000 4000 5000"',
+ {OUT=>"1.0K 2.0K 3.0K 4000 5.0K"}],
+
+ ['field-range-6', '--field 3- --to=si "1000 2000 3000 4000 5000"',
+ {OUT=>"1000 2000 3.0K 4.0K 5.0K"}],
+
+ ['field-range-7', '--field -3 --to=si "1000 2000 3000 4000 5000"',
+ {OUT=>"1.0K 2.0K 3.0K 4000 5000"}],
+
+ ['all-fields-1', '--all-fields --to=si "1000 2000 3000 4000 5000"',
+ {OUT=>"1.0K 2.0K 3.0K 4.0K 5.0K"}],
+
+ ['all-fields-2', '--field \'*\' --to=si "1000 2000 3000 4000 5000"',
+ {OUT=>"1.0K 2.0K 3.0K 4.0K 5.0K"}],
+
# Auto-consume white-space, setup auto-padding
['whitespace-1', '--to=si --field 2 "A 500 B"', {OUT=>"A 500 B"}],
['whitespace-2', '--to=si --field 2 "A 5000 B"', {OUT=>"A 5.0K B"}],
--
2.1.0
From d3487c028b366d135e674b28b0101fdee37e2f9d Mon Sep 17 00:00:00 2001
From: Dylan Cali <[email protected]>
Date: Fri, 5 Sep 2014 04:41:31 -0500
Subject: [PATCH 2/4] numfmt: implement support for field ranges
* src/numfmt.c: Replace field handling code with logic that understands
field range specifiers. Instead of processing a single field and
printing the line prefix/suffix around it, process each field in the
line, checking whether it has been included for conversion. If so,
convert and print it; otherwise just print the field unaltered.
(extract_fields): Removed.
(skip_fields): Removed.
(process_line): Gutted and heavily reworked.
(process_suffixed_number): FIELD is now passed as an arg instead of
using a global.
(parse_field_arg): New function that parses field range specifiers.
(next_field): New function that returns pointers to the next field in
a line.
(process_field): New function that wraps the field conversion logic.
(include_field): New function that checks whether a field should be
converted.
(compare_field): New function used for field value comparisons in a
gl_list.
(free_field): New function used for freeing field values in a gl_list.
Global variable FIELD removed.
New global variable all_fields indicates whether all fields should be
processed.
New global variable all_fields_after stores the first field of an N-
style range.
New global variable all_fields_before stores the last field of a -M
style range.
New global variable field_list stores explicitly specified fields to
process (N; N,M; or N-M style specifiers).
New option --all-fields.
* bootstrap.conf: Include xlist and avltree-list modules. numfmt now
uses the gl_avltree_list implementation to store the field list.
---
bootstrap.conf | 2 +
src/numfmt.c | 349 ++++++++++++++++++++++++++++++++++++++++-----------------
2 files changed, 246 insertions(+), 105 deletions(-)
diff --git a/bootstrap.conf b/bootstrap.conf
index c0b5f02..a362376 100644
--- a/bootstrap.conf
+++ b/bootstrap.conf
@@ -34,6 +34,7 @@ gnulib_modules="
argv-iter
assert
autobuild
+ avltree-list
backupfile
base64
buffer-lcm
@@ -268,6 +269,7 @@ gnulib_modules="
xgetcwd
xgetgroups
xgethostname
+ xlist
xmemcoll
xnanosleep
xprintf
diff --git a/src/numfmt.c b/src/numfmt.c
index b524e65..beecc74 100644
--- a/src/numfmt.c
+++ b/src/numfmt.c
@@ -29,6 +29,8 @@
#include "system.h"
#include "xstrtol.h"
#include "xstrndup.h"
+#include "gl_avltree_list.h"
+#include "gl_xlist.h"
/* The official name of this program (e.g., no 'g' prefix). */
#define PROGRAM_NAME "numfmt"
@@ -48,6 +50,7 @@ enum
SUFFIX_OPTION,
GROUPING_OPTION,
PADDING_OPTION,
+ ALL_FIELDS_OPTION,
FIELD_OPTION,
DEBUG_OPTION,
DEV_DEBUG_OPTION,
@@ -135,6 +138,7 @@ static struct option const longopts[] =
{"suffix", required_argument, NULL, SUFFIX_OPTION},
{"grouping", no_argument, NULL, GROUPING_OPTION},
{"delimiter", required_argument, NULL, 'd'},
+ {"all-fields", no_argument, NULL, ALL_FIELDS_OPTION},
{"field", required_argument, NULL, FIELD_OPTION},
{"debug", no_argument, NULL, DEBUG_OPTION},
{"-debug", no_argument, NULL, DEV_DEBUG_OPTION},
@@ -182,7 +186,10 @@ static int conv_exit_code = EXIT_CONVERSION_WARNINGS;
/* auto-pad each line based on skipped whitespace. */
static int auto_padding = 0;
static mbs_align_t padding_alignment = MBS_ALIGN_RIGHT;
-static long int field = 1;
+static bool all_fields = false;
+static long int all_fields_after = 0;
+static long int all_fields_before = 0;
+static gl_list_t field_list;
static int delimiter = DELIMITER_DEFAULT;
/* if non-zero, the first 'header' lines from STDIN are skipped. */
@@ -1153,7 +1160,7 @@ print_padded_number (void)
/* Converts the TEXT number string to the requested representation,
and handles automatic suffix addition. */
static int
-process_suffixed_number (char *text, long double *result, size_t *precision)
+process_suffixed_number (char *text, long double *result, size_t *precision, long int field)
{
if (suffix && strlen (text) > strlen (suffix))
{
@@ -1204,142 +1211,270 @@ process_suffixed_number (char *text, long double *result, size_t *precision)
return (e == SSE_OK || e == SSE_OK_PRECISION_LOSS);
}
-/* Skip the requested number of fields in the input string.
- Returns a pointer to the *delimiter* of the requested field,
- or a pointer to NUL (if reached the end of the string). */
-static inline char * _GL_ATTRIBUTE_PURE
-skip_fields (char *buf, int fields)
+static int
+compare_field(const void *elt1, const void *elt2)
{
- char *ptr = buf;
- if (delimiter != DELIMITER_DEFAULT)
+ long int i1 = *(long int*)elt1;
+ long int i2 = *(long int*)elt2;
+
+ if (i1 < i2) {
+ return -1;
+ }
+
+ if (i1 == i2) {
+ return 0;
+ }
+
+ return 1;
+}
+
+static void
+free_field(const void *elt) {
+ long int *i = (long int*)elt;
+ free(i);
+}
+
+/* Add the specified fields to field_list.
+ The format recognized is the same as for cut */
+static void
+parse_field_arg(char *optarg)
+{
+
+ char *start, *end;
+ long int *n, field_val;
+ long int range_val = 0;
+
+ start = end = optarg;
+
+ if (*start == '*')
{
- if (*ptr == delimiter)
- fields--;
- while (*ptr && fields--)
- {
- while (*ptr && *ptr == delimiter)
- ++ptr;
- while (*ptr && *ptr != delimiter)
- ++ptr;
- }
+ all_fields = true;
+
+ return;
}
- else
- while (*ptr && fields--)
+
+ if (*start == '-')
+ {
+ /* range -M */
+ ++start;
+
+ all_fields_before = strtol (start, &end, 10);
+
+ if (start == end || all_fields_before <=0)
+ error (EXIT_FAILURE, 0, _("invalid field value %s"),
+ quote (start));
+
+ return;
+ }
+
+ field_list = gl_list_create_empty (GL_AVLTREE_LIST,
+ NULL, NULL, free_field, false);
+
+ while (*end != '\0') {
+ field_val = strtol (start, &end, 10);
+
+ if (start == end || field_val <=0)
+ error (EXIT_FAILURE, 0, _("invalid field value %s"),
+ quote (start));
+
+ if (!range_val)
{
- while (*ptr && isblank (to_uchar (*ptr)))
- ++ptr;
- while (*ptr && !isblank (to_uchar (*ptr)))
- ++ptr;
+ /* field N */
+ n = malloc (sizeof *n);
+ *n = field_val;
+ gl_sortedlist_add (field_list, compare_field, n);
}
- return ptr;
-}
+ else
+ {
+ /* range N-M
+ The last field was the start of the field range. The current
+ field is the end of the field range. We already added the
+ start field, so increment and add all the fields through
+ range end. */
+ for (++range_val; range_val <= field_val; ++range_val) {
+ n = malloc (sizeof *n);
+ *n = range_val;
+ gl_sortedlist_add (field_list, compare_field, n);
+ }
-/* Parse a delimited string, and extracts the requested field.
- NOTE: the input buffer is modified.
+ range_val = 0;
+ }
- TODO:
- Maybe support multiple fields, though can always pipe output
- into another numfmt to process other fields.
- Maybe default to processing all fields rather than just first?
+ switch (*end) {
+ case ',':
+ /* discrete field separator */
+ ++end;
+ start = end;
+ break;
- Output:
- _PREFIX, _DATA, _SUFFIX will point to the relevant positions
- in the input string, or be NULL if such a part doesn't exist. */
-static void
-extract_fields (char *line, int _field,
- char ** _prefix, char ** _data, char ** _suffix)
-{
- char *ptr = line;
- *_prefix = NULL;
- *_data = NULL;
- *_suffix = NULL;
+ case '-':
+ /* field range separator */
+ ++end;
+ start = end;
+ range_val = field_val;
+ break;
+ }
+ }
- devmsg ("extracting Fields:\n input: %s\n field: %d\n",
- quote (line), _field);
+ if (range_val)
+ {
+ /* range N-
+ range_val was not reset indicating optarg
+ ended with a trailing '-' */
+ all_fields_after = range_val;
+ }
+}
- if (field > 1)
+/* Return a pointer to the beginning of the next field in line.
+ The line pointer is moved to the end of the next field. */
+static char*
+next_field (char **line)
+{
+ char *field_start = *line;
+ char *field_end = field_start;
+
+ if (delimiter != DELIMITER_DEFAULT)
{
- /* skip the requested number of fields. */
- *_prefix = line;
- ptr = skip_fields (line, field - 1);
- if (*ptr == '\0')
+ if (*field_start != delimiter)
{
- /* not enough fields in the input - print warning? */
- devmsg (" TOO FEW FIELDS!\n prefix: %s\n", quote (*_prefix));
- return;
+ while (*field_end && *field_end != delimiter)
+ ++field_end;
}
+ /* else empty field */
+ }
+ else
+ {
+ /* keep any space prefix in the returned field */
+ while (*field_end && isblank (to_uchar (*field_end)))
+ ++field_end;
- *ptr = '\0';
- ++ptr;
+ while (*field_end && !isblank (to_uchar (*field_end)))
+ ++field_end;
}
- *_data = ptr;
- *_suffix = skip_fields (*_data, 1);
- if (**_suffix)
+ *line = field_end;
+ return field_start;
+}
+
+static bool
+include_field (long int field)
+{
+ long int *i;
+
+ if (all_fields)
+ return true;
+
+ if (all_fields_after && all_fields_after <= field)
+ return true;
+
+ if (all_fields_before && field <= all_fields_before)
+ return true;
+
+ if (!field_list)
{
- /* there is a suffix (i.e. the field is not the last on the line),
- so null-terminate the _data before it. */
- **_suffix = '\0';
- ++(*_suffix);
+ /* default to field 1 */
+ field_list =
+ gl_list_create_empty (GL_AVLTREE_LIST,
+ NULL, NULL, free_field, false);
+
+ i = malloc (sizeof *i);
+ *i = 1;
+ gl_sortedlist_add (field_list, compare_field, i);
}
- else
- *_suffix = NULL;
- devmsg (" prefix: %s\n number: %s\n suffix: %s\n",
- quote_n (0, *_prefix ? *_prefix : ""),
- quote_n (1, *_data),
- quote_n (2, *_suffix ? *_suffix : ""));
+ return gl_sortedlist_search (field_list, compare_field, &field);
}
-
-/* Convert a number in a given line of text.
- NEWLINE specifies whether to output a '\n' for this "line". */
-static int
-process_line (char *line, bool newline)
+/* Convert and output the given field. If it is not included in the set
+ of fields to process just output the original */
+static bool
+process_field (char *text, long int field)
{
- char *pre, *num, *suf;
long double val = 0;
size_t precision = 0;
- int valid_number = 0;
-
- extract_fields (line, field, &pre, &num, &suf);
- if (!num)
- if (_invalid != inval_ignore)
- error (conv_exit_code, 0, _("input line is too short, "
- "no numbers found to convert in field %ld"),
- field);
+ bool valid_number = true;
- if (num)
+ if (include_field (field))
{
- valid_number = process_suffixed_number (num, &val, &precision);
+ valid_number =
+ process_suffixed_number (text, &val, &precision, field);
+
if (valid_number)
valid_number = prepare_padded_number (val, precision);
+
+ if (valid_number)
+ print_padded_number ();
+ else
+ fputs (text, stdout);
}
+ else
+ fputs (text, stdout);
- if (pre)
- fputs (pre, stdout);
+ return valid_number;
+}
- if (pre && num)
- fputc ((delimiter == DELIMITER_DEFAULT) ? ' ' : delimiter, stdout);
+/* Convert number in a given line of text.
+ NEWLINE specifies whether to output a '\n' for this "line". */
+static int
+process_line (char *line, bool newline)
+{
+ char *next;
+ long int field = 0;
+ long int last_field;
+ bool valid_number = true;
- if (valid_number)
- {
- print_padded_number ();
- }
- else
+ while (true) {
+ ++field;
+ next = next_field (&line);
+
+ if (*line != '\0')
+ {
+ /* nul terminate the current field string and process */
+ *line = '\0';
+
+ if (!process_field (next, field))
+ valid_number = false;
+
+ fputc ((delimiter == DELIMITER_DEFAULT) ?
+ ' ' : delimiter, stdout);
+ ++line;
+ }
+ else
+ {
+ /* end of the line, process the last field and finish */
+ if (!process_field (next, field))
+ valid_number = false;
+
+ break;
+ }
+ }
+
+ if (newline)
+ putchar ('\n');
+
+ if (!field_list)
{
- if (num)
- fputs (num, stdout);
+ /* no field_list set which indicates one of the all_fields ranges
+ specified, so just return */
+ return valid_number;
}
- if (suf)
+ last_field = *(long int*)gl_list_get_at(field_list,
+ gl_list_size(field_list)-1);
+
+ if (field < last_field)
{
- fputc ((delimiter == DELIMITER_DEFAULT) ? ' ' : delimiter, stdout);
- fputs (suf, stdout);
- }
+ /* there were unprocessed fields */
+ if (_invalid != inval_ignore)
+ {
+ error (conv_exit_code, 0,
+ _("input line is too short, "
+ "no numbers found to convert in field %ld"),
+ last_field);
- if (newline)
- putchar ('\n');
+ return false;
+ }
+ }
return valid_number;
}
@@ -1411,11 +1546,12 @@ main (int argc, char **argv)
to --header lines too. */
break;
+ case ALL_FIELDS_OPTION:
+ all_fields = true;
+ break;
+
case FIELD_OPTION:
- if (xstrtol (optarg, NULL, 10, &field, "") != LONGINT_OK
- || field <= 0)
- error (EXIT_FAILURE, 0, _("invalid field value %s"),
- quote (optarg));
+ parse_field_arg(optarg);
break;
case 'd':
@@ -1530,6 +1666,9 @@ main (int argc, char **argv)
free (format_str_prefix);
free (format_str_suffix);
+ if (field_list)
+ gl_list_free (field_list);
+
if (debug && !valid_numbers)
error (0, 0, _("failed to convert some of the input numbers"));
--
2.1.0
From 5444af9d3b021ed186eb441170bce6aac0a5d42b Mon Sep 17 00:00:00 2001
From: Dylan Cali <[email protected]>
Date: Fri, 5 Sep 2014 00:56:46 -0500
Subject: [PATCH 3/4] tests: partial output can occur before an error
* tests/misc/numfmt.pl: Each field is now processed and printed
in turn, so there may be some output by the time a field is reached that
causes an error. Update the tests to reflect this. Also remove the test
for the 'invalid' field -5, since this is now a valid range specifier.
---
tests/misc/numfmt.pl | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tests/misc/numfmt.pl b/tests/misc/numfmt.pl
index e0a819c..299782b 100755
--- a/tests/misc/numfmt.pl
+++ b/tests/misc/numfmt.pl
@@ -190,6 +190,7 @@ my @Tests =
['delim-5', '-d: --field=2 --from=auto :40M:60M', {OUT=>':40000000:60M'}],
['delim-6', '--delimiter=: --field 3 --from=auto 40M:60M',
{EXIT=>2},
+ {OUT=>"40M:60M\n"},
{ERR=>"$prog: input line is too short, no numbers found " .
"to convert in field 3\n"}],
@@ -197,12 +198,10 @@ my @Tests =
['field-1', '--field A',
{ERR => "$prog: invalid field value 'A'\n"},
{EXIT => '1'}],
- ['field-1.1', '--field -5',
- {ERR => "$prog: invalid field value '-5'\n"},
- {EXIT => '1'}],
['field-2', '--field 2 --from=auto "Hello 40M World 90G"',
{OUT=>'Hello 40000000 World 90G'}],
['field-3', '--field 3 --from=auto "Hello 40M World 90G"',
+ {OUT=>"Hello 40M "},
{ERR=>"$prog: invalid number: 'World'\n"},
{EXIT => 2},],
# Last field - no text after number
@@ -219,6 +218,7 @@ my @Tests =
# not enough fields
['field-8', '--field 3 --to=si "Hello World"',
{EXIT=>2},
+ {OUT=>"Hello World\n"},
{ERR=>"$prog: input line is too short, no numbers found " .
"to convert in field 3\n"}],
@@ -701,7 +701,7 @@ my @Tests =
['devdebug-11', '---debug --format "%\'-10f" 10000',{OUT=>"10000 "},
{ERR=>""},
{ERR_SUBST=>"s/.*//msg"}],
- ['devdebug-12', '---debug --field 2 A',{OUT=>""},
+ ['devdebug-12', '---debug --field 2 A',{OUT=>"A\n"},
{ERR=>""}, {EXIT=>2},
{ERR_SUBST=>"s/.*//msg"}],
--
2.1.0
From 6199de39ff4340f8d91aa5e179988398f6684d3f Mon Sep 17 00:00:00 2001
From: Dylan Cali <[email protected]>
Date: Thu, 4 Sep 2014 22:27:41 -0500
Subject: [PATCH 4/4] doc: add usage for field range specifiers
* src/numfmt.c (usage): Document newly supported field range specifiers
and --all-fields option.
---
src/numfmt.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/src/numfmt.c b/src/numfmt.c
index beecc74..187adc5 100644
--- a/src/numfmt.c
+++ b/src/numfmt.c
@@ -832,7 +832,11 @@ Reformat NUMBER(s), or the numbers from standard input if none are specified.\n\
-d, --delimiter=X use X instead of whitespace for field delimiter\n\
"), stdout);
fputs (_("\
- --field=N replace the number in input field N (default is 1)\n\
+ --all-fields replace the numbers in all the input fields\n\
+"), stdout);
+ fputs (_("\
+ --field=FIELDS replace the numbers in these input fields (default is 1)\n\
+ see FIELDS below\n\
"), stdout);
fputs (_("\
--format=FORMAT use printf style floating-point FORMAT;\n\
@@ -911,6 +915,16 @@ UNIT options:\n"), stdout);
...\n"), stdout);
fputs (_("\n\
+FIELDS accepts cut style field ranges:\n\
+ N N'th field, counted from 1\n\
+ N- from N'th field, to end of line\n\
+ N-M from N'th to M'th (included) field\n\
+ -M from first to M'th (included) field\n\
+ * all fields\n\
+Multiple fields/ranges can be separated with commas\n\
+"), stdout);
+
+ fputs (_("\n\
FORMAT must be suitable for printing one floating-point argument '%f'.\n\
Optional quote (%'f) will enable --grouping (if supported by current locale).\n\
Optional width value (%10f) will pad output. Optional zero (%010f) width\n\
@@ -938,7 +952,7 @@ Examples:\n\
-> \"1000\"\n\
$ echo 1K | %s --from=iec\n\
-> \"1024\"\n\
- $ df | %s --header --field 2 --to=si\n\
+ $ df | %s --header --field 2-4 --to=si\n\
$ ls -l | %s --header --field 5 --to=iec\n\
$ ls -lh | %s --header --field 5 --from=iec --padding=10\n\
$ ls -lh | %s --header --field 5 --from=iec --format %%10f\n"),
--
2.1.0