On 07/10/10 19:25, Pádraig Brady wrote:
> On 07/10/10 18:43, Assaf Gordon wrote:
>> Pádraig Brady wrote, On 10/07/2010 06:22 AM:
>>> On 07/10/10 01:03, Pádraig Brady wrote:
>>>> On 06/10/10 21:41, Assaf Gordon wrote:
>>>>>
>>>>> The "--auto-format" feature simply builds the "-o" format line
>>>>> automatically, based on the number of columns from both input files.
>>>>
>>>> Thanks for persisting with this and presenting a concise example.
>>>> I agree that this is useful and can't think of a simple workaround.
>>>> Perhaps the interface would be better as:
>>>>
>>>> -o {all (default), padded, FORMAT}
>>>>
>>>> where padded is the functionality you're suggesting?
>>>
>>> Thinking more about it, we mightn't need any new options at all.
>>> Currently -e is redundant if -o is not specified.
>>> So how about changing that so that if -e is specified
>>> we operate as above by auto inserting empty fields?
>>> Also I wouldn't base on the number of fields in the first line,
>>> instead auto padding to the biggest number of fields
>>> on the current lines under consideration.
>>
>> My concern is the principle of "least surprise" - if there are existing
>> scripts/programs that specify "-e" without "-o" (doesn't make sense, but
>> still possible) - this change will alter their behavior.
>>
>> Also, implying/forcing 'auto-format' when "-e" is used without "-o" might be
>> a bit confusing.
>
> Well seeing as -e without -o currently does nothing,
> I don't think we need to worry too much about changing that behavior.
> Also to me, specifying -e EMPTY implicitly means I want
> fields missing from one of the files replaced with EMPTY.
>
> Note POSIX is more explicit, and describes our current operation:
>
> -e EMPTY
> Replace empty output fields in the list selected by -o with EMPTY
>
> So changing that would be an extension to POSIX.
> But I still think it makes sense.
> I'll prepare a patch soon, to do as I describe above,
> unless there are objections.
The attached changes `join` (from what's done on other platforms) so that...
`join -e` will automatically pad missing fields from one file
so that the same number of fields are output from each file.
Previously -e was only used for missing fields specified with -o or -j.
With this change join now does:
$ cat file1
a 1 2
b 1
d 1 2
$ cat file2
a 3 4
b 3 4
c 3 4
$ join -a1 -a2 -1 1 -2 1 -e. file1 file2
a 1 2 3 4
b 1 . 3 4
c . . 3 4
d 1 2 . .
$ join -a1 -a2 -1 1 -2 4 -e. file1 file2
. . . . a 3 4
. . . . b 3 4
. . . . c 3 4
a 1 2 . .
b 1 .
d 1 2 . .
$ join -a1 -a2 -1 4 -2 1 -e. file1 file2
. a 1 2 . . .
. b 1 . .
. d 1 2 . . .
a . . 3 4
b . . 3 4
c . . 3 4
$ join -a1 -a2 -1 4 -2 4 -e. file1 file2
. a 1 2 a 3 4
. a 1 2 b 3 4
. a 1 2 c 3 4
. b 1 . a 3 4
. b 1 . b 3 4
. b 1 . c 3 4
. d 1 2 a 3 4
. d 1 2 b 3 4
. d 1 2 c 3 4
While -e without -o was previously a noop, and so could safely be extended IMHO,
this will also change the behavior when with -e and -j are specified.
Previously if -j > 1 was specified, and that field was missing,
then -e would be used in its place, rather than the empty string.
This still does that, but also does the padding.
Without the -j issue I'd be 80:20 for just extending -e to auto pad,
but given -j I'm 50:50. The alternative it to select this with
say '-o padded', but that's less discoverable, and complicates
the interface somewhat.
cheers,
Pádraig.
>From 3ea251d770fd79c37ee5195480574ccd6bfa156c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <[email protected]>
Date: Wed, 5 Jan 2011 11:52:54 +0000
Subject: [PATCH] join: auto pad missing fields when -e specified
* src/join.c (prfields): A new function refactored from prjoin(),
to output all but the join field. This function also applies
the extra padding when appropriate.
(prjoin): Don't swap line1 and line2 when line1 is blank
so that the padding is applied to the right place.
* tests/misc/join: Remove existing uses of -e without -o,
and add 2 new cases to test the new auto padding behavior.
* NEWS: Mention the change in behavior.
Suggestion from Assaf Gordon
---
NEWS | 6 ++++
src/join.c | 81 ++++++++++++++++++++++++++++++++++++-------------------
tests/misc/join | 16 ++++++++---
3 files changed, 71 insertions(+), 32 deletions(-)
diff --git a/NEWS b/NEWS
index 2a71ca6..f02e8a8 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,12 @@ GNU coreutils NEWS -*- outline -*-
* Noteworthy changes in release ?.? (????-??-??) [?]
+** Changes in behavior
+
+ join -e will automatically pad missing fields from one file
+ so that the same number of fields are output from each file.
+ Previously -e was only used for missing fields specified with -o or -j.
+
* Noteworthy changes in release 8.9 (2011-01-04) [stable]
diff --git a/src/join.c b/src/join.c
index afda5a1..1df24a9 100644
--- a/src/join.c
+++ b/src/join.c
@@ -527,6 +527,45 @@ prfield (size_t n, struct line const *line)
fputs (empty_filler, stdout);
}
+/* Output all the fields in line, other than the join field. */
+
+static void
+prfields (struct line const *line, size_t join_field,
+ struct line const *oline, size_t ojoin_field)
+{
+ size_t i;
+ char output_separator = tab < 0 ? ' ' : tab;
+
+ for (i = 0; i < join_field && i < line->nfields; ++i)
+ {
+ putchar (output_separator);
+ prfield (i, line);
+ }
+ for (i = join_field + 1; i < line->nfields; ++i)
+ {
+ putchar (output_separator);
+ prfield (i, line);
+ }
+ if (empty_filler && oline->nfields > line->nfields)
+ {
+ bool extra_filler = false;
+
+ /* Increase padding by 1 to account for the implicit
+ join field outputted for a blank line. */
+ if (line != &uni_blank)
+ extra_filler = true;
+ /* Or if we didn't print a join field, increase padding. */
+ else if (oline->nfields <= ojoin_field || !oline->fields[ojoin_field].len)
+ extra_filler = true;
+
+ for (i = !extra_filler; i < oline->nfields - line->nfields; ++i)
+ {
+ putchar (output_separator);
+ fputs (empty_filler, stdout);
+ }
+ }
+}
+
/* Print the join of LINE1 and LINE2. */
static void
@@ -534,6 +573,8 @@ prjoin (struct line const *line1, struct line const *line2)
{
const struct outlist *outlist;
char output_separator = tab < 0 ? ' ' : tab;
+ size_t field;
+ struct line const *line;
outlist = outlist_head.next;
if (outlist)
@@ -543,9 +584,6 @@ prjoin (struct line const *line1, struct line const *line2)
o = outlist;
while (1)
{
- size_t field;
- struct line const *line;
-
if (o->file == 0)
{
if (line1 == &uni_blank)
@@ -574,37 +612,24 @@ prjoin (struct line const *line1, struct line const *line2)
}
else
{
- size_t i;
-
if (line1 == &uni_blank)
{
- struct line const *t;
- t = line1;
- line1 = line2;
- line2 = t;
+ line = line2;
+ field = join_field_2;
}
- prfield (join_field_1, line1);
- for (i = 0; i < join_field_1 && i < line1->nfields; ++i)
- {
- putchar (output_separator);
- prfield (i, line1);
- }
- for (i = join_field_1 + 1; i < line1->nfields; ++i)
+ else
{
- putchar (output_separator);
- prfield (i, line1);
+ line = line1;
+ field = join_field_1;
}
- for (i = 0; i < join_field_2 && i < line2->nfields; ++i)
- {
- putchar (output_separator);
- prfield (i, line2);
- }
- for (i = join_field_2 + 1; i < line2->nfields; ++i)
- {
- putchar (output_separator);
- prfield (i, line2);
- }
+ /* Output the join field. */
+ prfield (field, line);
+
+ /* Output other fields. */
+ prfields (line1, join_field_1, line2, join_field_2);
+ prfields (line2, join_field_2, line1, join_field_1);
+
putchar ('\n');
}
}
diff --git a/tests/misc/join b/tests/misc/join
index 3696a03..7ca628b 100755
--- a/tests/misc/join
+++ b/tests/misc/join
@@ -43,7 +43,7 @@ my @tv = (
['1e', '-a2', ["a 1\nb\n", "b\n"], "b\n", 0],
['1f', '-a2', ["b\n", "a\nb\n"], "a\nb\n", 0],
-['2a', '-a1 -e .', ["a\nb\nc\n", "a x y\nb\nc\n"], "a x y\nb\nc\n", 0],
+['2a', '-a1', ["a\nb\nc\n", "a x y\nb\nc\n"], "a x y\nb\nc\n", 0],
['2b', '-a1 -e . -o 2.1,2.2,2.3', ["a\nb\nc\n", "a x y\nb\nc\n"],
"a x y\nb . .\nc . .\n", 0],
['2c', '-a1 -e . -o 2.1,2.2,2.3', ["a\nb\nc\nd\n", "a x y\nb\nc\n"],
@@ -104,13 +104,13 @@ my @tv = (
["apr 15\naug 20\ndec 18\n", "apr 06\naug 14\ndate\nfeb 15\n"],
"06 apr\n14 aug\n- -\n15 -\n", 0],
-['6a', '-e -',
+['6a', '',
["a 1\nb 2\nd 4\n", "a 21\nb 22\nc 23\nf 26\n"],
"a 1 21\nb 2 22\n", 0],
-['6b', '-a1 -e -',
+['6b', '-a1',
["a 1\nb 2\nd 4\n", "a 21\nb 22\nc 23\nf 26\n"],
"a 1 21\nb 2 22\nd 4\n", 0],
-['6c', '-a1 -e -',
+['6c', '-a1',
["a 21\nb 22\nc 23\nf 26\n", "a 1\nb 2\nd 4\n"],
"a 21 1\nb 22 2\nc 23\nf 26\n", 0],
@@ -127,6 +127,14 @@ my @tv = (
# From David Dyck
['9a', '', [" a 1\n b 2\n", " a Y\n b Z\n"], "a 1 Y\nb 2 Z\n", 0],
+# Test -e without -o (auto pad)
+['10a', '-a1 -a2 -e .',
+ ["a 1 2\nb 1\nd 1 2\n", "a 3 4\nb 3 4\nc 3 4\n"],
+ "a 1 2 3 4\nb 1 . 3 4\nc . . 3 4\nd 1 2 . .\n", 0],
+['10b', '-a1 -a2 -j3 -e .',
+ ["a 1 2\nb 1\nd 1 2\n", "a 3 4\nb 3 4\nc 3 4\n"],
+ "2 a 1 . .\n. b 1 . .\n2 d 1 . .\n4 . . a 3\n4 . . b 3\n4 . . c 3\n"],
+
# From Tim Smithers: fixed in 1.22l
['trailing-sp', '-t: -1 1 -2 1', ["a:x \n", "a:y \n"], "a:x :y \n", 0],
--
1.7.3.4