The following issue has been SUBMITTED.
======================================================================
https://austingroupbugs.net/view.php?id=1813
======================================================================
Reported By: kre
Assigned To:
======================================================================
Project: Issue 8 drafts
Issue ID: 1813
Category: Shell and Utilities
Type: Error
Severity: Editorial
Priority: normal
Status: New
Name: Robert Elz
Organization:
User Reference:
Section: XCU 3 / xargs
Page Number: 3600-3603
Line Number: 123177-8 123183 123184-6 123228-32 123233-7 123252-3
123263 123304-6
Final Accepted Text:
======================================================================
Date Submitted: 2024-02-16 14:42 UTC
Last Modified: 2024-02-16 14:42 UTC
======================================================================
Summary: generic xargs description cleanups
Description:
Since "xargs" has "been in the news" recently... And while this issue is
specified to apply to Issue 8 Draft 4 (which is where the page/line numbers
are from) I would assume it would eventually be moved to Issue 8, after it
is published, and probably be considered for Issue 8 TC 1.
Most of this is just text that should be worded better, but there are a
few omissions that ought to be included.
Lines 123177-8 - two different issues here. First, arguments are said
to be delimited by an alternation of 3 terms - unquoted <blank>, unescaped
<blank> or newline. The problem is that nothing quoted can be escaped,
and nothing escaped can be quoted, hence any <blank> that appears is either
an unquoted <blank> or an unescaped <blank> and hence is a delimiter
character according to that definition.
Second (same lines, but continuing perhaps) we have quoting, and escaping,
but nothing here actually says that the quoting " or ' characters, or the
escaping \ character are removed from the arg text - which is what I assume
should happen, ie: in "abc def'"'ghi\jkl'\ m I would assume the argument
string is to be <b>abc def'ghi\jkl m</b> with the quoting and escaping
characters removed. But nothing says that, one could also read the text
as including those chars in the arg string, with the quoting or escaping
simply preventing enclosed <blank> characters from being delimiters (and
quoted or escaped quote or escape characters from being quote or escape
characters).
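
The reading assumed above (quote and escape characters are removed from
the argument string) can be sketched as follows. This is only an
illustration, not text from the standard or from any implementation: the
function name split_xargs_input is hypothetical, and treating a <newline>
inside quotes as an error and keeping an escaped <newline> literally are
arbitrary choices made so the sketch runs, since the standard is silent on
both.

```python
def split_xargs_input(text):
    """Split xargs-style input (no -0) into arguments, removing quote and
    escape characters. Blanks here are space and tab (POSIX locale)."""
    args = []
    current = []
    in_arg = False  # True once any character or quoted string has been seen
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in "\"'":
            # Quoted string: everything up to the matching quote is literal.
            end = text.find(ch, i + 1)
            if end == -1 or "\n" in text[i + 1:end]:
                # ASSUMPTION: the standard does not say what happens here.
                raise ValueError("newline or end of input inside quotes")
            current.append(text[i + 1:end])
            in_arg = True
            i = end + 1
        elif ch == "\\" and i + 1 < len(text):
            # Escape: drop the backslash, keep the next character literally.
            # ASSUMPTION: an escaped <newline> is kept, not removed.
            current.append(text[i + 1])
            in_arg = True
            i += 2
        elif ch in " \t\n":
            # Unquoted and unescaped <blank>, or <newline>: a delimiter.
            # Adjacent delimiters do not produce empty arguments.
            if in_arg:
                args.append("".join(current))
                current, in_arg = [], False
            i += 1
        else:
            current.append(ch)
            in_arg = True
            i += 1
    if in_arg:
        args.append("".join(current))
    return args


example = "\"abc def'\"'ghi\\jkl'\\ m\n"  # the input line from the text above
print(split_xargs_input(example))
```

Under these assumptions the example line yields the single argument
abc def'ghi\jkl m, which is the result the report expects.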
Line 123183 "Any unquoted character" can be escaped, which must include
<newline>, as that's certainly a character, and isn't allowed to be quoted.
(Incidentally, nothing says what is to be done if a <newline> appears after
an initial opening quote before its companion closing quote - that might be
intended to be treated as an error, or the <newline> might just terminate
the quoting, or the <newline> might just be a single unquoted character
(which would be a delimiter if -0 is not used) in the middle of otherwise
quoted text, so
'abc
def'
would be two args, "abc" (and any invisible here following <blank> chars),
and those leading <blank> chars followed by "def". Unlikely, but
possible.)
That was a side issue ... for line 123183, the issue is what does an escaped
<newline> mean - at line 123178 it doesn't say that only non-escaped
<newline> chars are delimiters, all <newline> characters are. About all I
can see about escaped <newline> is that the results are unspecified if the
eof string follows one of those. I might guess that an escaped <newline> is
intended to be removed from the input (both the escape and the <newline>)
but at the minute I don't see where anything says that.
Lines 123228-32: In the first bullet point, if -s is not specified, then
if number generates a command line longer than LINE_MAX but shorter than
{ARG_MAX}-2048 then it seems to imply that fewer than number args must be
used, to keep the command line length shorter than LINE_MAX. Why? That
seems like a thinko, lines 123202-3 just say that the default command line
length is at least LINE_MAX - in most cases it will be considerably larger.
Then, same lines, the second bullet point says that fewer args (than number)
shall be used if the last iteration has fewer than number operands remaining
(that makes sense) - but not if there are zero. A strict reading of that
would result in the interpretation that the last invocation in that case
must be padded to have number args ... (no idea how to accomplish that) -
but that's clearly not what is intended. More importantly, if there are
zero args remaining, the previous iteration would have been the last, this
one would not exist. Normally - there's still the case where the last
iteration is the first iteration, and -r was not given. In that case, one
would assume that the utility should be run with no args (as it would if
there was no -n option given) but that doesn't reconcile with the
description of -n.
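
The -n behaviour argued for above can be sketched like this. It is a
simplified model, not the standard's wording: plan_invocations and its
parameters are hypothetical names, and the command line size limits (-s,
{ARG_MAX}) are deliberately ignored.

```python
def plan_invocations(operands, number, no_run_if_empty=False):
    """Return the argument list for each invocation under -n number.

    no_run_if_empty models -r. Size limits are ignored (ASSUMPTION).
    """
    if number < 1:
        raise ValueError("number must be a positive integer")
    # Full batches of `number` operands; only the last batch may be shorter.
    batches = [operands[i:i + number] for i in range(0, len(operands), number)]
    if not batches and not no_run_if_empty:
        # No operands at all and no -r: run the utility once with no args,
        # exactly as would happen without -n.
        batches = [[]]
    return batches


print(plan_invocations(["a", "b", "c", "d", "e"], 2))
print(plan_invocations([], 2))
print(plan_invocations([], 2, no_run_if_empty=True))
```

A zero-operand batch never appears after a non-empty one, so the "but not
if there are zero" wording is only ever relevant when the first iteration
is also the last.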
Lines 123233-7: There should be an XREF to XBD 3.7 attached to "an
affirmative response".
[Aside: I expect this is just because this is how existing implementations
behave, but if xargs is going to be opening /dev/tty to read the response
to the prompt, why is it not writing the trace output and prompt string
to /dev/tty instead of to stderr (which might have been redirected, in
order to redirect stderr for the utility invocations)? Very odd. Note this
would be just for -p tracing/prompt, if just -t is used, stderr is fine.]
Lines 123252-3: (-t) "Each generated command line shall be written to
standard error just prior to invocation" - Really? The "command lines"
generated are actually arg lists for exec, with the args terminated by nul
bytes, and with no delimiting text of any kind. That is to be written to
stderr? Amazing.
Surely there should be spaces inserted between the args, and as it is a
"command line" one would assume that a newline should also be appended.
But what of <blank> characters that are not separators, should those be
escaped or something, or is the user just supposed to guess? And what
about embedded <newline> characters, which are possible if -0 was used.
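
To make the ambiguity concrete, here is a small sketch of a trace line
built by joining the args with spaces and appending a newline. This is an
assumed format for illustration only, not what the standard specifies;
naive_trace_line is a hypothetical name, and the shlex.join variant is
just one possible unambiguous encoding, not something proposed in this
report.

```python
import shlex


def naive_trace_line(argv):
    """Join args with single spaces and append a newline (assumed format)."""
    return " ".join(argv) + "\n"


# Two different argument lists are indistinguishable in the naive trace.
first = ["echo", "a b", "c"]
second = ["echo", "a", "b c"]
print(naive_trace_line(first) == naive_trace_line(second))

# Shell-style quoting keeps them apart (one possible choice among many).
print(shlex.join(first))
print(shlex.join(second))
```

The same problem is worse with -0, where an arg may contain a <newline>
and the naive trace then spans more than one line.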
Line 123263: "utility" is "The name of the utility to be invoked, found
by search path using the PATH"... Really? That's all that is permitted?
No fully specified paths to the utility, no relative paths to "." (like
"bin/command" - everything must be found by a search of PATH) ??
Lines 123304-6: "If -p is specified, a prompt of the following format
shall be written (in the POSIX locale)" ... What's that supposed to
mean? A stupidly literal reading might assume that the prompt is
somehow intended to be written somewhere into the POSIX locale (whatever
that might mean) but that's clearly not right, there's enough text
elsewhere to make it clear the output goes to stderr - but what does that
parenthesised phrase mean? Does it mean this specification only applies if
the current locale is POSIX, and what happens for other locales is
unspecified? Or perhaps it means that the current locale must be switched
to POSIX to write that string (and then presumably switched back again).
In any case it is not clear.
Desired Action:
For the first, perhaps change the alternation to just two terms, the first
being something like "an unquoted and unescaped <blank>" (and then "or
<newline>" just like now).
For the quoting stuff, explicitly say that the quoting and escaping
characters do not form part of the argument string (if they're quoted or
escaped themselves they're not quoting or escaping chars, so nothing really
needs to be said about that).
No idea what is intended to happen with an escaped <newline>, but I'd guess
something like "An escape <newline> pair of characters shall be removed from
the input and not delimit an argument string", assuming that is what is to
happen.
Also say something, but here I have no idea what, about what happens if
quoting is ongoing when a <newline> is encountered.
For the paragraph at lines 123184-6, I'd write the paragraph starting at
line 123176 something more like (here I'll abbreviate, not include every
word, but I don't intend to change anything - just type less here!)
If the -0 option is not specified, the application ... are delimited by
a sequence of one or more unquoted and unescaped <blank> characters, or
<newline> characters, adjacent delimiter characters shall be treated as
a single delimiter (not produce empty arguments for the utility). Note
that if the input is not empty and does not end in a <newline> the
behaviour is undefined (because...). Quoting and escaping shall be
interpreted as follows, with any quote or escape characters removed
after they have been processed. (Then the three bullet points, with
added text to explain what is to happen if a <newline> is (seems to be)
quoted.)
For the -n option, change the LINE_MAX in the first bullet point to be
"the default line length as described above", and for the second bullet
point, just say "for the last iteration, if there are fewer than number
operands remaining" - no need to mention the zero case, if there are none
left, then if there has already been an iteration, the previous one was
the last (the one that used the final operand), there is no next one with
zero operands - if there was no previous iteration, then we just do what
we'd do with no -n (which depends upon -r).
For lines 123233-7 add an xref to XBD 3.7. (Sure would be nice to change
the output to go to /dev/tty - maybe that could at least be made an
option?)
For lines 123252-3 specify the format in which the lines are to be written,
which cannot just be xargs internal form, but I have never used -t, so I
have no idea what is actually done.
For line 123263 - I suspect that the intent is to specify the same rules
for finding the utility as execvp() (or execlp()) uses - perhaps simply say
that, and xref it. (Not the shell command search rules, they're way too
complex). That section xrefs XBD 8 (page 167). The xref to add would
be to page 867 here (in I8 D4).
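
In outline, the execvp()-style lookup suggested above behaves like the
following sketch. It is an approximation for illustration: find_utility is
a hypothetical name, and a real execvp() attempts to execute each candidate
rather than testing it first, with more detailed error handling than is
shown here.

```python
import os


def find_utility(name, path=None):
    """Return the pathname execvp()-style lookup would try, or None.

    A name containing a slash is used as given, with no PATH search.
    Otherwise each PATH directory is tried in order; an empty entry
    means the current directory.
    """
    if "/" in name:
        return name
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(":"):
        candidate = os.path.join(directory or ".", name)
        # ASSUMPTION: approximate "can be executed" with an access check.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


print(find_utility("bin/command"))  # contains a slash, so PATH is not used
print(find_utility("sh"))           # searched for in the directories of PATH
```

With these rules both a fully specified path and a relative one such as
"bin/command" are acceptable, which the current "found by search path"
wording does not appear to allow.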
I can't suggest what to do with that "(in the POSIX locale)" as I have
absolutely no idea what it is intended to mean.
======================================================================
Issue History
Date Modified Username Field Change
======================================================================
2024-02-16 14:42 kre New Issue
2024-02-16 14:42 kre Name => Robert Elz
2024-02-16 14:42 kre Section => XCU 3 / xargs
2024-02-16 14:42 kre Page Number => 3600-3603
2024-02-16 14:42 kre Line Number => 123177-8 123183
123184-6 123228-32 123233-7 123252-3 123263 123304-6
======================================================================