On 31/03/2026 20:28, Pádraig Brady wrote:
On 31/03/2026 19:28, Bernhard Voelker wrote:
A simple (wrong) command line like
yes /dev/null | wc --files0-from=-
will certainly lead to OOM, because the input contains no '\0' delimiter
and the first item therefore never ends.
Other tools like du(1) and sort(1) suffer from the same issue, and surely
more tools like 'find -files0-from=-' from findutils are in the same boat.
The digesting of input file names is done by the gnulib argv-iter module,
which in turn uses getdelim() to read and realloc the memory endlessly.
My point is: for the --files0-from option, we definitely know that a returned
filename cannot usefully be longer than PATH_MAX.
Therefore, the memory for parsing the file should never need to grow more
than that, and we could fail early with ENAMETOOLONG, and possibly skip
such an overly long entry in the file list.
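To illustrate the "fail early" idea, here is a minimal sketch of a reader that never allocates and gives up as soon as an item exceeds a given limit. The name read_item_bounded() and its return convention are made up for this example; nothing like it exists in gnulib today, and read errors are not distinguished from EOF here.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of gnulib: read one DELIM-terminated
   item from FP into BUF, which must have room for MAX_LEN bytes plus
   a trailing NUL.  Return 0 on success, EOF at end of input, or
   ENAMETOOLONG as soon as the item is known to exceed MAX_LEN.
   No memory is allocated, so the footprint stays fixed.  */
static int
read_item_bounded (FILE *fp, char *buf, size_t max_len, int delim)
{
  size_t len = 0;
  int c;

  while ((c = getc (fp)) != EOF)
    {
      if (c == delim)
        {
          buf[len] = '\0';
          return 0;
        }
      if (len == max_len)
        return ENAMETOOLONG;  /* Fail early; the rest is left unread.  */
      buf[len++] = (char) c;
    }

  if (len == 0)
    return EOF;
  buf[len] = '\0';            /* Last item without a trailing delimiter.  */
  return 0;
}
```

A caller would pass a buffer of PATH_MAX + 1 bytes and '\0' as the delimiter. After ENAMETOOLONG, the stream is positioned in the middle of the overlong item, so the caller still has to decide whether to stop or to skip ahead.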
Where in the call chain tool -> argv_iter -> getdelim could we make a better cut
to fail/skip early for bad entries longer than PATH_MAX?
The change will definitely be in gnulib, but I wanted first to discuss this
from the utilities' side.
FWIW:
I'm not sure if we have other utilities which digest '\0'-delimited input,
and where we know that a useful iteration item can never be longer than N.
If we find a nice solution, we might be able to make it more generic and
read up to such given N.
Good point.
Without looking, it sounds like argv-iter should use getndelim2 not getdelim,
as the former supports specifying a max length.
Now PATH_MAX isn't always defined, but we could use some sensible upper bound
if not.
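A fallback of that kind could look roughly like the following. The macro name FILES0_ITEM_LEN_MAX and the value 4096 are assumptions picked for illustration, not something gnulib or coreutils defines.

```c
#include <limits.h>   /* PATH_MAX, on platforms that define it */
#include <stddef.h>

/* Hypothetical name.  4096 is an assumed fallback for systems
   without PATH_MAX, not a value taken from gnulib.  */
#ifdef PATH_MAX
# define FILES0_ITEM_LEN_MAX PATH_MAX
#else
# define FILES0_ITEM_LEN_MAX 4096
#endif

/* Return the upper bound to use for a single file name item.  */
static size_t
files0_item_len_max (void)
{
  return FILES0_ITEM_LEN_MAX;
}
```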
Just had a quick look at argv-iter, and I think we'd have to extend
the init interface to support a max_item_len parameter,
as it's currently not specific to files.
I suppose you could have an argv_iter_init_files0_stream() that
calls argv_iter_init_stream() but also sets a limit to PATH_MAX.
However I now also see that we still need to discard bytes.
I.e. with getndelim2() we'd be changing an OOM to an infinite loop.
So doing this would only help the case where we have huge items
that eventually terminate. I can't think of a practical case
where that might happen though.
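For reference, the discard step amounts to something like this sketch. skip_to_delim() is a hypothetical helper, not an existing gnulib function. It shows the trade-off: memory use stays constant, but on input that never contains the delimiter (such as the output of yes(1)) the loop simply never returns.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: discard bytes from FP up to and including
   DELIM.  Return true if the delimiter was found, false on EOF or
   read error.  Uses constant memory, but does not terminate on an
   endless stream that lacks DELIM.  */
static bool
skip_to_delim (FILE *fp, int delim)
{
  int c;

  while ((c = getc (fp)) != EOF)
    if (c == delim)
      return true;
  return false;
}
```

So this only pays off when an oversized item does end at some point, after which iteration can resume with the next item.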
cheers,
Padraig