I originally sent this on nov. 22, but it seems that it did not make it
to the list (I don't see it in the archive). so here is a second try (with
minor modifications):
I stumbled over the following problem with the `read' builtin of ksh93
(Version JM 93u 2011-02-08).
short version:
==============
the `read' built-in is supposed to read a single input line (up to a
linefeed). it doesn't work correctly with very long input lines: it
returns a truncated string of length 253952 without any notification
(e.g., via return status).
long version:
=============
try this:
1. generate a file `longline.txt' containing some very long string, e.g.
with:
=====CUT=====
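#!/bin/ksh
# imax = number of characters on the line (without the trailing \n)
if (($# == 0)); then
    imax=262144 # i.e. 2^18
else
    imax=$1
fi
print "just a moment..."
# imax-1 dots followed by a single `+'
for ((i=1;i<$imax;i++)); do buf="${buf}."; done
buf="${buf}+"
print -r -- "$buf" > longline.txt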
=====CUT=====
which generates a file with a single line of 262143 repeated `.'
characters plus a single `+' (that is 2^18 chars (plus a \n)).
at this point, `wc longline.txt' yields 262145 chars (=2^18 + 1 due to
trailing \n).
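the loop above is slow because it appends one character per iteration. a
quicker way to build the same file could look like the sketch below. it
assumes a `head' that supports `-c' (common, but not guaranteed
everywhere), and `make_longline' is just a name made up for this example:

```shell
# sketch: build a one-line file of N characters (N-1 dots plus one `+')
# without a per-character loop. `make_longline' is a hypothetical helper.
make_longline() {
    n=${1:-262144}                 # characters on the line, default 2^18
    out=${2:-longline.txt}         # output file name
    {
        head -c "$((n - 1))" /dev/zero | tr '\000' '.'   # n-1 dots
        printf '+\n'                                     # final `+' and \n
    } > "$out"
}

make_longline 262144 longline.txt
wc -c < longline.txt               # byte count, including the trailing \n
```

the byte count printed at the end should match the 262145 reported by `wc'
above.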
now, in `ksh' from the command line, do
    read a < longline.txt
    echo $?     ### yields `0', i.e. "no error"
    echo ${#a}  ### yields a string length of 253952
    echo $a     ### shows that truncation is occurring
253952 thus is the length of the returned truncated string for an actual
line length > 2^18 (including trailing \n). to make it a bit more
confusing, shortening the line by a single character to a length of 262143
characters (i.e. total length 2^18 due to the trailing \n) leads to
correct reading, i.e.
    echo ${#a}  ### yields string length 262143 = 2^18 - 1
now, 262144-253952=8192, so it seems that the input buffer length is 8192
bytes and, as long as the input line length (including the trailing \n)
does not exceed 253952 plus the buffer length, the reading still works as
expected.
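the arithmetic can be checked directly in the shell. the sketch below also
shows that 253952 is an exact multiple of 8192 (31 * 8192), which would fit
the guess of an 8192-byte buffer. that is still only a guess about the
implementation, not something I have confirmed in the source:

```shell
# arithmetic behind the buffer-size guess (plain shell arithmetic)
full=262144                 # 2^18: longest total length (incl. \n) read correctly
trunc=253952                # length of the truncated string
buf=$((full - trunc))
echo "$buf"                 # 8192
echo "$((trunc / buf))"     # 31
echo "$((trunc % buf))"     # 0, i.e. trunc is a whole number of 8192-byte blocks
```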
=====
I've seen this with Version JM 93t+ 2010-06-21 under MacOS as well as
ubuntu and with Version JM 93u 2011-02-08 under MacOS (ubuntu not tested).
I can't find this behavior documented anywhere and presume it's a bug.
since there seems to be no way to tell from `read' itself whether a
truncated or a complete line has been read (and, if truncated, whether
truncation occurred in the middle of an (IFS separated) field or not), it
does not even seem possible to circumvent the problem with successive
reads from the same input file, since one does not know how to patch the
successively read-in parts together correctly.
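detecting the truncation from outside `read' at least seems feasible, by
comparing the length of what `read' returned with a length obtained from
other tools. a minimal sketch, assuming single-byte characters (so that
`wc -c' and `${#line}' count the same thing); `read_is_complete' and
`sample.txt' are names made up for this example, and I have not verified
this on every affected version:

```shell
# sketch: succeed only if `read' returned the complete first line of a file.
# `read_is_complete' is a hypothetical helper; assumes single-byte characters.
read_is_complete() {
    IFS= read -r line < "$1"        # the line as seen by `read'
    # length of the first line as seen by head/tr/wc (without the \n);
    # the arithmetic expansion strips any padding that `wc' prints
    expected=$(( $(head -n 1 "$1" | tr -d '\n' | wc -c) ))
    [ "${#line}" -eq "$expected" ]
}

printf '%s\n' 'short line' > sample.txt
if read_is_complete sample.txt; then
    echo "complete"
else
    echo "truncated"
fi
```

this only reports that something went wrong; it does not recover the
missing part of the line.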
I would appreciate any feedback on whether this behavior is known/can be
confirmed (didn't find anything on the net), whether it's really a bug (or
whether I am making some stupid error), and, ideally, how it can be fixed
upstream or locally.
thanks,
joerg
ps: in `bash' and `zsh' everything just works as expected.
_______________________________________________
ast-users mailing list
[email protected]
https://mailman.research.att.com/mailman/listinfo/ast-users