Hi Max,
max ulidtko wrote:
> $ cat < тест
> sh: cannot open те�т: No such file
With Debian dash 0.5.5.1-7.4:
$ dash -c 'cat < тест' 2>&1 |
LC_ALL=C sed -e 's/dash: cannot open \(.*\):.*/\1/' |
xxd
0000000: d182 d0b5 d1d1 820a ........
$ dash -c 'echo тест' | xxd
0000000: d182 d0b5 d181 d182 0a .........
The \x81 is being swallowed up. This is <http://bugs.debian.org/532302>,
fixed by f8231a ([EXPAND] Fix corruption of redirections with byte 0x81,
2010-05-27).
But your question is still interesting from the point of view of
investigation, so let's move on to that.
> The reason is signed overflow. The parser uses syntax tables to
> determine the class to which a given byte (assuming it's a whole
> character) belongs. The lookup is done like this:
> switch(syntax[c]) {
Given confusing code, it is often helpful to learn what the authors
were thinking when it was written:
$ git log -S'switch(syntax[c])' -- src/parser.c
commit 05c1076ba2d1a68fe7f3a5ae618f786b8898d327
Author: Herbert Xu <[email protected]>
Date: Mon Sep 26 18:32:28 2005 +1000
Initial import.
Well, so much for that: the line dates back to the initial import.
Does that mean the signed lookup has been present for five years?
Looking at 05c107:src/parser.c, one is led to wonder how c gets set
in the first place.
c = pgetc();
What did pgetc do?
int
pgetc(void)
{
return pgetc_macro();
}
And pgetc_macro?
extern char *parsenextc; /* next character in input buffer */
[...]
#define pgetc_macro() (--parsenleft >= 0? *parsenextc++ : preadbuffer())
Sounds unportable --- *parsenextc is a plain char, so its signedness
depends on the platform, and a byte such as 0x81 can come back
negative. Okay, so what does syntax[-1] give? 05c107:src/mksyntax.c
has some hints:
if (sign)
base += 1 << (nbits - 1);
So syntax starts in the _middle_ of the builtin table when char is
signed, which is what makes a negative index such as syntax[-1] valid.
That code isn't present in current src/mksyntax.c. What gives, one
might wonder?
$ git log -1 -S'base +=' -- src/mksyntax.c
commit d8014392bc291504997c65b3b44a7f21a60b0e07
Author: Herbert Xu <[email protected]>
Date: Sun Apr 23 16:01:05 2006 +1000
[PARSER] Only use signed char for syntax arrays
The existing scheme of using the native char for syntax array indicies
makes cross-compiling difficult. Therefore it makes sense to choose
one specific sign for everyone.
Since signed chars are native to most platforms and i386, it makes more
sense to use that if we are to choose one type for everyone.
Ah.
Hope that helps,
Jonathan