I'm going to consider this _without_ looking at the ksh source, because mortals will at most look at documentation (and because documentation should be accurate enough that they shouldn't _have_ to look at source).
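To be concrete about what is being tested: the bytes 0xe3 0x80 0x80 in the od dump quoted below are the UTF-8 encoding of U+3000 (IDEOGRAPHIC SPACE), one three-byte character, not the ASCII space 0x20. Here is a rough sketch for spotting it in a script, assuming a POSIX-ish sh with od, tr, grep, and mktemp available. The function name has_u3000 is one I made up for illustration, and the script is rebuilt from the quoted example rather than taken from the original attachment.

```shell
#!/bin/sh
# has_u3000 FILE: succeed if FILE contains the byte sequence e3 80 80
# (UTF-8 for U+3000). The function name is hypothetical.
has_u3000() {
    # Dump every byte as two hex digits, join od's rows into one
    # space-separated stream, then look for the three bytes in order.
    od -An -v -t x1 "$1" | tr '\n' ' ' | tr -s ' ' | grep -q 'e3 80 80'
}

# Rebuild a script like the quoted space.ksh, with U+3000 before "done".
dir=$(mktemp -d) || exit 1
printf '#!/bin/ksh\nfor i in 1 2\ndo\necho $i\n\343\200\200done\n' > "$dir/space.ksh"

if has_u3000 "$dir/space.ksh"; then
    echo "U+3000 found"
else
    echo "clean"
fi
rm -rf "$dir"
```

Matching on the hex dump rather than on the raw bytes keeps the check independent of whatever locale the shell happens to be running in.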
My very cursory reading of the man page* is a bit ambiguous whether that should work:

    A blank is a tab or a space. An identifier is a sequence of
    letters, digits, or underscores starting with a letter or
    underscore. Identifiers are used as components of variable names.
    A vname is a sequence of one or more identifiers separated by a .
    and optionally preceded by a .. Vnames are used as function and
    variable names. A word is a sequence of characters from the
    character set defined by the current locale, excluding non-quoted
    metacharacters.

"A blank is a tab or a space" is more restrictive than "A word is a sequence of characters from the character set defined by the current locale, excluding non-quoted metacharacters".

And if I try a vertical tab, formfeed, or carriage return (all plain ASCII characters classified as white space by isspace(3)) before "done", I get the same error. So it looks like the more restrictive interpretation holds: only tabs and the basic space character are acceptable in the code as white space.

Of course, anything should be ok in a quoted string (except whatever closes the quotes); or rather, anything except a null byte, which does NOT work** (ksh isn't perl - the latter goes out of its way to tolerate just about anything).

However, I wouldn't do it, even if it should work, because that makes it only work in an appropriate (UTF-8) locale; it would certainly be an error regardless in C locale.

If it were me, I would only use anything not sensible in C locale, within a quoted string constant; one does NOT want code that does nasty things depending on what locale is in use.

* ${.sh.version} on my Mac is Version AJM 93u+ 2012-08-01, which I gather is reasonably current. :-)

** the following produces an interesting error:

    0000000 # ! / b i n / k s h \n \n e c h
    0000020 o " \0 t e s t i n g " \n
    0000035
    $ ./tryme.ksh
    ./tryme.ksh: syntax error at line 3: `zero byte' unexpected

On Tue, Apr 25, 2017 at 8:42 AM, lijo george <george.l...@gmail.com> wrote:
>
> Thanks for the suggestion Philippe.
> But I'm a bit confused though, Isn't "0xe3 0x80 0x80" the UTF-8
> representation of the space character.
>
>
> Thanks,
> Lijo
>
> On Tue, Apr 25, 2017 at 5:49 PM, Philippe Bergheaud <
> philippe.berghe...@fr.ibm.com> wrote:
>
>> > The attached testscript has a leading double byte space separator
>> > before the for loop closing "done" keyword. This fails with a syntax
>> > error while parsing.
>> >
>> > Is it a bug or is it expected behaviour?
>> >
>> > I've tried it with ksh93u+ and ksh93v- versions on a Solaris setup.
>> > bash and zsh also fails, hence I'm thinking it might not be a bug,
>> > but could someone please confirm this.
>> >
>> > Here's a sample output.
>> >
>> > root@S11_3_SRU:~# echo $LANG
>> > ja_JP.UTF-8
>> > root@S11_3_SRU:~# cat space.ksh
>> > #!/bin/ksh
>> > for i in 1 2
>> > do
>> > echo $i
>> > done # leading double byte space character
>> > root@S11_3_SRU:~# od -xc space.ksh
>> > 0000000 2321 2f62 696e 2f6b 7368 0a66 6f72 2069
>> > # ! / b i n / k s h \n f o r i
>> > 0000020 2069 6e20 3120 320a 646f 0a65 6368 6f20
>> > i n 1 2 \n d o \n e c h o
>> > 0000040 2469 0ae3 8080 646f 6e65 0a00
>> > $ i \n 343 200 200 d o n e \n
>> You should remove the (invisible) character 0343 (0xe3), before the two
>> spaces.
>>
>> Philippe
>
>
>
> _______________________________________________
> ast-users mailing list
> ast-users@lists.research.att.com
> http://lists.research.att.com/mailman/listinfo/ast-users