On 8/31/06, Martin Bähr <[EMAIL PROTECTED]> wrote:
> On Thu, Aug 31, 2006 at 06:20:32PM +0200, Axel Liljencrantz wrote:
> > It is sometimes easier to remember the name of a control sequence than
> > its value.
>
> absolutely, this definitely goes a long way towards not missing verbatim
> input with ctrl-v. thank you.
>
> > * Numbered sequences, like \4, \x3f and \u2026
> > * Numbered byte sequences, like \Xfe (These differ from the above in
> > that they can be used to create bytes which do not exist in the
> > current character set, e.g. values above 127 in an ASCII locale)
>
> this has me confused now, \x3f does not go above 127?
> what is \xff and how does it differ from \Xff?

\xXX is an ASCII character, yes. If you want to go higher, use \uXX,
which allows you to use up to 16 bits. The C standard says this about
\xXX characters:

The value of an octal or  hexadecimal  escape  sequence shall  be  in
the range of representable values for the type unsigned char for an
integer  character  constant,  or  the unsigned  type corresponding to
wchar_t for a wide character constant.

So it should actually be OK to extend this up to 255 (0xFF). The problem
is that ISO C does not specify the character encoding, so this is rarely
portable. I chose ASCII as the encoding for \xXX sequences. It would
be easy to choose Unicode as the encoding and extend this to make \xXX
allow any single-byte Unicode value.

So the question that remains is: what is \XXX, then? The answer is that
\XXX is a raw byte value. If you use a multibyte character set like
UTF-8, there are a great many byte sequences that are illegal. For
example, you can't have a single 0xFF byte by itself. The sequence
\uff (or \xff, if we were to allow it) would be translated into a
two-byte sequence like (11000011 10111111). Now, imagine you have
mounted a file system that uses Latin-1 for its filenames on a system
using UTF-8. A UTF-8 aware shell will have major problems reading,
wildcarding, etc. the filenames, since they will contain lots of
illegal byte sequences. Fish gets around this by having \XXX sequences
that describe an arbitrary byte value that may be invalid in the
current character set. You can even use wildcards: a single byte of
illegal input will be interpreted as a character that can match '?'.
This difference obviously only matters in multibyte character sets. If
we were to extend \xXX, which might be a good idea, then the two
would be identical in single-byte character sets like latin-*.
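
To make the difference between a character and a raw byte concrete,
here is a small Python sketch. It is only an illustration of the
encoding facts above, not how fish implements \XXX internally. The
file name "café" is a made-up example, and Python's "surrogateescape"
error handler is used as a stand-in for the idea of turning one illegal
byte into one placeholder character.

```python
import fnmatch

# U+00FF as a character: UTF-8 needs two bytes to encode it.
encoded = "\u00ff".encode("utf-8")               # b'\xc3\xbf'
bits = " ".join(f"{b:08b}" for b in encoded)     # the two bytes in binary

# A lone 0xFF byte, on the other hand, is not valid UTF-8 at all.
try:
    b"\xff".decode("utf-8")
    lone_ff_is_valid = True
except UnicodeDecodeError:
    lone_ff_is_valid = False

# A Latin-1 file name (hypothetical) as seen on a UTF-8 system:
# the byte 0xE9 is an illegal sequence when read as UTF-8.
raw_name = "caf\u00e9".encode("latin-1")         # b'caf\xe9'

# Python's analogue of "one illegal byte becomes one character":
# surrogateescape maps the bad byte to a single placeholder code point.
name = raw_name.decode("utf-8", errors="surrogateescape")

# The placeholder counts as one character, so '?' can match it.
matches = fnmatch.fnmatchcase(name, "caf?")

# The original bytes can be recovered unchanged.
round_trip = name.encode("utf-8", errors="surrogateescape")

print(bits)
print(lone_ff_is_valid, matches, round_trip == raw_name)
```

The first part corresponds to \uff (a character, encoded as two bytes),
the second to \Xff (a single raw byte that UTF-8 cannot represent as a
character).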

>
> greetings, martin.
> --
> cooperative communication with sTeam      -     caudium, pike, roxen and unix
> offering: programming, training and administration   -  anywhere in the world
> --
> pike programmer   travelling and working in europe             open-steam.org
> unix system-      bahai.or.at                        iaeste.(tuwien.ac|or).at
> administrator     (caudium|gotpike).org                          is.schon.org
> Martin Bähr       http://www.iaeste.or.at/~mbaehr/
>

-- 
Axel
