On Dec 22, 2007 3:38 AM, Dermot <[EMAIL PROTECTED]> wrote:
> On 22/12/2007, Jay Savage <[EMAIL PROTECTED]> wrote:
> > I also think you may be confusing logical punctuation with typography
> > and character encodings. Any use of two consecutive quotation marks
> > is, by definition, a double quote. Whether that is represented by one
> > or more characters in a given encoding, and whether the visual and/or
> > programmatic representation of those marks is, e.g., ', `, <, or ‘,
> > will, of course, depend on your locale and encoding. Some encodings
> > and markups do provide shortcuts for common doublings that one should
> > be aware of. For instance, HTML provides the entities &ldquo; and
> > &rdquo;, ASCII provides character 0x22, and Perl itself has the qq//
> > operator. The existence of these typographical and programmatic
> > conventions and shortcuts, though, doesn't mean that e.g. "``" is in
> > any way less of a double quote than e.g. """. This is precisely why
> > languages like LaTeX separate out the logical quote from the
> > typographical representation.
>
> I hope I am not putting my size 9s in it here, but I want to make sure I am
> getting the point correct.
>
> Is it correct then that 2 x ' is the same as 1 x " when looked at from a
> pattern-matching point of view? Put another way, is a single ASCII octal value
> 42 the same as 2 ASCII values 47 in the context of the Perl regex engine?
> Or is it the other way round: because it's possible to encode one way or the
> other, the encoding dictates what's to be searched for?
>
> Sorry to labour the point, it just roused my curiosity.
> Dp.
No. Searching for qq/\x22word\x22/ requires a different regex from searching for qq/\x27\x27word\x27\x27/. The regex engine compares characters, so one \x22 never matches two \x27s. I was just defending my decision to refer to \x27\x27 as a "double quote."

The existence of 042 is just a holdover from the handpress era, when combining frequently used combinations of characters enabled typesetters to set texts more efficiently. Unfortunately, in most fonts there isn't any way to visually differentiate between a single character 042 and two characters 047: '' and " look exactly the same in most variable-width fonts, which is the point.

ASCII 042 is a character. So are Unicode code points 8220 and 8221 (U+201C and U+201D). "Double quote," though, isn't a character; it's a function.

HTH,

-- jay
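
To make the pattern-matching point concrete, here is a minimal Perl sketch. It is only an illustration, not code from the thread: the sample strings, the placeholder "word", and the variable names are all assumptions. It shows that a pattern written for one \x22 character does not match two \x27 characters, and that a pattern has to name both forms explicitly if it should accept either.

```perl
use strict;
use warnings;

# Hypothetical sample strings; "word" is just a placeholder.
my $one_char  = qq/\x22word\x22/;          # "word"   : one 0x22 on each side
my $two_chars = qq/\x27\x27word\x27\x27/;  # ''word'' : two 0x27 on each side

# Each pattern matches only its own character sequence.
my $re_one = qr/\x22(\w+)\x22/;
my $re_two = qr/\x27\x27(\w+)\x27\x27/;

# To accept either form, the alternation must be spelled out.
# Note: this loose version would also accept mismatched delimiters.
my $re_either = qr/(?:\x22|\x27\x27)(\w+)(?:\x22|\x27\x27)/;

for my $s ($one_char, $two_chars) {
    printf "%-10s one:%d two:%d either:%d\n",
        $s,
        ($s =~ $re_one    ? 1 : 0),
        ($s =~ $re_two    ? 1 : 0),
        ($s =~ $re_either ? 1 : 0);
}
```

The two strings may look alike on screen, but they differ in length (6 versus 8 characters), which is all the regex engine cares about.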