On 2019-09-25 at 18:05, Mark Rotteveel wrote:
> On 25-9-2019 10:11, Kjell Rilbe wrote:
>> The suggested <binary string literal> could be used to write characters
>> using the literal's encoding directly. E.g. for a UTF-8 literal, the
>> character Ö could be written as '\xC3\xB6', and in a WIN1252 literal it
>> could be written as '\xD6'.
>>
>> Since these kinds of escapes would be a breaking change to how string
>> literals are parsed, a solution would have to be found to determine if a
>> specific string literal is to be parsed with these kinds of escapes or
>> not. A prefix that could be combined with any character set prefix?
>
> The SQL standard option would be for Firebird to introduce support for
> the Unicode character string literal U&'...' UESCAPE '<char>' (where
> the default Unicode escape is \ if the UESCAPE clause is absent).
>
> With this syntax, your example would be U&'\00D6' (the escape takes the
> Unicode code point U+00D6, not the UTF-8 bytes C3 B6).
>
> From 5.3 <literal>:
>
> <Unicode character string literal> ::=
>     [ <introducer> <character set specification> ]
>     U <ampersand> <quote> [ <Unicode representation>... ] <quote>
>     [ { <separator> <quote> [ <Unicode representation>... ] <quote> }... ]
>     <Unicode escape specifier>
>
> <Unicode representation> ::=
>     <character representation>
>   | <Unicode escape value>
>
> Relevant syntax rules:
>
> 2) In a <Unicode character string literal>, the sequence:
>      <quote> <Unicode representation>... <quote> <separator> <quote>
>      <Unicode representation>... <quote>
>    is equivalent to the sequence:
>      <quote> <Unicode representation>... <Unicode representation>... <quote>
> 3) In a <Unicode character string literal>, the introductory 'U' may
>    be represented either in upper case (as 'U') or in lower case (as 'u').
> [..]
> 7) In a <character string literal>, <national character string
>    literal>, <Unicode character string literal>, or <binary string
>    literal>, a <separator> shall contain a <newline>.
> [..]
> 10) In a <Unicode character string literal> that specifies
>     “<introducer><character set specification>”, there shall be no
>     <separator> between the <introducer> and the <character set
>     specification>.
> 11) In a <Unicode character string literal>, there shall be no
>     <separator> between the “U” and the <ampersand> nor between the
>     <ampersand> and the <quote>.
> 12) The character set of a <Unicode character string literal> that
>     specifies “<introducer><character set specification>” is the character
>     set specified by the <character set specification>. The character set
>     of a <Unicode character string literal> that does not specify
>     “<introducer><character set specification>” is the character set of
>     the SQL-client module that contains the <Unicode character string
>     literal>.
> 13) A <Unicode character string literal> is equivalent to a <character
>     string literal> in which every <Unicode escape value> has been
>     replaced with the equivalent Unicode character. The set of characters
>     contained in the <Unicode character string literal> shall be wholly
>     contained in the character set of the <Unicode character string
>     literal>.
>     NOTE 139: The requirement for “wholly contained” applies after the
>     replacement of <Unicode escape value>s with equivalent Unicode
>     characters.
>
> (rules 14-19 are also relevant, but apply to all string literals, left
> out for brevity)
>
> From 5.2 <token> and <separator>:
>
> <Unicode escape specifier> ::=
>     [ UESCAPE <quote> <Unicode escape character> <quote> ]
>
> <Unicode escape value> ::=
>     <Unicode 4 digit escape value>
>   | <Unicode 6 digit escape value>
>   | <Unicode character escape value>
>
> <Unicode 4 digit escape value> ::=
>     <Unicode escape character> <hexit> <hexit> <hexit> <hexit>
>
> <Unicode 6 digit escape value> ::=
>     <Unicode escape character> <plus sign>
>     <hexit> <hexit> <hexit> <hexit> <hexit> <hexit>
>
> <Unicode character escape value> ::=
>     <Unicode escape character> <Unicode escape character>
>
> <Unicode escape character> ::=
>
> Relevant syntax rules:
>
> 16) <Unicode escape character> shall be a single character from the
>     source language character set other than a <hexit>, <plus sign>,
>     <quote>, <double quote>, or <white space>.
> 17) If the source language character set contains <reverse solidus>,
>     then let DEC be <reverse solidus>; otherwise, let DEC be an
>     implementation-defined character from the source language character
>     set that is not a <hexit>, <plus sign>, <quote>, <double quote>, or
>     <white space>.
> 18) If a <Unicode escape specifier> does not contain <Unicode escape
>     character>, then “UESCAPE <quote>DEC<quote>” is implicit.
> 19) In a <Unicode escape value> there shall be no <separator> between
>     the <Unicode escape character> and the first <hexit>, nor between any
>     of the <hexit>s.
> [..]
> 21) <Unicode 4 digit escape value> '<Unicode escape character>xyzw' is
>     equivalent to the character at the Unicode code point specified by
>     U+xyzw.
> 22) <Unicode 6 digit escape value> '<Unicode escape character>+xyzwrs'
>     is equivalent to the character at the Unicode code point specified by
>     U+xyzwrs.
>     NOTE 132: The 6-hexit notation is derived by taking the UCS-4
>     notation defined in [ISO10646] and removing the leading two hexits,
>     whose values are always 0 (zero).
> 23) <Unicode character escape value> is equivalent to a single
>     instance of <Unicode escape character>.

Nice! Thanks! This has got my vote.

Regards,
Kjell
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel