On 14-08-2020 13:09, Dimitry Sibiryakov wrote:
Hello All.
What does the SQL standard say about charset introducers (such as
"_utf8 'abc'")?
As far as I know, it can be interpreted in two ways:
1) The following byte sequence is in the given charset.
2) The following character sequence must be used as a string in the
given charset.
The first option results in a heavy limitation on query text
transliteration and may leave only a single usable case: when the
literal is a binary string in hexadecimal form.
The second option makes the introducer a shortcut for CAST('abc' AS CHAR
CHARACTER SET utf8) but allows the query text to be freely transliterated
between application and engine.
I ask because the Unicode version of the ODBC interface needs the
query text in fixed UTF-16, and that results in a transliteration problem.
The SQL standard is not very clear about it, to be honest, but as I read
it, the introducer is NOT intended as a form of cast (at least, not with
the 'normal' string literal).
Specifically, it says:
"""
16) Case:
a) If a <character set specification> is not specified in a <character
string literal>, then the set of characters contained in the <character
string literal> shall be wholly contained in the character set of the
<SQL-client module definition> that contains the <character string literal>.
b) Otherwise, there shall be no <separator> between the <introducer> and
the <character set specification>, and the set of characters contained
in the <character string literal> shall be wholly contained in the
character set specified by the <character set specification>.
[..]
18) The character set of a <character string literal> is
Case:
a) If the <character string literal> specifies a <character set
specification>, then the character set specified by that <character set
specification>.
b) Otherwise, the character set of the SQL-client module that contains
the <character string literal>.
"""
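To make rule 16 a bit more concrete, here is a minimal Python sketch of
the "wholly contained" check. It is only an illustration: the function
name wholly_contained is made up for this example, and Python codec
names ("ascii", "iso-8859-1") stand in for SQL character sets. It is not
how Firebird or any other engine implements the rule.

```python
def wholly_contained(literal: str, charset: str) -> bool:
    """Return True if every character of `literal` exists in `charset`.

    `charset` is a Python codec name, used here as a stand-in for an
    SQL character set (an assumption made for illustration only).
    """
    try:
        literal.encode(charset)
    except UnicodeEncodeError:
        # At least one character has no representation in this charset.
        return False
    return True


# "caf\u00e9" is "cafe" with an acute accent on the final letter.
print(wholly_contained("abc", "ascii"))             # True
print(wholly_contained("caf\u00e9", "ascii"))       # False
print(wholly_contained("caf\u00e9", "iso-8859-1"))  # True
```

Under this reading, the rule only constrains which characters may appear
in the literal; it does not by itself describe a conversion step.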
As I read it, the current behaviour of Firebird is correct; it is just
damn awkward to achieve.
For the behaviour you want, Firebird would need to implement Unicode
literals, because in Unicode literals the introducer serves as a form
of cast.
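
The practical difference between the two readings from the question can
be sketched in Python. This is an illustration under assumptions, not
Firebird code: the helper names read_as_bytes and read_as_characters are
hypothetical, Python codecs stand in for SQL character sets, and the
scenario assumes a driver that has transliterated the query text to
UTF-8 before the engine sees it.

```python
def read_as_bytes(literal_bytes: bytes, introducer_charset: str) -> str:
    """Reading 1: the bytes in the query text are already encoded in
    the introducer's charset, so they are interpreted as-is."""
    return literal_bytes.decode(introducer_charset)


def read_as_characters(literal_bytes: bytes, query_charset: str,
                       introducer_charset: str) -> bytes:
    """Reading 2: the literal is a character sequence in the charset of
    the query text, which is then converted to the introducer's charset
    (the cast-like behaviour)."""
    return literal_bytes.decode(query_charset).encode(introducer_charset)


# The application wrote a literal containing U+00E9 (e with acute) and
# tagged it with a windows-1252 introducer; the driver then
# transliterated the whole query text to UTF-8, so the literal arrives
# as two bytes.
transliterated = "\u00e9".encode("utf-8")  # b'\xc3\xa9'

# Reading 1 treats those two bytes as two windows-1252 characters.
print(ascii(read_as_bytes(transliterated, "cp1252")))          # '\xc3\xa9'

# Reading 2 recovers the single character and re-encodes it.
print(read_as_characters(transliterated, "utf-8", "cp1252"))   # b'\xe9'
```

With reading 1 the literal only survives if nothing between the
application and the engine re-encodes the query text, which is the
limitation described in the question.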
Mark
--
Mark Rotteveel
Firebird-Devel mailing list, web interface at
https://lists.sourceforge.net/lists/listinfo/firebird-devel