About string comparison standard says:

"When values of unequal length are compared, if the collation for the
comparison has the NO PAD characteristic and the shorter value is equal
to some prefix of the longer value, then the shorter value is considered
less than the longer value. If the collation for the comparison has the
PAD SPACE characteristic, for the purposes of the comparison, the
shorter value is effectively extended to the length of the longer by
concatenation of <space>s on the right."

Now lets see some Firebird examples:


-- Case 1
SQL> select _ascii 'a' < cast('a' || ascii_char(5) as varchar(2)
character set ascii) from rdb$database;

=======
<false>

Case 1 demonstrates that ASCII has PAD SPACE characteristic, otherwise
it should return true.


-- Case 2
SQL> select _ascii 'a ' < cast('a' || ascii_char(5) as varchar(2)
character set ascii) from rdb$database;

=======
<false>

Case 2 is consistent with Case 1. ASCII_CHAR(32) is > ASCII_CHAR(5).


-- Case 3
SQL> select _iso8859_1 'a b' < cast('a' || ascii_char(5) || 'b' as
varchar(3) character set iso8859_1) from rdb$database;

=======
<false>

Case 3 shows that ASCII_CHAR(32) is > ASCII_CHAR(5) in ISO8859_1.


-- Case 4
SQL> select _iso8859_1 'a' < cast('a' || ascii_char(5) as varchar(2)
character set iso8859_1) from rdb$database;

=======
<true>

Case 4 demonstrates that ISO8859_1 has NO PAD characteristic, otherwise
it would return false as Case 1 and would need to be consistent with Case 3.

But note that SHOW COLLATION show it is PAD SPACE.


-- Case 5
SQL> select _iso8859_1 'a ' < cast('a' || ascii_char(5) as varchar(2)
character set iso8859_1) from rdb$database;

=======
<true>

Case 5 is inconsistent with Case 3. Here it appears that ASCII_CHAR(32)
is < ASCII_CHAR(5) in ISO8859_1.

But in fact what it's doing is trimming trailing spaces of the strings
and comparing. This is not what the standard says to do in any case. Am
I wrong?

I was looking in https://github.com/FirebirdSQL/firebird/issues/6839 and
even produced a patch for it, but for me these things must be first
sorted out.

I don't think the intention is that ISO8859_1 be NO PAD. It looks like
problematic and non standard trimming algorithm causing problems.

The example has been with ISO8859_1, but the problem should happen with
most collations. It appears that all "narrow" collations and non-system
(ASCII, UTF8, UNICODE_FSS, NONE) charsets are subject to the problem.

The bug may be a workaround for the problem of #6839 as that bug happens
only with charsets/collations not subject to the problem I'm mentioning
here.


Adriano


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel

Reply via email to