Wrong result with index on collation using NUMERIC-SORT
-------------------------------------------------------
Key: CORE-5638
URL: http://tracker.firebirdsql.org/browse/CORE-5638
Project: Firebird Core
Issue Type: Bug
Components: Charsets/Collation
Affects Versions: 4.0 Alpha 1, 3.0.2, 3.0.3
Environment: Windows 7 64
Reporter: Magnus Johansson
Not all records are retrieved when index on a column with collation created as
UTF8 with option NUMERIC-SORT is used.
Tested on Windows 7 64 with the 64-bit versions of WI-V3.0.2.32703,
WI-V3.0.3.32805 and WI-T4.0.0.760
I have downloaded icu4c-52_1-Win64-msvc10.zip from
http://site.icu-project.org/download and from which I replaced the original
icu*-files.
Using ISQL I have run this:
======================================
SET NAMES UTF8;
SET SQL DIALECT 3;
CREATE DATABASE 'C:\Data\Test\Collation_UTF8.fdb'
USER 'SYSDBA' PASSWORD 'masterkey'
PAGE_SIZE 16384
DEFAULT CHARACTER SET UTF8;
CREATE COLLATION UTF_SV_CI
FOR UTF8
FROM UNICODE
CASE INSENSITIVE
'LOCALE=sv_SE';
CREATE COLLATION UTF_SV_CI_NS
FOR UTF8
FROM UNICODE
CASE INSENSITIVE
'LOCALE=sv_SE;NUMERIC-SORT=1';
COMMIT;
CREATE TABLE TABLE_T (
FIELD_CI VARCHAR(5) CHARACTER SET UTF8 COLLATE UTF_SV_CI,
FIELD_CI_NS VARCHAR(5) CHARACTER SET UTF8 COLLATE UTF_SV_CI_NS);
COMMIT;
INSERT INTO TABLE_T VALUES ('1', '1' );
INSERT INTO TABLE_T VALUES ('2', '2' );
INSERT INTO TABLE_T VALUES ('10', '10');
INSERT INTO TABLE_T VALUES ('20', '20');
INSERT INTO TABLE_T VALUES ('a', 'a' );
INSERT INTO TABLE_T VALUES ('A', 'A' );
INSERT INTO TABLE_T VALUES ('aa', 'aa');
INSERT INTO TABLE_T VALUES ('Aa', 'Aa');
INSERT INTO TABLE_T VALUES ('AA', 'AA');
INSERT INTO TABLE_T VALUES ('b', 'b' );
INSERT INTO TABLE_T VALUES ('B', 'B' );
INSERT INTO TABLE_T VALUES ('o', 'o' );
INSERT INTO TABLE_T VALUES ('O', 'O' );
INSERT INTO TABLE_T VALUES ('z', 'z' );
INSERT INTO TABLE_T VALUES ('Z', 'Z' );
INSERT INTO TABLE_T VALUES ('å', 'å' );
INSERT INTO TABLE_T VALUES ('Å', 'Å' );
INSERT INTO TABLE_T VALUES ('ä', 'ä' );
INSERT INTO TABLE_T VALUES ('Ä', 'Ä' );
INSERT INTO TABLE_T VALUES ('ö', 'ö' );
INSERT INTO TABLE_T VALUES ('Ö', 'Ö' );
COMMIT;
CREATE INDEX IDX_CI ON TABLE_T (FIELD_CI);
CREATE INDEX IDX_CI_NS ON TABLE_T (FIELD_CI_NS);
COMMIT;
======================================
Still in ISQL I did some queries against field FIELD_CI_NS like below.
The expected results on those queries can be retrieved by replacing filed
FIELD_CI_NS with FIELD_CI.
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS BETWEEN '' AND 'b'
ORDER BY 1 ASC;
FIELD_CI_NS
===============
1
2
10
20
a
A
aa
Aa
AA
b
===============
'B' is missing
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS = 'a';
FIELD_CI_NS
===============
a
===============
'A' is missing
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS = 'A';
FIELD_CI_NS
===============
A
===============
'a' is missing
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS IN ('a', 'o', 'ä', 'ö');
FIELD_CI_NS
===============
a
o
ä
ö
===============
'A', 'O', 'Ä', 'Ö' is missing
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS >= 'a' AND FIELD_CI_NS <= 'a';
FIELD_CI_NS
===============
a
===============
'A' is missing
Note that I have used locale for Swedish (sv_SE) but this not only an issue for
Swedish.
I have also tested with da_DK, en_US, nn_NO which all seems affected.
For Danish and Norwegian, records containing 'aa', 'Aa', and 'AA' is missing
for both FIELD_CI and FIELD_CI_NS, with or without any index being used for the
queries below.
For instance, compare these with the Swedish counterpart above:
Norwegian (nn_NO):
===============
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS BETWEEN '' AND 'b'
ORDER BY 1 ASC;
FIELD_CI_NS
===============
1
2
10
20
a
A
b
===============
'aa', 'Aa', 'AA', 'B' is missing
Danish (da_DK):
===============
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS BETWEEN '' AND 'b'
ORDER BY 1 ASC;
FIELD_CI_NS
===============
1
2
10
20
A
a
B
b
===============
'aa', 'Aa', 'AA' is missing but not 'B'
Notice also the order of AaBb here, although I'm not a Danish I think they
would like it aAbB.
And for completeness, here is US English (en_US):
===============
SELECT FIELD_CI_NS
FROM TABLE_T
WHERE FIELD_CI_NS BETWEEN '' AND 'b'
ORDER BY 1 ASC;
FIELD_CI_NS
===============
1
2
10
20
a
A
å
Å
ä
Ä
aa
Aa
AA
b
===============
'B' is missing
Perhaps to much descriptive text but this is my first issue and I wanted to
explain thoroughly, I would be grateful to be enlightened if there is something
I should have done in other way.
Regards,
Magnus Johansson
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://tracker.firebirdsql.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at
https://lists.sourceforge.net/lists/listinfo/firebird-devel