Wrong result with index on collation using NUMERIC-SORT -------------------------------------------------------
Key: CORE-5638 URL: http://tracker.firebirdsql.org/browse/CORE-5638 Project: Firebird Core Issue Type: Bug Components: Charsets/Collation Affects Versions: 4.0 Alpha 1, 3.0.2, 3.0.3 Environment: Windows 7 64 Reporter: Magnus Johansson Not all records are retrieved when index on a column with collation created as UTF8 with option NUMERIC-SORT is used. Tested on Windows 7 64 with the 64-bit versions of WI-V3.0.2.32703, WI-V3.0.3.32805 and WI-T4.0.0.760 I have downloaded icu4c-52_1-Win64-msvc10.zip from http://site.icu-project.org/download and from which I replaced the original icu*-files. Using ISQL I have run this: ====================================== SET NAMES UTF8; SET SQL DIALECT 3; CREATE DATABASE 'C:\Data\Test\Collation_UTF8.fdb' USER 'SYSDBA' PASSWORD 'masterkey' PAGE_SIZE 16384 DEFAULT CHARACTER SET UTF8; CREATE COLLATION UTF_SV_CI FOR UTF8 FROM UNICODE CASE INSENSITIVE 'LOCALE=sv_SE'; CREATE COLLATION UTF_SV_CI_NS FOR UTF8 FROM UNICODE CASE INSENSITIVE 'LOCALE=sv_SE;NUMERIC-SORT=1'; COMMIT; CREATE TABLE TABLE_T ( FIELD_CI VARCHAR(5) CHARACTER SET UTF8 COLLATE UTF_SV_CI, FIELD_CI_NS VARCHAR(5) CHARACTER SET UTF8 COLLATE UTF_SV_CI_NS); COMMIT; INSERT INTO TABLE_T VALUES ('1', '1' ); INSERT INTO TABLE_T VALUES ('2', '2' ); INSERT INTO TABLE_T VALUES ('10', '10'); INSERT INTO TABLE_T VALUES ('20', '20'); INSERT INTO TABLE_T VALUES ('a', 'a' ); INSERT INTO TABLE_T VALUES ('A', 'A' ); INSERT INTO TABLE_T VALUES ('aa', 'aa'); INSERT INTO TABLE_T VALUES ('Aa', 'Aa'); INSERT INTO TABLE_T VALUES ('AA', 'AA'); INSERT INTO TABLE_T VALUES ('b', 'b' ); INSERT INTO TABLE_T VALUES ('B', 'B' ); INSERT INTO TABLE_T VALUES ('o', 'o' ); INSERT INTO TABLE_T VALUES ('O', 'O' ); INSERT INTO TABLE_T VALUES ('z', 'z' ); INSERT INTO TABLE_T VALUES ('Z', 'Z' ); INSERT INTO TABLE_T VALUES ('å', 'å' ); INSERT INTO TABLE_T VALUES ('Å', 'Å' ); INSERT INTO TABLE_T VALUES ('ä', 'ä' ); INSERT INTO TABLE_T VALUES ('Ä', 'Ä' ); INSERT INTO TABLE_T VALUES ('ö', 'ö' ); INSERT INTO TABLE_T VALUES ('Ö', 'Ö' ); COMMIT; CREATE INDEX IDX_CI ON TABLE_T (FIELD_CI); CREATE INDEX IDX_CI_NS ON TABLE_T (FIELD_CI_NS); COMMIT; ====================================== Still in ISQL I did some queries against field FIELD_CI_NS like below. The expected results on those queries can be retrieved by replacing filed FIELD_CI_NS with FIELD_CI. SELECT FIELD_CI_NS FROM TABLE_T WHERE FIELD_CI_NS BETWEEN '' AND 'b' ORDER BY 1 ASC; FIELD_CI_NS =============== 1 2 10 20 a A aa Aa AA b =============== 'B' is missing SELECT FIELD_CI_NS FROM TABLE_T WHERE FIELD_CI_NS = 'a'; FIELD_CI_NS =============== a =============== 'A' is missing SELECT FIELD_CI_NS FROM TABLE_T WHERE FIELD_CI_NS = 'A'; FIELD_CI_NS =============== A =============== 'a' is missing SELECT FIELD_CI_NS FROM TABLE_T WHERE FIELD_CI_NS IN ('a', 'o', 'ä', 'ö'); FIELD_CI_NS =============== a o ä ö =============== 'A', 'O', 'Ä', 'Ö' is missing SELECT FIELD_CI_NS FROM TABLE_T WHERE FIELD_CI_NS >= 'a' AND FIELD_CI_NS <= 'a'; FIELD_CI_NS =============== a =============== 'A' is missing Note that I have used locale for Swedish (sv_SE) but this not only an issue for Swedish. I have also tested with da_DK, en_US, nn_NO which all seems affected. For Danish and Norwegian, records containing 'aa', 'Aa', and 'AA' is missing for both FIELD_CI and FIELD_CI_NS, with or without any index being used for the queries below. For instance, compare these with the Swedish counterpart above: Norwegian (nn_NO): =============== SELECT FIELD_CI_NS FROM TABLE_T WHERE FIELD_CI_NS BETWEEN '' AND 'b' ORDER BY 1 ASC; FIELD_CI_NS =============== 1 2 10 20 a A b =============== 'aa', 'Aa', 'AA', 'B' is missing Danish (da_DK): =============== SELECT FIELD_CI_NS FROM TABLE_T WHERE FIELD_CI_NS BETWEEN '' AND 'b' ORDER BY 1 ASC; FIELD_CI_NS =============== 1 2 10 20 A a B b =============== 'aa', 'Aa', 'AA' is missing but not 'B' Notice also the order of AaBb here, although I'm not a Danish I think they would like it aAbB. And for completeness, here is US English (en_US): =============== SELECT FIELD_CI_NS FROM TABLE_T WHERE FIELD_CI_NS BETWEEN '' AND 'b' ORDER BY 1 ASC; FIELD_CI_NS =============== 1 2 10 20 a A å Å ä Ä aa Aa AA b =============== 'B' is missing Perhaps to much descriptive text but this is my first issue and I wanted to explain thoroughly, I would be grateful to be enlightened if there is something I should have done in other way. Regards, Magnus Johansson -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://tracker.firebirdsql.org/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel