Wrong result with index on collation using NUMERIC-SORT
-------------------------------------------------------

                 Key: CORE-5638
                 URL: http://tracker.firebirdsql.org/browse/CORE-5638
             Project: Firebird Core
          Issue Type: Bug
          Components: Charsets/Collation
    Affects Versions: 4.0 Alpha 1, 3.0.2, 3.0.3
         Environment: Windows 7 64
            Reporter: Magnus Johansson


Not all records are retrieved when index on a column with collation created as 
UTF8 with option NUMERIC-SORT is used.

Tested on Windows 7 64 with the 64-bit versions of WI-V3.0.2.32703, 
WI-V3.0.3.32805 and WI-T4.0.0.760
I have downloaded icu4c-52_1-Win64-msvc10.zip from 
http://site.icu-project.org/download and from which I replaced the original 
icu*-files.

Using ISQL I have run this:
======================================
SET NAMES UTF8;
SET SQL DIALECT 3;

CREATE DATABASE 'C:\Data\Test\Collation_UTF8.fdb'
USER 'SYSDBA' PASSWORD 'masterkey'
PAGE_SIZE 16384
DEFAULT CHARACTER SET UTF8;

CREATE COLLATION UTF_SV_CI
      FOR UTF8
      FROM UNICODE
      CASE INSENSITIVE
     'LOCALE=sv_SE';

CREATE COLLATION UTF_SV_CI_NS
      FOR UTF8
      FROM UNICODE
      CASE INSENSITIVE
     'LOCALE=sv_SE;NUMERIC-SORT=1';

COMMIT;

CREATE TABLE TABLE_T (
    FIELD_CI    VARCHAR(5) CHARACTER SET UTF8 COLLATE UTF_SV_CI,
    FIELD_CI_NS VARCHAR(5) CHARACTER SET UTF8 COLLATE UTF_SV_CI_NS);

COMMIT;

INSERT INTO TABLE_T VALUES ('1',  '1' );
INSERT INTO TABLE_T VALUES ('2',  '2' );
INSERT INTO TABLE_T VALUES ('10', '10');
INSERT INTO TABLE_T VALUES ('20', '20');
INSERT INTO TABLE_T VALUES ('a',  'a' );
INSERT INTO TABLE_T VALUES ('A',  'A' );
INSERT INTO TABLE_T VALUES ('aa', 'aa');
INSERT INTO TABLE_T VALUES ('Aa', 'Aa');
INSERT INTO TABLE_T VALUES ('AA', 'AA');
INSERT INTO TABLE_T VALUES ('b',  'b' );
INSERT INTO TABLE_T VALUES ('B',  'B' );
INSERT INTO TABLE_T VALUES ('o',  'o' );
INSERT INTO TABLE_T VALUES ('O',  'O' );
INSERT INTO TABLE_T VALUES ('z',  'z' );
INSERT INTO TABLE_T VALUES ('Z',  'Z' );
INSERT INTO TABLE_T VALUES ('å',  'å' );
INSERT INTO TABLE_T VALUES ('Å',  'Å' );
INSERT INTO TABLE_T VALUES ('ä',  'ä' );
INSERT INTO TABLE_T VALUES ('Ä',  'Ä' );
INSERT INTO TABLE_T VALUES ('ö',  'ö' );
INSERT INTO TABLE_T VALUES ('Ö',  'Ö' );

COMMIT;

CREATE INDEX IDX_CI    ON TABLE_T (FIELD_CI);
CREATE INDEX IDX_CI_NS ON TABLE_T (FIELD_CI_NS);

COMMIT;
======================================

Still in ISQL I did some queries against field FIELD_CI_NS like below.
The expected results on those queries can be retrieved by replacing filed 
FIELD_CI_NS with FIELD_CI.

SELECT FIELD_CI_NS
FROM   TABLE_T
WHERE  FIELD_CI_NS BETWEEN '' AND 'b'
ORDER BY 1 ASC;

FIELD_CI_NS
===============
1
2
10
20
a
A
aa
Aa
AA
b
===============
'B' is missing


SELECT FIELD_CI_NS
FROM   TABLE_T
WHERE  FIELD_CI_NS = 'a';

FIELD_CI_NS
===============
a
===============
'A' is missing


SELECT FIELD_CI_NS
FROM   TABLE_T
WHERE  FIELD_CI_NS = 'A';

FIELD_CI_NS
===============
A
===============
'a' is missing


SELECT FIELD_CI_NS
FROM   TABLE_T
WHERE  FIELD_CI_NS IN ('a', 'o', 'ä', 'ö');

FIELD_CI_NS
===============
a
o
ä
ö
===============
'A', 'O', 'Ä', 'Ö' is missing


SELECT FIELD_CI_NS
FROM   TABLE_T
WHERE  FIELD_CI_NS >= 'a' AND FIELD_CI_NS <= 'a';

FIELD_CI_NS
===============
a
===============
'A' is missing


Note that I have used locale for Swedish (sv_SE) but this not only an issue for 
Swedish.
I have also tested with da_DK, en_US, nn_NO which all seems affected.

For Danish and Norwegian, records containing 'aa', 'Aa', and 'AA' is missing 
for both FIELD_CI and FIELD_CI_NS, with or without any index being used for the 
queries below.

For instance, compare these with the Swedish counterpart above:

Norwegian (nn_NO):
===============
SELECT FIELD_CI_NS
FROM   TABLE_T
WHERE  FIELD_CI_NS BETWEEN '' AND 'b'
ORDER BY 1 ASC;

FIELD_CI_NS
===============
1
2
10
20
a
A
b
===============
'aa', 'Aa', 'AA', 'B' is missing


Danish (da_DK):
===============
SELECT FIELD_CI_NS
FROM   TABLE_T
WHERE  FIELD_CI_NS BETWEEN '' AND 'b'
ORDER BY 1 ASC;

FIELD_CI_NS
===============
1
2
10
20
A
a
B
b
===============
'aa', 'Aa', 'AA' is missing but not 'B'
Notice also the order of AaBb here, although I'm not a Danish I think they 
would like it aAbB.


And for completeness, here is US English (en_US):
===============
SELECT FIELD_CI_NS
FROM   TABLE_T
WHERE  FIELD_CI_NS BETWEEN '' AND 'b'
ORDER BY 1 ASC;

FIELD_CI_NS
===============
1
2
10
20
a
A
å
Å
ä
Ä
aa
Aa
AA
b
===============
'B' is missing


Perhaps to much descriptive text but this is my first issue and I wanted to 
explain thoroughly, I would be grateful to be enlightened if there is something 
I should have done in other way.

Regards,
Magnus Johansson

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://tracker.firebirdsql.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel

Reply via email to