Accent-insensitive comparison: diacritical letters with a DIAGONAL crossing stroke pass only the EQUALITY test against their non-accented forms
---------------------------------------------------------------------------------------------------------------------------------------

                 Key: CORE-4739
                 URL: http://tracker.firebirdsql.org/browse/CORE-4739
             Project: Firebird Core
          Issue Type: Bug
          Components: Charsets/Collation
            Reporter: Pavel Zotov
            Priority: Minor
         Attachments: diacritical-comparison-of-letters-with-diagonal-stokes.png.zip

The following letters:

Ø = U+00D8 // LATIN CAPITAL LETTER O WITH STROKE, used in the Danish and Norwegian alphabets;
Ð = U+00D0 // LATIN CAPITAL LETTER ETH, Icelandic;
Ŀ = U+013F // LATIN CAPITAL LETTER L WITH MIDDLE DOT, Catalan (Valencian);
Ł = U+0141 // LATIN CAPITAL LETTER L WITH STROKE, Polish.

These letters match their non-accented forms (O, D, L and L respectively) only
with '=' or 'IS NOT DISTINCT FROM', which return TRUE.
All other kinds of comparison (STARTING WITH, LIKE, SIMILAR TO, and checking
whether POSITION() returns a value > 0) fail.
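
For a quick check on a single letter, the comparison can be cut down to one
row. This is a sketch, not part of the original report: it assumes the
connection character set is UTF8 and that the co_utf8_ci_ai collation used by
the test query already exists. If the behaviour described here reproduces,
only EQUAL_TEST and NOT_DISTINCT_TEST should return 1.

    -- Single-letter repro (Ø vs O); assumes a UTF8 connection charset
    -- and an existing co_utf8_ci_ai collation
    with d as (
        select
             cast('Ø' as varchar(1) character set utf8) c
            ,cast('O' as varchar(1) character set utf8) t
        from rdb$database
    )
    select
         iif( d.c collate co_utf8_ci_ai = d.t, 1, 0 ) equal_test
        ,iif( d.c collate co_utf8_ci_ai is not distinct from d.t, 1, 0 ) not_distinct_test
        ,iif( d.c collate co_utf8_ci_ai starting with d.t, 1, 0 ) start_with_test
        ,iif( d.c collate co_utf8_ci_ai like d.t, 1, 0 ) like_test
        ,iif( d.c collate co_utf8_ci_ai similar to d.t, 1, 0 ) similar_to_test
        ,iif( position(d.t, d.c collate co_utf8_ci_ai) > 0, 1, 0 ) pos_test
    from d;

Replacing 'Ø' and 'O' with another pair from the list above (for example 'Ł'
and 'L') checks the remaining letters the same way.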

Test query:
========

    with recursive
    -- d: the accented letters (s) and their expected base letters (t),
    --    aligned character by character
    d as (
        select
             cast( 'ØÐ' || 'Ł' || 'Ŀ' || 'ĘĄĂÂÎŢŐŰĖÅĽĢÁÉÍÓÚÝÀÈÌÒÙÂÊÎÔÛÃÑÕÄËÏÖÜŸÇŠĄĘŹŻĂŞŢ' as varchar(80) character set utf8) s
            ,cast( 'OD' || 'L' || 'L' || 'EAAAITOUEALGAEIOUYAEIOUAEIOUANOAEIOUYCSAEZZAST' as varchar(80) character set utf8) t
        from rdb$database
    )
    -- r: row generator producing the numbers 1..100
    ,r as(select 1 i from rdb$database union all select r.i+1 from r where r.i < 100)
    -- e: one row per position, pairing the accented letter c with its base letter t
    ,e as(
        select
             substring(d.s from r.i for 1) c
            ,substring(d.t from r.i for 1) t
        from d join r on r.i <= char_length(d.s)
    )
    -- f: run each comparison under the accent-insensitive collation; 1 = match
    ,f as (
        select
             e.c as utf_char
            ,e.t as latin_char
            ,iif( e.c collate co_utf8_ci_ai = e.t, 1, 0 ) equal_test
            ,iif( position(e.t, e.c collate co_utf8_ci_ai) >0 , 1, 0 ) pos_test
            ,iif( e.c collate co_utf8_ci_ai starting with e.t, 1, 0 ) start_with_test
            ,iif( e.c collate co_utf8_ci_ai like e.t, 1, 0 ) like_test
            ,iif( e.c collate co_utf8_ci_ai similar to e.t, 1, 0 ) similar_to_letter_test
            ,iif( e.c collate co_utf8_ci_ai similar to '[[:ALPHA:]]', 1, 0 ) similar_to_alpha_test
        from e
    )
    select *
    from f
    -- letters that pass the fewest tests come first
    order by equal_test + pos_test + start_with_test + like_test + similar_to_letter_test + similar_to_alpha_test
            ,utf_char
    ;
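
The query relies on a collation named co_utf8_ci_ai, which the report does not
define. A plausible definition, assuming it is a user-defined case- and
accent-insensitive collation derived from the built-in UNICODE collation of
UTF8 (the reporter's exact definition is not stated), would be created before
running the query:

    -- Assumed definition of co_utf8_ci_ai; not taken from the report
    create collation co_utf8_ci_ai
        for utf8
        from unicode
        case insensitive
        accent insensitive;
    commit;  -- harmless if isql's AUTODDL is already on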

The results I got on Windows and Linux are shown in the attached screenshot.

-- 
This message is automatically generated by JIRA.

Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel