On Fri, May 28, 2010 at 6:07 PM, Koustubha Kale wrote:
> Hi all,
> On a test server running the latest git clone, I have Devanagari data. I am
> using Zebra with ICU.
> Here is my icu.xml:
>
> <icu_chain locale="en_IN.UTF-8">
> <transliterate rule="\'>\ "/>
> <transliterate rule="[:Number:] { '-' > '' "/>
> <transform rule="[:Control:] Any-Remove"/>
> <tokenize rule="l"/>
> <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
> <transform rule="NFD"/>
> <transform rule="[:Nonspacing Mark:] Remove"/>
> <transform rule="NFC"/>
> <display/>
> <casemap rule="l"/>
> </icu_chain>
>
> Still, when I search for any of the following words, the search does not
> seem to differentiate between the forms of the word. They all return
> similar results, with the sort order all mixed up. It makes no
> difference whether QueryAutoTruncate, QueryFuzzy and QueryStemming
> are turned off.
>
> कांळे
> काळे
> काळ
> काळो
> काळा
> काळं
> काळी
> काळां
> काळू
> काळु
>
> You can also see the same behavior on the www.granthalaya.org site, which
> is running 3.00.00 with Zebra + ICU and the icu.xml file above.
>
> I have two questions.
> 1) With QueryAutoTruncate, QueryFuzzy, QueryStemming, etc. all turned
> off, shouldn't each of the above forms be treated as a distinct word
> and NOT appear in searches for the other forms?
> 2) The results are not sorted at all, even when I select Title A-Z or
> Title Z-A. The ordering changes, but it's still an incorrect mixed bag.
>
> hdl_laptop pointed out on IRC that I need to look into
> sort-string-utf.chr for the sorting.
> I have read a lot about ICU and Zebra but can't work out how
> to achieve this.
> Should I append the correct Devanagari sorting character sequence to the
> following lines in sort-string-utf.chr?
>
> lowercase {0-9}{a-y}üzæäøöå
> uppercase {0-9}{A-Y}ÜZÆÄØÖÅ
>
>
> Please help.
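The idea of appending a Devanagari sequence to the lowercase line can be sketched as a custom-alphabet sort: each character gets a rank from its position in a listed sequence, and words are compared rank by rank instead of by raw Unicode code point. This is a minimal Python sketch of that technique, not Zebra's implementation. Whether Zebra derives its sort order from the position of characters in sort-string-utf.chr is exactly the open question, and the ORDER list below (in particular placing the anusvara after the vowel signs) is a hypothetical illustration, not an authoritative Marathi collation.

```python
# Hypothetical character order, written the way characters might be listed on
# a "lowercase" line in sort-string-utf.chr: earlier entries sort first.
# ASSUMPTION: this is an illustrative order, not an authoritative Marathi one.
ORDER = [
    "\u0915",  # KA
    "\u0933",  # LLA
    "\u093E",  # vowel sign AA
    "\u093F",  # vowel sign I
    "\u0940",  # vowel sign II
    "\u0941",  # vowel sign U
    "\u0942",  # vowel sign UU
    "\u0947",  # vowel sign E
    "\u0948",  # vowel sign AI
    "\u094B",  # vowel sign O
    "\u094C",  # vowel sign AU
    "\u0902",  # anusvara, placed after the vowel signs (assumed)
]
RANK = {ch: i for i, ch in enumerate(ORDER)}


def sort_key(word: str) -> list[int]:
    """Map each character to its rank; unlisted characters sort after all listed ones."""
    return [RANK.get(ch, len(ORDER)) for ch in word]


words = ["काळं", "काळा", "काळ", "काळी", "काळे", "काळु"]
print(sorted(words))                # raw code-point order: U+0902 sorts before U+093E
print(sorted(words, key=sort_key))  # order implied by the hypothetical ORDER list
```

In plain code-point order काळं lands second, right after काळ, because the anusvara (U+0902) is numerically smaller than every vowel sign; under the assumed ORDER list it moves to the end.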
I should also add that I have tried removing the following lines from
icu.xml and rebuilding the index, but it makes no difference:
<transform rule="[:Control:] Any-Remove"/>
<transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
<transform rule="NFD"/>
<transform rule="[:Nonspacing Mark:] Remove"/>
<transform rule="NFC"/>
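To show what the NFD, [:Nonspacing Mark:] Remove and NFC steps do to these particular words, here is a rough emulation in Python. It uses the standard unicodedata module rather than ICU. It also assumes that ICU's [:Nonspacing Mark:] set corresponds to the Unicode general category Mn and that Python's str.lower() stands in for casemap rule="l". The helper names fold and group_by_fold are hypothetical.

```python
import unicodedata


def fold(word: str) -> str:
    """Approximate NFD -> [:Nonspacing Mark:] Remove -> NFC -> casemap "l"."""
    decomposed = unicodedata.normalize("NFD", word)
    # ASSUMPTION: ICU's [:Nonspacing Mark:] is Unicode general category "Mn".
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept).lower()


def group_by_fold(words: list[str]) -> dict[str, list[str]]:
    """Group the input words by the key they fold to."""
    groups: dict[str, list[str]] = {}
    for w in words:
        groups.setdefault(fold(w), []).append(w)
    return groups


WORDS = ["कांळे", "काळे", "काळ", "काळो", "काळा", "काळं", "काळी", "काळां", "काळू", "काळु"]

for key, members in group_by_fold(WORDS).items():
    print(key, members)
```

Under these assumptions, the vowel signs े, ु and ू and the anusvara ं are category Mn and are stripped. As a result, कांळे, काळे, काळं, काळू and काळु all fold to काळ, and काळां folds to काळा. The signs ा, ी and ो are spacing marks (category Mc), so they survive. With those lines in place, several of the listed forms would be indexed identically, though not all of them.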
Regards,
Koustubha Kale
Anant Corporation
Contact Details:
Address: 103, Armaan Residency, R. W Sawant Road, Nr. Golden Dyes
Naka, Thane (w),
Maharashtra, India, Pin: 400601.
TeleFax: +91-22-21720108, +91-22-21720109
Mobile: +919820715876
Website: http://www.anantcorp.com
Blog: http://www.anantcorp.com/blog/?author=2
_______________________________________________
Koha-devel mailing list
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel