On Fri, May 28, 2010 at 6:07 PM, Koustubha Kale <[email protected]> wrote:
> Hi all,
> On a test server with the latest git clone, I have Devanagari data. I am
> using Zebra with ICU.
> Here is my icu.xml:
>
> <icu_chain locale="en_IN.UTF-8">
>   <transliterate rule="\'>\ "/>
>   <transliterate rule="[:Number:] { '-' > '' "/>
>   <transform rule="[:Control:] Any-Remove"/>
>   <tokenize rule="l"/>
>   <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
>   <transform rule="NFD"/>
>   <transform rule="[:Nonspacing Mark:] Remove"/>
>   <transform rule="NFC"/>
>   <display/>
>   <casemap rule="l"/>
> </icu_chain>
>
> Still, when I search for any of the following words, it seems to make no
> distinction between the different forms of the word. They all return
> similar results, with the sort order all mixed up. It makes no
> difference whether I turn off QueryAutoTruncate, QueryFuzzy and
> QueryStemming.
>
> कांळे
> काळे
> काळ
> काळो
> काळा
> काळं
> काळी
> काळां
> काळू
> काळु
>
> You can also see the same behavior on the www.granthalaya.org site, which
> is running 3.00.00 with Zebra + ICU and the above icu.xml file.
>
> I have two questions.
> 1) With QueryAutoTruncate, QueryFuzzy, QueryStemming, etc. all turned
> off, shouldn't all of the above forms be treated as distinct words and
> NOT appear in searches for the other words?
> 2) The words are not sorted at all, even when I select Title A-Z or
> Title Z-A. The ordering changes, but it's still an incorrect mixed bag.
>
> hdl_laptop pointed out on IRC that I need to look into
> sort-string-utf.chr for the sorting.
> I have read a lot about ICU and Zebra but can't work out how
> to achieve this.
> Should I append the correct Devanagari sorting character sequence to the
> following lines in sort-string-utf.chr?
>
> lowercase {0-9}{a-y}üzæäøöå
> uppercase {0-9}{A-Y}ÜZÆÄØÖÅ
>
>
> Please help.
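
To make the sorting question concrete: my assumption is that the position of a character in the lowercase line decides its sort rank. A toy model of that idea (my own sketch, not Zebra's actual code) expands the {x-y} ranges into an ordered list and sorts words by each character's position in it. The helper names expand_chr_set and make_sort_key are hypothetical, and treating {x-y} as an inclusive code-point range is also an assumption.

```python
import re

# Matches either a {x-y} range or a single literal character (assumed syntax).
_TOKEN = re.compile(r"\{(.)-(.)\}|(.)")

def expand_chr_set(spec):
    """Expand a value set like "{0-9}{a-y}üzæäøöå" into an ordered list.

    Assumption: {x-y} is an inclusive code-point range and every other
    character stands for itself, in the order written."""
    chars = []
    for m in _TOKEN.finditer(spec):
        if m.group(3) is not None:
            chars.append(m.group(3))
        else:
            start, end = ord(m.group(1)), ord(m.group(2))
            chars.extend(chr(c) for c in range(start, end + 1))
    return chars

def make_sort_key(order):
    """Hypothetical sort key: rank each character by its position in order."""
    rank = {ch: i for i, ch in enumerate(order)}
    fallback = len(order)  # every unlisted character shares this one rank

    def key(word):
        # Lowercasing stands in for the uppercase line (assumption).
        return [rank.get(ch, fallback) for ch in word.lower()]

    return key

if __name__ == "__main__":
    order = expand_chr_set("{0-9}{a-y}üzæäøöå")
    titles = ["zebra", "äpple", "apple"]
    print(sorted(titles, key=make_sort_key(order)))
```

With the existing line this prints ['apple', 'zebra', 'äpple'], because z is listed before ä. Under this model every character missing from the line, which currently includes all of Devanagari, gets the same fallback rank, so Devanagari titles would only be ordered among themselves once their characters are added in the desired sequence.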


I should also add that I have tried removing the following lines from
icu.xml and rebuilding the index, but it makes no difference:

  <transform rule="[:Control:] Any-Remove"/>
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <transform rule="NFD"/>
  <transform rule="[:Nonspacing Mark:] Remove"/>
  <transform rule="NFC"/>
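
For reference, here is a rough way to check what the last three of those steps (NFD, "[:Nonspacing Mark:] Remove", NFC) would do to the words above. It is only a sketch using Python's unicodedata module rather than ICU, and it assumes ICU's [:Nonspacing Mark:] set corresponds to Unicode general category Mn. It does not model the transliterate, tokenize or casemap steps, nor what Zebra does with the result.

```python
import unicodedata

def strip_nonspacing_marks(word):
    """Approximate the chain's NFD -> [:Nonspacing Mark:] Remove -> NFC steps.

    Uses Python's Unicode database instead of ICU (assumption: the
    character categories agree, but this is not what Zebra runs)."""
    decomposed = unicodedata.normalize("NFD", word)
    # Drop every character whose general category is Mn (nonspacing mark).
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)

if __name__ == "__main__":
    words = ["कांळे", "काळे", "काळ", "काळो", "काळा",
             "काळं", "काळी", "काळां", "काळू", "काळु"]
    for w in words:
        print(w, "->", strip_nonspacing_marks(w))
```

In this approximation, कांळे, काळे, काळं, काळु and काळू all reduce to काळ, and काळां reduces to काळा, while काळा, काळी and काळो keep their vowel signs because those are spacing marks (category Mc), not nonspacing ones.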


Regards,
Koustubha Kale
Anant Corporation

Contact Details :
Address  : 103, Armaan Residency, R. W Sawant Road, Nr. Golden Dyes
Naka, Thane (w),
                Maharashtra, India, Pin : 400601.
TeleFax  : +91-22-21720108, +91-22-21720109
Mobile     : +919820715876
Website  : http://www.anantcorp.com
Blog : http://www.anantcorp.com/blog/?author=2
_______________________________________________
Koha-devel mailing list
[email protected]
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
