Hi Esra, On 05/06/2008 at 7:38 AM, esra wrote: > i tried the class and it works fine with the locale parameter "ar".
Cool, I'm glad this addressed your problem! > Actually we are using "fa" for farsi and "ar" for arabic. > I have added a little control for the locale parameter in my > code and now i can see the correct results. From what I could tell, the Collator available for Locale("fa") in the Sun 1.4.2 and 1.5.0 JDKs does not contain Farsi character collation, but the Collator available for Locale("ar") *does* contain Farsi collation. I switched TestCollatingRangeQuery from Locale("fa") to Locale("ar") when I couldn't get the Collator returned for Farsi [ via Collator.getInstance(new Locale("fa") ] to produce correct results. Did you find that Locale("fa") produces the correct results? If so, which VM are you using? At Chris Hostetter's suggestion, I am rewriting the patch attached to LUCENE-1279, including the following changes: - Merged the contents of the CollatingRangeQuery class into RangeQuery and RangeFilter - Switched the Locale parameter to instead take an instance of Collator - Modified QueryParser.jj to construct a QueryParser class that can accept a range collator and pass it either to RangeQuery or through ConstantScoreRangeQuery to RangeFilter. I plan on posting the revised patch in the next day or two. Steve On 05/06/2008 at 7:38 AM, esra wrote: > > Hi Steven , > Hi Steven, > > i tried the class and it works fine with the locale parameter "ar". > > Actually we are using "fa" for farsi and "ar" for arabic. > I have added a little control for the locale parameter in my > code and now i can see the correct results. > > Thank you very much for ypur help. > > Esra. > > Steven A Rowe wrote: > > > > Hi Esra, > > > > I have attached a patch to LUCENE-1279 containing a new class: > > CollatingRangeQuery. > > > > The patch also contains a test class: TestCollatingRangeQuery. One of > > the test methods checks for the Farsi range you were having trouble > > with. > > > > It should be mentioned that according to > > Collator.getAvailableLocales(), neither Java 1.4.2 nor Java 1.5.0 > > contains Farsi collation information. However, in the test class I use > > the Arabic Locale, which seems to properly collate the non-Arabic Farsi > > letter U+0698, and hopefully behaves well with other Farsi letters. If > > you find that this is not the case, you can look into writing collation > > rules using RuleBasedCollator - you should be able to directly specify > > the proper letter orderings for Farsi; CollatingRangeQuery would also > > have to supply a constructor that takes in a Collator instead of a > > Locale. > > > > Please give the class a try and post back about how it works. > > > > Thanks, > > Steve > > > > On 05/03/2008 at 8:33 AM, esra wrote: > > > > > > Hi Steven, > > > > > > thanks for your help.... > > > > > > Esra > > > > > > > > > Steven A Rowe wrote: > > > > > > > > Hi Esra, > > > > > > > > I have created an issue for this - see > > > > <https://issues.apache.org/jira/browse/LUCENE-1279>. > > > > > > > > I'll try to take a crack at a patch this weekend. > > > > > > > > Steve > > > > > > > > On 05/02/2008 at 12:55 PM, esra wrote: > > > > > > > > > > Hi Steven , > > > > > > > > > > yes you are right, sorry i am a bit confused. > > > > > > > > > > i checked again and the correct one is "zhe"/U+698. > > > > > > > > > > It seems the word is in the range but my customer says it > > > > > shouldn't be. > > > > > > > > > > I think problem occurs because "zhe" is a Persian letter > > > > > outside the Arabic > > > > > alphabet. In farsi alphabet this letter is not after the "س" > > > > > letter but it's > > > > > unicode is bigger than "س" letter's and the searcher works > > > > > with unicodes. > > > > > > > > > > Esra > > > > > > > > > > > > > > > Steven A Rowe wrote: > > > > > > > > > > > > Hi Esra, > > > > > > > > > > > > You are *still* incorrectly referring to the glyph with three dots > > > > > > over it: > > > > > > > > > > > > On 05/02/2008 at 12:18 PM, esra wrote: > > > > > > > yes the correct one is "ژ "/"ze"/U+632. > > > > > > > > > > > > "ژ" is *not* "ze"/U+632 - it is "zhe"/U+698. > > > > > > > > > > > > Have you increased the font size? Can you see the difference > > > > > > between these two?: > > > > > > > > > > > > "ژ"/"zhe"/U+698 > > > > > > "ز"/"ze"/U+632 > > > > > > > > > > > > > my problem is when i do search for "د-ژ" range. > The result is > > > "ساب > > > > > > > ووفر" and this word's first letter is "س" and it's unicode is > > > > > > > "U+633" and it is not in the in the [ U+062F - > U+0632 ] range. > > > > > > > > > > > > Like I keep saying, in the above description, you're > using the > > > glyph > > > > > > "ژ"/"zhe"/U+698, while calling at the same time incorrectly > > > > > > referring to it as "ze"/U+632. > > > > > > > > > > > > I don't mean to continually bang on about this - if you're *sure* > > > > > > that when you search, you're using the character represented by the > > > > > > glyph with one dot (and not three), i.e. "ز"/"ze"/U+632, then the > > > > > > problem lies elsewhere. > > > > > > > > > > > > Steve > > > > > > > > > > > > On 05/02/2008 at 12:18 PM, esra wrote: > > > > > > > yes the correct one is "ژ "/"ze"/U+632. > > > > > > > > > > > > > > my problem is when i do search for " د-ژ" range. The result is > > > > > > > ""ساب ووفر " and this word's first letter is "س " and it's unicode > > > > > > > is "U+633" and it is not in the in the [ U+062F - U+0632 ] > > > > > > > range. > > > > > > > > > > > > > > am i wrong? > > > > > > > > > > > > > > Esra > > > > > > > > > > > > > > Steven A Rowe wrote: > > > > > > > > > > > > > > > > Hi Esra, > > > > > > > > > > > > > > > > I still think you're wrong :). > > > > > > > > > > > > > > > > On 05/02/2008 at 9:31 AM, esra wrote: > > > > > > > > > > ژ = U+632 > > > > > > > > > > > > > > > > According to the website you linked to, the > above character, > > > which > > > > > > > > has three dots over it, is named "zhe", and its > > > Unicode code point > > > > > is > > > > > > > > U+698. (I had to increase the font size to see the three dots.) > > > > > > > > > > > > > > > > I think you are confusing "ژ"/"zhe"/U+698 with the letter > > > > > > > > "ز"/"ze"/U+632, which has just one dot over it. > > > > > > > > > > > > > > > > Unless you were mistaken in all of your emails when > > > you included > > > > > the > > > > > > > > character "ژ"/"zhe" instead of "ز"/"ze", then what I said in my > > > > > > > > previous email still stands: there is no problem here. > > > > > > > > > > > > > > > > Steve > > > > > > > > > > > > > > > > On 05/02/2008 at 9:31 AM, esra wrote: > > > > > > > > > > > > > > > > > > Hi Steven, > > > > > > > > > > > > > > > > > > sorry i made a mistake. unicodes are like this: > > > > > > > > > > > > > > > > > > > د=U+62F > > > > > > > > > > ژ = U+632 > > > > > > > > > > and the first letter of "ساب ووفر " is س = U+633 > > > > > > > > > > > > > > > > > > you can also check them here > > > > > > > > > > > > > > > http://www.unics.uni-hannover.de/nhtcapri/persian-alphabet.html > > > > > > > > > > > > > > > > > > Esra > > > > > > > > > > > > > > > > > > > > > > > > > > > Steven A Rowe wrote: > > > > > > > > > > > > > > > > > > > > Hi Esra, > > > > > > > > > > > > > > > > > > > > Going back to the original problem statement, I > > > see something > > > > > that > > > > > > > > > > looks illogical to me - please correct me if I'm wrong: > > > > > > > > > > > > > > > > > > > > On Apr 30, 2008, at 3:21 AM, esra wrote: > > > > > > > > > > > i am using lucene's "IndexSearcher" to search > > > the given xml > > > > > by > > > > > > > > > > > keyword which contains farsi information. > > > while searching i > > > > > use > > > > > > > > > > > ranges like > > > > > > > > > > > > > > > > > > > > > > آ-ث | ج-خ | د-ژ | س-ظ | ع-ق | ک-ل | م-ی > > > > > > > > > > > > > > > > > > > > > > when i do search for "د-ژ" range the results > > > are wrong , > > > > > they > > > > > > > > > > > are the results of " س-ظ "range. > > > > > > > > > > > > > > > > > > > > > > for example when i do search for "د-ژ" > one of the the > > > results > > > > > > > > > > > is "ساب ووفر", this result also shown on the " > > > س-ظ " range's > > > > > result > > > > > > > > > > > list which is the corret range. > > > > > > > > > > > > > > > > > > > > > > As IndexSearcher use "compareTo" method > and this method > > > uses > > > > > > > > > > > unicodes for comparing, i found the unicodes of the > > > > > > > > > > > characters. > > > > > > > > > > > > > > > > > > > > > > د=U+62F > > > > > > > > > > > ژ = U+698 > > > > > > > > > > > and the first letter of "ساب ووفر " is س = U+633 > > > > > > > > > > > > > > > > > > > > It appears to me that *both* the "د-ژ" range [ > > > > > U+062F - U+0698 ] > > > > > > > and > > > > > > > > > > the "س-ظ" range [ U+0633 - U+0638 ] contain the > > > > > first letter of > > > > > > > "ساب > > > > > > > > > > ووفر", which is "س" = U+0633. > > > > > > > > > > > > > > > > > > > > You stated that U+0633 should be contained in the [ > > > > > U+0633 - U+0638 > > > > > > > ] > > > > > > > > > > range - I agree - but why do you think U+0633 should not be > > > > > > > > > > contained in the [ U+062F - U+0698 ] range? > > > > > > > > > > > > > > > > > > > > In other words, it looks to me like your problem is > > > > > not a problem > > > > > > > at > > > > > > > > > > all. > > > > > > > > > > > > > > > > > > > > Steve > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- View this message in context: > > > > > > > > > > > > > > > > > > > > > > > http://www.nabble.com/lucene-farsi-problem-tp16977096p17019498 > > > > > > > .html Sent > > > > > > > > from the Lucene - Java Users mailing list archive at Nabble.com. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > > > > To > > > > > > > > unsubscribe, e-mail: > [EMAIL PROTECTED] > > > For > > > > > > > > additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- View this message in context: > > > > > > > > > http://www.nabble.com/lucene-farsi-problem-tp16977096p17022861.html > > > > > > Sent from the Lucene - Java Users mailing list archive at > > > > > Nabble.com. > > > > > > > > > > > > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > > > > > For additional commands, e-mail: > > > [EMAIL PROTECTED] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- View this message in context: > > > > > > > > http://www.nabble.com/lucene-farsi-problem-tp16977096p17023557 > > > .html Sent > > > > from the Lucene - Java Users mailing list archive at Nabble.com. > > > > > > > > > > > > > --------------------------------------------------------------------- > > > To > > > > unsubscribe, e-mail: [EMAIL PROTECTED] For > > > > additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- View this message in context: > > http://www.nabble.com/lucene-farsi-problem-tp16977096p17034715.html > > Sent from the Lucene - Java Users mailing list archive at > Nabble.com. > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > > > > > -- View this message in context: > http://www.nabble.com/lucene-farsi-problem-tp16977096p17080852.html Sent > from the Lucene - Java Users mailing list archive at Nabble.com. > > > --------------------------------------------------------------------- To > unsubscribe, e-mail: [EMAIL PROTECTED] For > additional commands, e-mail: [EMAIL PROTECTED] > >