RE: lucene farsi problem

2008-05-11 Thread Steven A Rowe
Hi Esra, Did you try the new version of the patch? In the latest version, I have taken the code that was in CollatingRangeQuery and put it into RangeQuery. I also put the same functionality into RangeFilter, and provided code to call it from ConstantScoreRangeQuery and QueryParser. Note that

RE: lucene farsi problem

2008-05-10 Thread esra
Hi Steve, I used the locale ar and it works fine. Again, thanks a lot for your help. Esra Steven A Rowe wrote: Hi Esra, On 05/06/2008 at 7:38 AM, esra wrote: i tried the class and it works fine with the locale parameter ar. Cool, I'm glad this addressed your problem! Actually

RE: lucene farsi problem

2008-05-09 Thread Steven A Rowe
Hi Esra, On 05/07/2008 at 11:49 AM, Steven A Rowe wrote: At Chris Hostetter's suggestion, I am rewriting the patch attached to LUCENE-1279, including the following changes: - Merged the contents of the CollatingRangeQuery class into RangeQuery and RangeFilter - Switched the Locale

RE: lucene farsi problem

2008-05-08 Thread Vizzini
Dear Steven, Thanks for the reply. I've just checked the link and it was working. Anyway, you are right, but my point is to use the correct term for 3 main reasons: 1. Respect the host language, i.e. English 3. Apparently the Islamic regime in Tehran is against the word ‘Persian’, and we as the

Re: lucene farsi problem

2008-05-08 Thread Grant Ingersoll
Point #2 does not belong on this forum. This is a forum for Lucene Java, not for political views. There are plenty of other places for that, so let's close this discussion off on this particular point and simply address the issue at hand with Lucene and LUCENE-1279. Cheers, Grant On

Re: lucene farsi problem

2008-05-07 Thread Vizzini
Sorry for cross posting, but why the word 'Farsi' instead of 'Persian'? No one says Lucene français or Español, or Deutsch - so why Farsi? Please read the following article, I found it quite enlightening. http://www.cais-soas.com/CAIS/Languages/persian_not_farsi.htm PV -- View this message

RE: lucene farsi problem

2008-05-07 Thread Steven A Rowe
Hi Esra, On 05/06/2008 at 7:38 AM, esra wrote: i tried the class and it works fine with the locale parameter ar. Cool, I'm glad this addressed your problem! Actually we are using fa for farsi and ar for arabic. I have added a little control for the locale parameter in my code and now i can

RE: lucene farsi problem

2008-05-07 Thread Steven A Rowe
Hi PV, On 05/07/2008 at 2:54 AM, PV wrote: Sorry for cross posting, but why the word 'Farsi' instead of 'Persian'? No one says Lucene français or Español, or Deutsch - so why Farsi? Please read the following article, I found it quite enlightening.

RE: lucene farsi problem

2008-05-06 Thread esra
Hi Steven, I tried the class and it works fine with the locale parameter ar. Actually we are using fa for Farsi and ar for Arabic. I have added a little control for the locale parameter in my code and now I can see the correct results. Thank you very much for your help. Esra.

RE: lucene farsi problem

2008-05-04 Thread Steven A Rowe
Hi Esra, I have attached a patch to LUCENE-1279 containing a new class: CollatingRangeQuery. The patch also contains a test class: TestCollatingRangeQuery. One of the test methods checks for the Farsi range you were having trouble with. It should be mentioned that according to
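The idea behind a collating range check can be sketched with the plain JDK. This is not the code from the LUCENE-1279 patch, only a minimal illustration under stated assumptions: each candidate term is compared against the range bounds with a `java.text.Collator` instead of `String.compareTo`. The class and method names (`CollatingRangeSketch`, `inRange`, `filter`) are hypothetical. Whether a given JDK's collation rules for the `ar` locale order Persian letters as expected depends on the JDK, so no output is claimed for the Farsi case.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollatingRangeSketch {
    // True if term lies in [lower, upper] under the collator's ordering
    // (both bounds inclusive).
    public static boolean inRange(Collator collator, String term,
                                  String lower, String upper) {
        return collator.compare(term, lower) >= 0
            && collator.compare(term, upper) <= 0;
    }

    // Keeps only the terms that fall inside the collated range.
    // Every term is visited and compared, one by one.
    public static List<String> filter(Collator collator, List<String> terms,
                                      String lower, String upper) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (inRange(collator, term, lower, upper)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumption: the "ar" locale is used because the thread reports
        // that it worked for this Farsi range.
        Collator collator = Collator.getInstance(Locale.forLanguageTag("ar"));
        List<String> terms =
            Arrays.asList("\u0633\u0627\u0628 \u0648\u0648\u0641\u0631");
        // Range from dal (U+062F) to zhe (U+0698).
        System.out.println(filter(collator, terms, "\u062F", "\u0698"));
    }
}
```

Because every term has to be passed through the collator, a check like this is likely to be slower than a plain code point comparison on an index with many terms.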

RE: lucene farsi problem

2008-05-03 Thread esra
Hi Steven, thanks for your help Esra Steven A Rowe wrote: Hi Esra, I have created an issue for this - see https://issues.apache.org/jira/browse/LUCENE-1279. I'll try to take a crack at a patch this weekend. Steve On 05/02/2008 at 12:55 PM, esra wrote: Hi Steven , yes

RE: lucene farsi problem

2008-05-02 Thread esra
Hi Steven, sorry, I made a mistake. The Unicode code points are like this: د=U+62F ژ = U+632 and the first letter of ساب ووفر is س = U+633. You can also check them here: http://www.unics.uni-hannover.de/nhtcapri/persian-alphabet.html Esra Steven A Rowe wrote: Hi Esra, Going back to the original

RE: lucene farsi problem

2008-05-02 Thread Steven A Rowe
Hi Esra, I still think you're wrong :). On 05/02/2008 at 9:31 AM, esra wrote: ژ = U+632 According to the website you linked to, the above character, which has three dots over it, is named zhe, and its Unicode code point is U+698. (I had to increase the font size to see the three dots.) I

RE: lucene farsi problem

2008-05-02 Thread esra
Hi Steven, yes the correct one is ژ /ze/U+632. My problem is when I search for the د-ژ range. The result is ساب ووفر and this word's first letter is س and its Unicode code point is U+633 and it is not in the [ U+062F - U+0632 ] range. Am I wrong? Esra Steven A Rowe wrote: Hi Esra,

RE: lucene farsi problem

2008-05-02 Thread Steven A Rowe
Hi Esra, You are *still* incorrectly referring to the glyph with three dots over it: On 05/02/2008 at 12:18 PM, esra wrote: yes the correct one is ژ /ze/U+632. ژ is *not* ze/U+632 - it is zhe/U+698. Have you increased the font size? Can you see the difference between these two?:

RE: lucene farsi problem

2008-05-02 Thread esra
Hi Steven , yes you are right, sorry i am a bit confused. i checked again and the correct one is zhe/U+698. It seems the word is in the range but my customer says it shouldn't be. I think problem occurs because zhe is a Persian letter outside the Arabic alphabet. In farsi alphabet this

RE: lucene farsi problem

2008-05-02 Thread Steven A Rowe
Hi Esra, I have created an issue for this - see https://issues.apache.org/jira/browse/LUCENE-1279. I'll try to take a crack at a patch this weekend. Steve On 05/02/2008 at 12:55 PM, esra wrote: Hi Steven , yes you are right, sorry i am a bit confused. i checked again and the correct

RE: lucene farsi problem

2008-05-01 Thread esra
Hi Steve, thanks for your reply. I know Farsi is written and read right-to-left. I am using the RangeQuery class, and its rewrite(IndexReader reader) method decides whether the word is in range or not with the compareTo method, and this decision is made using Unicode code points. while searching for د-ژ range the
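The code point comparison described here can be reproduced outside Lucene. The sketch below is a plain-JDK illustration, not Lucene's RangeQuery code; the class and method names (`CodePointRangeSketch`, `inCodePointRange`) are hypothetical. It uses the code points established in this thread: dal د = U+062F, ze ز = U+0632, zhe ژ = U+0698, and sin س = U+0633.

```java
public class CodePointRangeSketch {
    // True if term lies in [lower, upper] under String.compareTo, which
    // orders these characters by their code point values.
    public static boolean inCodePointRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String dal = "\u062F"; // د
        String ze  = "\u0632"; // ز
        String zhe = "\u0698"; // ژ
        // The indexed keyword from the thread; its first letter is sin (U+0633).
        String term = "\u0633\u0627\u0628 \u0648\u0648\u0641\u0631";

        // U+062F <= U+0633 <= U+0698, so the term is inside [dal, zhe].
        System.out.println(inCodePointRange(term, dal, zhe)); // true
        // U+0633 > U+0632, so the term is outside [dal, ze].
        System.out.println(inCodePointRange(term, dal, ze));  // false
    }
}
```

The first result is why the keyword matched the د-ژ range: zhe sits in a later part of the Arabic Unicode block than the basic Arabic letters, so by code point it comes after sin.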

Re: lucene farsi problem

2008-05-01 Thread esra
Hi, document's encoding is UTF-8. i tried the explain() method and the result for د-ژ range searching is: fieldWeight(keywordIndex:ساب ووفر in 0), product of: 1.0 = tf(termFreq(keywordIndex:ساب ووفر)=1) 0.30685282 = idf(docFreq=1) 1.0 = fieldNorm(field=keywordIndex,

RE: lucene farsi problem

2008-05-01 Thread Steven A Rowe
Hi Esra, Going back to the original problem statement, I see something that looks illogical to me - please correct me if I'm wrong: On Apr 30, 2008, at 3:21 AM, esra wrote: i am using lucene's IndexSearcher to search the given xml by keyword which contains farsi information. while searching

Re: lucene farsi problem

2008-05-01 Thread Grant Ingersoll
On May 1, 2008, at 4:36 AM, esra wrote: Hi, document's encoding is UTF-8. i tried the explain() method and the result for د-ژ range searching is: fieldWeight(keywordIndex:ساب ووفر in 0), product of: 1.0 = tf(termFreq(keywordIndex:ساب ووفر)=1) 0.30685282 =

Re: lucene farsi problem

2008-04-30 Thread Grant Ingersoll
What Analyzer are you using? You might try looking in Luke to see what is in your index, etc. It also isn't clear to me what your documents look like. As for a Farsi analyzer, I would Google Farsi analyzer Lucene and see if you can find anything. Otherwise, you will have to write your

Re: lucene farsi problem

2008-04-30 Thread esra
Hi, thanks for your reply. I am using StandardAnalyzer now and my xml document is like below: <keyword><![CDATA[ساب ووفر]]></keyword> <description><![CDATA[یک ووفر که در محفظه ای جدا از سایر درایور ها قرار دارد تا صدایی با باس فوق العاده پایین تولید کند. ]]></description> i googled for farsi

Re: lucene farsi problem

2008-04-30 Thread Grant Ingersoll
I am not sure how Standard Analyzer will perform on Farsi. The thing to do now would be to get Luke and have a look at the actual document that matches and see what its tokens look like. You might also try using the explain() method to see why that document matches. Also, are you sure

RE: lucene farsi problem

2008-04-30 Thread Steven A Rowe
Hi Esra, Caveat: I don't speak, read, write, or dream in Farsi - I just know that it mostly shares its orthography with Arabic, and that they are both written and read right-to-left. How are you constructing the queries? Using QueryParser? If so, then I suspect the problem is that you

RE: lucene farsi problem

2008-04-30 Thread Steven A Rowe
On 04/30/2008 at 12:50 PM, Steven A Rowe wrote: Caveat: I don't speak, read, write, or dream in Farsi - I just know that it mostly shares its orthography with Arabic, and that they are both written and read right-to-left. How are you constructing the queries? Using QueryParser? If so,