Hi Steven,

Yes, the correct one is "ژ"/"ze"/U+632.

My problem is when I search for the "د-ژ" range. The result is "ساب ووفر". This word's first letter is "س", its Unicode code point is U+633, and it is not in the [ U+062F - U+0632 ] range. Am I wrong?

Esra


Steven A Rowe wrote:
>
> Hi Esra,
>
> I still think you're wrong :).
>
> On 05/02/2008 at 9:31 AM, esra wrote:
>> > ژ = U+632
>
> According to the website you linked to, the above character, which has
> three dots over it, is named "zhe", and its Unicode code point is U+698.
> (I had to increase the font size to see the three dots.)
>
> I think you are confusing "ژ"/"zhe"/U+698 with the letter "ز"/"ze"/U+632,
> which has just one dot over it.
>
> Unless you were mistaken in all of your emails when you included the
> character "ژ"/"zhe" instead of "ز"/"ze", then what I said in my previous
> email still stands: there is no problem here.
>
> Steve
>
> On 05/02/2008 at 9:31 AM, esra wrote:
>>
>> Hi Steven,
>>
>> sorry i made a mistake. unicodes are like this:
>>
>> > د=U+62F
>> > ژ = U+632
>> > and the first letter of "ساب ووفر " is س = U+633
>>
>> you can also check them here
>> > http://www.unics.uni-hannover.de/nhtcapri/persian-alphabet.html
>>
>> Esra
>>
>>
>> Steven A Rowe wrote:
>> >
>> > Hi Esra,
>> >
>> > Going back to the original problem statement, I see something that
>> > looks illogical to me - please correct me if I'm wrong:
>> >
>> > On Apr 30, 2008, at 3:21 AM, esra wrote:
>> > > i am using lucene's "IndexSearcher" to search the given xml by
>> > > keyword which contains farsi information.
>> > > while searching i use ranges like
>> > >
>> > > آ-ث | ج-خ | د-ژ | س-ظ | ع-ق | ک-ل | م-ی
>> > >
>> > > when i do search for "د-ژ" range the results are wrong, they
>> > > are the results of "س-ظ" range.
>> > >
>> > > for example when i do search for "د-ژ" one of the results is
>> > > "ساب ووفر", this result is also shown on the "س-ظ" range's result
>> > > list, which is the correct range.
>> > >
>> > > As IndexSearcher uses the "compareTo" method and this method uses
>> > > unicodes for comparing, i found the unicodes of the characters.
>> > >
>> > > د=U+62F
>> > > ژ = U+698
>> > > and the first letter of "ساب ووفر " is س = U+633
>> >
>> > It appears to me that *both* the "د-ژ" range [ U+062F - U+0698 ] and
>> > the "س-ظ" range [ U+0633 - U+0638 ] contain the first letter of "ساب
>> > ووفر", which is "س" = U+0633.
>> >
>> > You stated that U+0633 should be contained in the [ U+0633 - U+0638 ]
>> > range - I agree - but why do you think U+0633 should not be contained
>> > in the [ U+062F - U+0698 ] range?
>> >
>> > In other words, it looks to me like your problem is not a problem at
>> > all.
>> >
>> > Steve
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/lucene-farsi-problem-tp16977096p17019498.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>

--
View this message in context: http://www.nabble.com/lucene-farsi-problem-tp16977096p17022861.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
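
The range question in this thread can be checked directly with a minimal sketch. It assumes, as the original question states, that range bounds are compared with Java's String.compareTo. The class name FarsiRangeCheck and the helper inRange are hypothetical and not part of Lucene. The sketch only shows how the term "ساب ووفر" sorts against the two candidate upper bounds, "ژ" (U+0698) and "ز" (U+0632).

```java
public class FarsiRangeCheck {

    // Hypothetical helper (not a Lucene API): true if term lies in the
    // inclusive range [lower, upper] under String.compareTo ordering.
    public static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String dal = "\u062F"; // د, lower bound of the range
        String ze  = "\u0632"; // ز, "ze", one dot
        String zhe = "\u0698"; // ژ, "zhe", three dots

        // "ساب ووفر", whose first letter is س = U+0633
        String term = "\u0633\u0627\u0628 \u0648\u0648\u0641\u0631";

        // Print the code point of the first letter to confirm it is U+0633.
        System.out.printf("first letter: U+%04X%n", term.codePointAt(0));

        // Upper bound ژ (U+0698): U+062F <= U+0633 <= U+0698
        System.out.println(inRange(term, dal, zhe)); // prints true

        // Upper bound ز (U+0632): U+0633 > U+0632
        System.out.println(inRange(term, dal, ze));  // prints false
    }
}
```

Under these assumptions, the term falls inside the range only when the upper bound is typed as "ژ" (U+0698). With "ز" (U+0632) as the upper bound it is excluded. So it is worth checking which of the two characters is actually sent in the query.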