Obender, I think something in your display environment is causing the confusion.
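One way to check the stored order without trusting any renderer is to dump the code point names and the token offsets directly. What follows is only a minimal sketch in Python, not Lucene code: the helper names `codepoint_names` and `whitespace_tokens` are made up for illustration, and the whitespace split is an assumption standing in for whatever tokenizer is actually in use. The phrase is written with escapes so that nothing on screen can reorder it. Offsets here count code points, which for this text (all in the Basic Multilingual Plane) coincide with Java char offsets.

```python
import unicodedata

# The phrase as pasted in the thread, spelled out with escapes so that no
# terminal or editor can visually reorder it.
text = (
    "\N{HEBREW LETTER TET}\N{HEBREW LETTER VAV}\N{HEBREW POINT HOLAM}"
    "\N{HEBREW LETTER BET}\N{SPACE}"
    "\N{HEBREW LETTER AYIN}\N{HEBREW POINT SEGOL}\N{HEBREW LETTER RESH}"
    "\N{HEBREW POINT SEGOL}\N{HEBREW LETTER BET}"
)


def codepoint_names(s):
    """Return the Unicode name of each code point, in stored (logical) order."""
    return [unicodedata.name(ch) for ch in s]


def whitespace_tokens(s):
    """Split on whitespace and return (token, start, end) in logical order.

    Hypothetical stand-in for a real tokenizer; offsets are indexes into s.
    """
    tokens = []
    start = None
    for i, ch in enumerate(s):
        if ch.isspace():
            if start is not None:
                tokens.append((s[start:i], start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        tokens.append((s[start:], start, len(s)))
    return tokens


if __name__ == "__main__":
    for index, name in enumerate(codepoint_names(text)):
        print(index, name)
    for token, start, end in whitespace_tokens(text):
        # Print names rather than glyphs so the output is unambiguous.
        print((start, end), codepoint_names(token))
```

For this string the first token starts with TET at offset 0 and the second starts with AYIN at offset 5, whatever the screen shows.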
Are you using Microsoft Windows? If so, please verify that support for right-to-left languages is enabled (Control Panel > Regional and Language Options). You may be "seeing something different" because your rendering system is not actually displaying right-to-left text in right-to-left direction.

Second, instead of using a debugger, I would recommend using Luke to look at the tokens your analyzer produces.

On Mon, Jul 20, 2009 at 2:21 PM, OBender<osya_ben...@hotmail.com> wrote:
> This is how it should be written:
> http://unicode.org/cldr/utility/transform.jsp?a=name&b=%D7%A2%D6%B6%D7%A8%D6%B6%D7%91+%D7%98%D7%95%D6%B9%D7%91
>
> -----Original Message-----
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Monday, July 20, 2009 2:07 PM
> To: java-user@lucene.apache.org
> Subject: Re: question on custom filter
>
> Obender, This is not true.
> the text you pasted is the following in unicode:
>
> \N{HEBREW LETTER TET}
> \N{HEBREW LETTER VAV}
> \N{HEBREW POINT HOLAM}
> \N{HEBREW LETTER BET}
> \N{SPACE}
> \N{HEBREW LETTER AYIN}
> \N{HEBREW POINT SEGOL}
> \N{HEBREW LETTER RESH}
> \N{HEBREW POINT SEGOL}
> \N{HEBREW LETTER BET}
>
> you can use this utility to see how your text is encoded:
> http://unicode.org/cldr/utility/transform.jsp?a=name&b=%D7%98%D7%95%D6%B9%D7%91+%D7%A2%D6%B6%D7%A8%D6%B6%D7%91
>
> For more information on directionality in unicode, see
> http://unicode.org/reports/tr9/
>
> On Mon, Jul 20, 2009 at 1:59 PM, OBender<osya_ben...@hotmail.com> wrote:
>> Robert,
>>
>> I'm not sure you are correct on this one.
>>
>> If I have a Hebrew phrase:
>> [טוֹב עֶרֶב]
>> Then first token that filter receives is:
>> [עֶרֶב] (0,5)
>> and the second is:
>> [טוֹב] (6,10)
>> Which means that it counts from right to left (words and indexes).
>>
>> Am I missing something?
>>
>> -----Original Message-----
>> From: Robert Muir [mailto:rcm...@gmail.com]
>> Sent: Monday, July 20, 2009 1:43 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: question on custom filter
>>
>> Obender, I don't think it's as difficult as you think. Your filter does
>> not need to be aware of this issue at all.
>>
>> In unicode, right-to-left languages are encoded in the data in logical order.
>> The rendering system is what converts it to display in right-to-left
>> for RTL languages.
>>
>> For example in Arabic, "Robert 1234" displays as روبرت 1234
>> To your computer monitor, this looks like 1, 2, 3, 4, space, teh, reh,
>> beh, waw, reh
>>
>> But the unicode text is reh, waw, beh, reh, teh, space, 1, 2, 3, 4.
>>
>> 2009/7/20 OBender <osya_ben...@hotmail.com>:
>>> Hi All!
>>>
>>> Let's say I have a filter that produces new tokens based on the original ones.
>>>
>>> How bad will it be if my filter sets the start of each token to 0 and end to
>>> the length of a token?
>>>
>>> An example (based on the phrase "How are you?"):
>>>
>>> Original token:
>>>
>>> [you?] (8,12)
>>>
>>> New tokens:
>>>
>>> [you] (0,3)
>>>
>>> [?] (0,1)
>>>
>>> It wouldn't be so hard to calculate the right numbers for left to right
>>> languages and it is a bit more challenging to do it for right to left ones
>>> but for mixed text it is quite hard.
>>>
>>> Thanks.

--
Robert Muir
rcm...@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
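
On the original question in this thread: because the text is stored in logical order, the "right numbers" for the new tokens are just the original token's start offset plus each piece's position inside the token, and that arithmetic is the same for left-to-right, right-to-left, and mixed text. The sketch below is an illustration in Python under stated assumptions, not Lucene code: `split_punctuation` is a hypothetical helper, and splitting on runs of Unicode punctuation is only an assumed stand-in for whatever the real filter does.

```python
import unicodedata


def is_punct(ch):
    """True if the character is in a Unicode punctuation category (P*)."""
    return unicodedata.category(ch).startswith("P")


def split_punctuation(term, start):
    """Split a token into punctuation and non-punctuation runs.

    Returns (piece, start, end) tuples whose offsets stay relative to the
    original text: each piece's offset is the token's start plus the piece's
    index inside the token. Direction of the script plays no role.
    """
    pieces = []
    piece_start = 0
    for i in range(1, len(term) + 1):
        at_end = i == len(term)
        if at_end or is_punct(term[i]) != is_punct(term[i - 1]):
            pieces.append((term[piece_start:i], start + piece_start, start + i))
            piece_start = i
    return pieces


if __name__ == "__main__":
    # The example from the thread: "you?" spans (8,12) in "How are you?".
    print(split_punctuation("you?", 8))
```

With the example from the thread, this yields [you] (8,11) and [?] (11,12) rather than (0,3) and (0,1), so the pieces still point back into the original text.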