Obender, I think something in your environment / display environment
might be causing some confusion.

Are you using microsoft windows? If so, please verify that support for
right-to-left languages is enabled [control panel/regional and
language options]. It is possible you are "seeing something different"
because your rendering system is not actually rendering right-to-left
text in right-to-left direction!!!!

Second, Instead of using a debugger, I would recommend using Luke to
look at resulting tokens from your analyzer.

On Mon, Jul 20, 2009 at 2:21 PM, OBender<osya_ben...@hotmail.com> wrote:
> This is how it should be written:
> http://unicode.org/cldr/utility/transform.jsp?a=name&b=%D7%A2%D6%B6%D7%A8%D6%B6%D7%91+%D7%98%D7%95%D6%B9%D7%91
>
> -----Original Message-----
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Monday, July 20, 2009 2:07 PM
> To: java-user@lucene.apache.org
> Subject: Re: question on custom filter
>
> Obender, This is not true.
> the text you pasted is the following in unicode:
>
> \N{HEBREW LETTER TET}
> \N{HEBREW LETTER VAV}
> \N{HEBREW POINT HOLAM}
> \N{HEBREW LETTER BET}
> \N{SPACE}
> \N{HEBREW LETTER AYIN}
> \N{HEBREW POINT SEGOL}
> \N{HEBREW LETTER RESH}
> \N{HEBREW POINT SEGOL}
> \N{HEBREW LETTER BET}
>
> you can use this utility to see how your text is encoded:
> http://unicode.org/cldr/utility/transform.jsp?a=name&b=%D7%98%D7%95%D6%B9%D7%91+%D7%A2%D6%B6%D7%A8%D6%B6%D7%91
>
> For more information on directionality in unicode, see
> http://unicode.org/reports/tr9/
>
> On Mon, Jul 20, 2009 at 1:59 PM, OBender<osya_ben...@hotmail.com> wrote:
>> Robert,
>>
>> I'm not sure you are correct on this one.
>>
>> If I have a Hebrew phrase:
>> [טוֹב עֶרֶב]
>> Then first token that filter receives is:
>> [עֶרֶב] (0,5)
>> and the second is:
>> [טוֹב] (6,10)
>> Which means that it counts from right to left (words and indexes).
>>
>> Am I missing something?
>>
>> -----Original Message-----
>> From: Robert Muir [mailto:rcm...@gmail.com]
>> Sent: Monday, July 20, 2009 1:43 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: question on custom filter
>>
>> Obender, I don't think its as difficult as you think. Your filter does
>> not need to be aware of this issue at all.
>>
>> In unicode, right-to-left languages are encoded in the data in logical order.
>> The rendering system is what converts it to display in right-to-left
>> for RTL languages.
>>
>> For example in Arabic, "Robert 1234" displays as روبرت 1234
>> To your computer monitor, this looks like 1, 2, 3, 4, space, teh, reh,
>> beh, waw, reh
>>
>> But the unicode text is reh, waw, beh, reh, teh, space, 1, 2, 3, 4.
>>
>> 2009/7/20 OBender <osya_ben...@hotmail.com>:
>>> Hi All!
>>>
>>>
>>>
>>> Let say I have a filter that produces new tokens based on the original ones.
>>>
>>> How bad will it be if my filter sets the start of each token to 0 and end to
>>> the length of a token?
>>>
>>> An example (based on the phrase "How are you?":
>>>
>>>
>>>
>>> Original token:
>>>
>>> [you?] (8,12)
>>>
>>>
>>>
>>> New tokens:
>>>
>>> [you] (0,3)
>>>
>>> [?] (0,1)
>>>
>>>
>>>
>>> It wouldn't be so hard to calculate the right numbers for left to right
>>> languages and it is a bit more challenging to do it for right to left ones
>>> but for mixed text it is quite hard.
>>>
>>>
>>>
>>> Thanks.
>>>
>>>
>>
>>
>>
>> --
>> Robert Muir
>> rcm...@gmail.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>



-- 
Robert Muir
rcm...@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to