Obender, does the following text appear like the image in the link, or not?
שומר אחי

http://farm1.static.flickr.com/3/10445435_75b4546703.jpg?v=0

On Mon, Jul 20, 2009 at 3:34 PM, OBender<[email protected]> wrote:
> I've checked, and it appears to be enabled.
>
> -----Original Message-----
> From: Robert Muir [mailto:[email protected]]
> Sent: Monday, July 20, 2009 3:18 PM
> To: [email protected]
> Subject: Re: question on custom filter
>
> Obender, based on your previous comments (that you see text displayed
> in the wrong order), I again recommend that you enable support for RTL
> languages in your operating system, as I mentioned earlier. If you are
> using a Windows-based OS, this is not enabled by default!
>
> I think you are seeing things in the incorrect order, and this is
> causing confusion for you!
>
> On Mon, Jul 20, 2009 at 3:02 PM, Robert Muir<[email protected]> wrote:
>> Obender, I ran your code and it did what I expected (but not what you
>> pasted):
>>
>> First token is: (טוֹב,0,4)
>> Second token is: (עֶרֶב,5,10)
>>
>> I also loaded up your SimpleWhitespaceAnalyzer in Luke, with the same
>> results.
>>
>> On Mon, Jul 20, 2009 at 2:53 PM, OBender<[email protected]> wrote:
>>> Here is the simple code. If you run it with English and with Hebrew, you
>>> will see that for English the tokens are returned from the left of the
>>> phrase to the right, and for Hebrew from the right to the left.
>>>
>>> Again, I'm talking about tokens here, not the individual letters.
>>>
>>> // XFilter.java
>>> import java.io.IOException;
>>>
>>> import org.apache.lucene.analysis.Token;
>>> import org.apache.lucene.analysis.TokenFilter;
>>> import org.apache.lucene.analysis.TokenStream;
>>>
>>> public class XFilter extends TokenFilter
>>> {
>>>     protected XFilter( TokenStream tokenStream ) {
>>>         super( tokenStream );
>>>     }
>>>
>>>     @Override
>>>     public Token next( final Token reusableToken ) throws IOException
>>>     {
>>>         // Print each token as it passes through, then return it unchanged.
>>>         Token nextToken = input.next( reusableToken );
>>>         System.out.println( nextToken != null ? nextToken : "" );
>>>         return nextToken;
>>>     }
>>> }
>>>
>>> // SimpleWhitespaceAnalyzer.java
>>> import java.io.Reader;
>>>
>>> import org.apache.lucene.analysis.Analyzer;
>>> import org.apache.lucene.analysis.TokenStream;
>>> import org.apache.lucene.analysis.WhitespaceTokenizer;
>>>
>>> public class SimpleWhitespaceAnalyzer extends Analyzer
>>> {
>>>     @Override
>>>     public TokenStream tokenStream( final String fieldName, final Reader reader )
>>>     {
>>>         // Split on whitespace, then pass every token through XFilter.
>>>         TokenStream ts = new WhitespaceTokenizer( reader );
>>>         ts = new XFilter( ts );
>>>
>>>         return ts;
>>>     }
>>> }
>>>
>>> -----Original Message-----
>>> From: Robert Muir [mailto:[email protected]]
>>> Sent: Monday, July 20, 2009 2:26 PM
>>> To: [email protected]
>>> Subject: Re: question on custom filter
>>>
>>> Obender, I think something in your environment / display environment
>>> might be causing some confusion.
>>>
>>> Are you using Microsoft Windows? If so, please verify that support for
>>> right-to-left languages is enabled [Control Panel / Regional and
>>> Language Options]. It is possible you are "seeing something different"
>>> because your rendering system is not actually rendering right-to-left
>>> text in right-to-left direction!!!!
>>>
>>> Second, instead of using a debugger, I would recommend using Luke to
>>> look at the resulting tokens from your analyzer.
>>>
>>> On Mon, Jul 20, 2009 at 2:21 PM, OBender<[email protected]> wrote:
>>>> This is how it should be written:
>>>> http://unicode.org/cldr/utility/transform.jsp?a=name&b=%D7%A2%D6%B6%D7%A8%D6%B6%D7%91+%D7%98%D7%95%D6%B9%D7%91
>>>>
>>>> -----Original Message-----
>>>> From: Robert Muir [mailto:[email protected]]
>>>> Sent: Monday, July 20, 2009 2:07 PM
>>>> To: [email protected]
>>>> Subject: Re: question on custom filter
>>>>
>>>> Obender, this is not true.
>>>> The text you pasted is the following in Unicode:
>>>>
>>>> \N{HEBREW LETTER TET}
>>>> \N{HEBREW LETTER VAV}
>>>> \N{HEBREW POINT HOLAM}
>>>> \N{HEBREW LETTER BET}
>>>> \N{SPACE}
>>>> \N{HEBREW LETTER AYIN}
>>>> \N{HEBREW POINT SEGOL}
>>>> \N{HEBREW LETTER RESH}
>>>> \N{HEBREW POINT SEGOL}
>>>> \N{HEBREW LETTER BET}
>>>>
>>>> You can use this utility to see how your text is encoded:
>>>> http://unicode.org/cldr/utility/transform.jsp?a=name&b=%D7%98%D7%95%D6%B9%D7%91+%D7%A2%D6%B6%D7%A8%D6%B6%D7%91
>>>>
>>>> For more information on directionality in Unicode, see
>>>> http://unicode.org/reports/tr9/
>>>>
>>>> On Mon, Jul 20, 2009 at 1:59 PM, OBender<[email protected]> wrote:
>>>>> Robert,
>>>>>
>>>>> I'm not sure you are correct on this one.
>>>>>
>>>>> If I have a Hebrew phrase:
>>>>> [טוֹב עֶרֶב]
>>>>> then the first token that the filter receives is:
>>>>> [עֶרֶב] (0,5)
>>>>> and the second is:
>>>>> [טוֹב] (6,10)
>>>>> which means that it counts from right to left (words and indexes).
>>>>>
>>>>> Am I missing something?
>>>>>
>>>>> -----Original Message-----
>>>>> From: Robert Muir [mailto:[email protected]]
>>>>> Sent: Monday, July 20, 2009 1:43 PM
>>>>> To: [email protected]
>>>>> Subject: Re: question on custom filter
>>>>>
>>>>> Obender, I don't think it's as difficult as you think. Your filter does
>>>>> not need to be aware of this issue at all.
>>>>>
>>>>> In Unicode, right-to-left languages are encoded in the data in logical
>>>>> order. The rendering system is what converts it to display in
>>>>> right-to-left order for RTL languages.
>>>>>
>>>>> For example, in Arabic, "Robert 1234" displays as روبرت 1234
>>>>> On your computer monitor, this looks like 1, 2, 3, 4, space, teh, reh,
>>>>> beh, waw, reh.
>>>>>
>>>>> But the Unicode text is reh, waw, beh, reh, teh, space, 1, 2, 3, 4.
>>>>>
>>>>> 2009/7/20 OBender <[email protected]>:
>>>>>> Hi All!
>>>>>>
>>>>>> Let's say I have a filter that produces new tokens based on the original
>>>>>> ones.
>>>>>>
>>>>>> How bad will it be if my filter sets the start of each token to 0 and
>>>>>> the end to the length of the token?
>>>>>>
>>>>>> An example (based on the phrase "How are you?"):
>>>>>>
>>>>>> Original token:
>>>>>> [you?] (8,12)
>>>>>>
>>>>>> New tokens:
>>>>>> [you] (0,3)
>>>>>> [?] (0,1)
>>>>>>
>>>>>> It wouldn't be so hard to calculate the right numbers for left-to-right
>>>>>> languages, and it is a bit more challenging to do it for right-to-left
>>>>>> ones, but for mixed text it is quite hard.
>>>>>>
>>>>>> Thanks.
>>>>>
>>>>> --
>>>>> Robert Muir
>>>>> [email protected]
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [email protected]
>>>>> For additional commands, e-mail: [email protected]
>>>>
>>>> --
>>>> Robert Muir
>>>> [email protected]
>>>
>>> --
>>> Robert Muir
>>> [email protected]
>>
>> --
>> Robert Muir
>> [email protected]
>
> --
> Robert Muir
> [email protected]

--
Robert Muir
[email protected]
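Robert's point that Hebrew is stored in logical order can be checked without trusting any renderer. Below is a minimal plain-Java sketch. It is not code from the thread: `LogicalOrderDemo` and `codePointNames` are illustrative names, and it assumes Java 8 or later (`String.codePoints()`, `Character.getName`). The phrase is written with `\u` escapes so that no editor or mail client can reorder it. The program prints one Unicode character name per line, in storage order, starting with `HEBREW LETTER TET`. This is the same sequence Robert listed above.

```java
import java.util.ArrayList;
import java.util.List;

public class LogicalOrderDemo {
    // Returns the Unicode name of every code point in s, in storage (logical) order.
    static List<String> codePointNames(String s) {
        List<String> names = new ArrayList<>();
        s.codePoints().forEach(cp -> names.add(Character.getName(cp)));
        return names;
    }

    public static void main(String[] args) {
        // The phrase from the thread, as escapes so nothing can reorder it:
        // TET VAV HOLAM BET, space, AYIN SEGOL RESH SEGOL BET.
        String phrase = "\u05D8\u05D5\u05B9\u05D1 \u05E2\u05B6\u05E8\u05B6\u05D1";
        for (String name : codePointNames(phrase)) {
            System.out.println(name);
        }
    }
}
```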
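The offsets Robert reported, (טוֹב,0,4) and (עֶרֶב,5,10), follow directly from the tokenizer reading characters in that stored order. The sketch below imitates whitespace splitting in plain Java to show this. It is not Lucene's `WhitespaceTokenizer`, and `Tok` and `tokenize` are hypothetical names. It assumes offsets are char indices with an exclusive end, which matches the numbers quoted in the thread.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceOffsetsDemo {
    // A token's text plus its [start, end) char offsets into the input.
    static final class Tok {
        final String text;
        final int start;
        final int end;

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "(" + text + "," + start + "," + end + ")";
        }
    }

    // Splits on whitespace, walking the string in storage (logical) order.
    // Offsets are plain char indices; nothing here depends on text direction.
    static List<Tok> tokenize(String text) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++; // skip separators
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++; // consume one token
            }
            if (i > start) {
                out.add(new Tok(text.substring(start, i), start, i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // TET VAV HOLAM BET, space, AYIN SEGOL RESH SEGOL BET (the thread's phrase)
        String hebrew = "\u05D8\u05D5\u05B9\u05D1 \u05E2\u05B6\u05E8\u05B6\u05D1";
        System.out.println(tokenize(hebrew));
        System.out.println(tokenize("How are you?"));
    }
}
```

When run, it reports the two Hebrew tokens at (0,4) and (5,10), and `you?` at (8,12) as the third English token. In both scripts the words come out first-to-last in logical order. Only their placement on screen differs.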
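This also answers the original question. Sub-token offsets do not need to be reset to 0 or computed per writing direction. Because offsets index the logical-order text, each piece can take the original token's start offset plus its position inside the token. The sketch below shows that arithmetic outside Lucene, under these assumptions:

* `Piece`, `split` and `isWordChar` are hypothetical names.
* The word/punctuation rule was chosen just for the "you?" example.
* It works per `char`, so supplementary characters (surrogate pairs) are not handled.

```java
import java.util.ArrayList;
import java.util.List;

public class SubTokenOffsetsDemo {
    // A sub-token and its [start, end) offsets into the ORIGINAL field text.
    static final class Piece {
        final String text;
        final int start;
        final int end;

        Piece(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + text + "] (" + start + "," + end + ")";
        }
    }

    // Letters, digits and combining marks (e.g. Hebrew points such as HOLAM)
    // count as word characters; everything else is treated as punctuation.
    static boolean isWordChar(char c) {
        int type = Character.getType(c);
        return Character.isLetterOrDigit(c)
                || type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.ENCLOSING_MARK;
    }

    // Splits one token into alternating word / non-word runs. tokenStart is the
    // original token's start offset; each piece's offsets are tokenStart plus its
    // index inside the token, so no special case for text direction is needed.
    static List<Piece> split(String token, int tokenStart) {
        List<Piece> out = new ArrayList<>();
        int i = 0;
        while (i < token.length()) {
            int start = i;
            boolean word = isWordChar(token.charAt(i));
            while (i < token.length() && isWordChar(token.charAt(i)) == word) {
                i++;
            }
            out.add(new Piece(token.substring(start, i), tokenStart + start, tokenStart + i));
        }
        return out;
    }

    public static void main(String[] args) {
        // "you?" is the token at (8,12) in "How are you?"
        for (Piece p : split("you?", 8)) {
            System.out.println(p); // prints [you] (8,11), then [?] (11,12)
        }
    }
}
```

Inside a real filter, `tokenStart` would be the incoming token's start offset. The same call works on a Hebrew token such as עֶרֶב? at offset 5, yielding (5,10) and (10,11) with no right-to-left special case.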
