Hi Prashant,
What is the Unicode code point of the single character that shows up as items 3, 4, and 5?
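
In case it helps, here is a minimal sketch of one way to print the code points of a string before it is handed to the analyzer. The class and method names (CodePointDump, codePoints, utf8ReadAsLatin1) are made up for illustration, and "César" (C, U+00E9, s, a, r) is only my guess at the intended word. StandardCharsets needs Java 7 or later; on older JVMs the charset names "UTF-8" and "ISO-8859-1" can be passed as strings instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class CodePointDump {

    /** Returns one "U+XXXX" entry per Unicode code point in the text. */
    static List<String> codePoints(String text) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            result.add(String.format("U+%04X", cp));
            // Advance by 1 or 2 chars so surrogate pairs count as one code point.
            i += Character.charCount(cp);
        }
        return result;
    }

    /** Simulates the suspected mix-up: encode as UTF-8, then decode the bytes as Latin-1. */
    static String utf8ReadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Assumed intended word; the \u escape avoids depending on source-file encoding.
        String intended = "C\u00E9sar";
        System.out.println(codePoints(intended));
        System.out.println(codePoints(utf8ReadAsLatin1(intended)));
    }
}
```

If the string reaching the tokenizer on one machine contains U+00C3 followed by U+00A9 where the other machine has a single U+00E9, that would point at the input being decoded with different default charsets rather than at Lucene itself.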
Steve
On 04/22/2008 at 4:45 PM, Prashant Malik wrote:
> Yes, the versions of Lucene and Java are exactly the same on
> the different
> machines.
> In fact, we unjarred Lucene and re-jarred it with our jar, and are
> running from the
> same NFS mounts on both machines.
>
> Also, we have tried with Lucene 2.2.0 and 2.3.1, with the same result.
>
> Also, about the actual string: you have it right up to item 2.
>
> Items 3, 4, and 5 are a single character.
>
> Thx
> PM
>
> On Tue, Apr 22, 2008 at 12:01 PM, Steven A Rowe
> <[EMAIL PROTECTED]> wrote:
>
> > Hi Prashant,
> >
> > On 04/22/2008 at 2:23 PM, Prashant Malik wrote:
> > > We have been observing the following problem while
> > > tokenizing using Lucene's StandardAnalyzer. The tokens we get are
> > > different on different machines. I suspect it has something to
> > > do with the locale settings on the individual machines.
> > >
> > > For example
> > > the word 'CÃ(c)sar' is split as 'CÃ(c)sar' on machine 1
> > >
> > > while it is split into [cã, sar] on machine 2 .
> > >
> > > Could someone please tell me what might be going on?
> >
> > Which version of Lucene are you using? Is it the same on both machines?
> >
> > I ask because Lucene recently switched StandardTokenizer lexer
> > generation from JavaCC to JFlex, for performance reasons (increased
> > throughput).
> >
> > Also, my email viewer displays the word in question as the following
> > sequence of characters:
> >
> > 1. Capital "C"
> > 2. Capital "A" with a tilde ("~") above it
> > 3. Left parenthesis
> > 4. Lowercase "c"
> > 5. Right parenthesis
> > 6. Lowercase "s"
> > 7. Lowercase "a"
> > 8. Lowercase "r"
> >
> > Is this the correct character sequence? (Sometimes UTF-8 can look
> > similar to this when it's interpreted as Latin-1.)
> >
> > Steve
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED] For
> > additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
>