Python does not treat strings as Unicode by default; you have to handle
them explicitly. Your Python receiver may not be decoding the incoming
strings correctly. Also, Jetty has problems with UTF-8; the Wiki has
more on this.
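
As a rough sketch of what "explicitly" means here, assuming the receiver
gets raw UTF-8 bytes off the wire (the sample string and variable names
are made up for illustration, and this is written for a current Python):

```python
# -*- coding: utf-8 -*-
# Illustrative only: the sample text and names are assumptions, not from the thread.
original = u"日本語 text"

# What arrives over the wire is raw bytes, not text.
raw = original.encode("utf-8")

# Correct: decode explicitly with the charset the sender actually used.
decoded = raw.decode("utf-8")

# Incorrect: a single-byte charset maps every UTF-8 byte to its own character,
# so each Japanese character turns into three unrelated ones.
garbled = raw.decode("latin-1")

print(repr(decoded))
print(repr(garbled))
```

If the receiver produces something like the second value, the bytes were
fine and only the decoding step was wrong.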

Lance 

-----Original Message-----
From: Maximilian Hütter [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 02, 2007 1:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Searching combined English-Japanese index

Yonik Seeley wrote:
> On 10/1/07, Maximilian Hütter <[EMAIL PROTECTED]> wrote:
>> Yonik Seeley wrote:
>>> On 10/1/07, Maximilian Hütter <[EMAIL PROTECTED]> wrote:
>>>> When I search using an English term, I get results but the Japanese 
>>>> is not encoded correctly in the response. (although it is UTF-8 
>>>> encoded)
>>> One quick thing to try is the python writer (wt=python) to see the 
>>> actual unicode values of what you are getting back (since the python 
>>> writer automatically escapes non-ascii).  That can help rule out 
>>> incorrect charset handling by clients.
>>>
>>> -Yonik
>>>
>> Thanks for the tip, it turns out that the unicode values are wrong... 
>> I mean the browser displays correctly what is sent. But I don't know 
>> how solr gets these values.
> 
> OK, so they never got into the index correctly.
> The most likely explanation is that the charset wasn't set correctly 
> when the update message was sent to Solr.
> 
> -Yonik
> 
Are you sure they are wrong in the index? When I use the Lucene Index
Monitor (http://limo.sourceforge.net/) to look at the document in the index,
the Japanese is displayed correctly.
I am using Jetty 6.0.1 by the way.

Best regards,

Max

--
Maximilian Hütter
blue elephant systems GmbH
Wollgrasweg 49
D-70599 Stuttgart

Tel            :  (+49) 0711 - 45 10 17 578
Fax            :  (+49) 0711 - 45 10 17 573
e-mail         :  [EMAIL PROTECTED]
Sitz           :  Stuttgart, Amtsgericht Stuttgart, HRB 24106
Geschäftsführer:  Joachim Hörnle, Thomas Gentsch, Holger Dietrich
