Hi :)
I found the source of the problem. It is indeed the input string. It
comes from a CSV export from a relational database. The InputStream of
this CSV file was decoded with the wrong charset (ISO-8859-1 instead of
CP1252). So the right single quote, which is byte 0x92 in CP1252, was
returned as the character U+0092 and was indexed as is in Lucene.
The problem was outside the scope of Lucene, as Uwe Schindler said :)
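For anyone hitting the same thing, here is a minimal sketch of the
mismatch, not the actual import code. The class name CharsetDemo and the
helper decode are made up for illustration; only the byte value 0x92 and
the two charsets come from this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Hypothetical helper: decode raw bytes with an explicit charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The single byte that CP1252 uses for the right single quote.
        byte[] raw = { (byte) 0x92 };

        // Wrong charset: ISO-8859-1 maps byte 0x92 straight to U+0092.
        String wrong = decode(raw, StandardCharsets.ISO_8859_1);
        // Right charset: windows-1252 (CP1252) maps byte 0x92 to U+2019.
        String right = decode(raw, Charset.forName("windows-1252"));

        System.out.printf("ISO-8859-1:   U+%04X%n", (int) wrong.charAt(0));
        System.out.printf("windows-1252: U+%04X%n", (int) right.charAt(0));
    }
}
```

When reading the CSV, the same idea applies: pass the charset explicitly,
for example new InputStreamReader(in, Charset.forName("windows-1252")),
instead of relying on a default or on ISO-8859-1.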
Thanks for your help :)
Gary
On 03/03/2014 18:44, Jack Krupansky wrote:
What is the hex value for that second character returned that appears
to display as an apostrophe? Hex 92 (decimal 146) is listed as
"Private Use 2", so who knows what it might display as. All that is
important is the binary/hex value.
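One way to get at those values is to dump the code points of the string
returned by the application. This is only a sketch: the class name
CodePointDump and the method toHex are hypothetical, and the sample
string in main is a stand-in, not the real field value.

```java
public class CodePointDump {
    // Hypothetical helper: render each code point of s as U+XXXX.
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in value: 'l', then the character U+0092, then 'a'.
        String value = "l" + (char) 0x92 + "a";
        System.out.println(toHex(value)); // prints "U+006C U+0092 U+0061"
    }
}
```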
Out of curiosity, how did your application come about picking a PU
Unicode character?
-- Jack Krupansky
-----Original Message----- From: G.Long
Sent: Monday, March 3, 2014 12:09 PM
To: java-user@lucene.apache.org
Subject: encoding problem when retrieving document field value
Hi :)
My index (Lucene 3.5) contains a field called title. Its value is
indexed (analyzed and stored) with the WhitespaceAnalyzer and can
contain HTML entities such as ’ or °
My problem is that when I retrieve values from this field, some of the
HTML entities are missing.
For example:
Luke tells me that the stored value is: "l’application n°
90-1258" and when I retrieve the field value in my application, I get
"l’application n° 90-1258".
The apostrophe is not in the returned value whereas the ° character is
present.
What could be the problem?
Thanks,
Gary
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org