On Fri, Feb 25, 2011 at 9:16 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:
>
> On Fri, Feb 25, 2011 at 9:09 AM, Bernd Fehling
> <bernd.fehl...@uni-bielefeld.de> wrote:
> > Hi Yonik,
> >
> > good point, yes we are using Jetty.
> > Do you know if Tomcat has this limitation?
>
> Tomcat's defaults are worse - you need to configure it to use UTF-8 by
> default for URLs.
> Once you do, it passes all those tests (last I checked). Those tests
> are really about UTF-8 working in GET/POST query arguments. Solr may
> still be able to handle indexing and returning full UTF-8, but you
> wouldn't be able to query for it w/o using surrogates if you're using
> Jetty.
>
> It would be good to test though - does anyone know how to add a char
> above the BMP to utf8-example.xml?
>

I tried the following, then searched for this character (U+29B05 /
UTF-8: [f0 a9 ac 85]) with Jetty and got no results. As a quick test I
also went to analysis.jsp and noted that Jetty treats it as if it were
U+9B05 / UTF-8: [e9 ac 85].

Then I searched on 'range' via the admin GUI to retrieve this document,
and Chrome blew up with "This page contains the following errors: error
on line 17 at column 306: Encoding error".

I didn't try Tomcat.

Index: utf8-example.xml
===================================================================
--- utf8-example.xml (revision 1074125)
+++ utf8-example.xml (working copy)
@@ -34,6 +34,7 @@
   <field name="features">eaiou with umlauts: ëäïöü</field>
   <field name="features">tag with escaped chars: &lt;nicetag/&gt;</field>
   <field name="features">escaped ampersand: Bonnie &amp; Clyde</field>
+  <field name="features">full unicode range (supplementary char): 𩬅</field>
   <field name="price">0</field>
   <!-- no popularity, get the default from schema.xml -->
   <field name="inStock">true</field>
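
For anyone who wants to double-check the byte values quoted in this message, here is a minimal Java sketch. The class and method names (SupplementaryCharCheck, utf8Hex) are made up for illustration and are not part of Solr or Jetty. The last step assumes the code point is simply cut down to its low 16 bits; that is only a guess that happens to reproduce the U+9B05 I saw, not a statement about what Jetty actually does internally.

```java
import java.nio.charset.StandardCharsets;

public class SupplementaryCharCheck {

    // Format a byte array as space-separated lowercase hex, e.g. "f0 a9 ac 85".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Encode a single Unicode code point as UTF-8 and return the bytes as hex.
    static String utf8Hex(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return hex(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        int cp = 0x29B05; // the supplementary character added to utf8-example.xml

        // A code point above the BMP needs two Java chars (a surrogate pair).
        System.out.println("UTF-16 code units: " + Character.toChars(cp).length);
        System.out.println("UTF-8 bytes:       " + utf8Hex(cp));

        // ASSUMPTION: dropping everything above the low 16 bits is one way to
        // end up with U+9B05; the real cause inside the container may differ.
        int truncated = cp & 0xFFFF;
        System.out.println("Truncated:         U+"
                + Integer.toHexString(truncated).toUpperCase()
                + " -> " + utf8Hex(truncated));
    }
}
```

The same utf8Hex helper can be pointed at any other code point you want to add to utf8-example.xml for testing.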