[ https://issues.apache.org/jira/browse/SOLR-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881073#action_12881073 ]
Lance Norskog commented on SOLR-1959:
-------------------------------------

Demonstrating this bug is rather difficult with encoding-challenged text editors. This test uses the Greek capital letter sigma, Unicode character U+03A3, defined here: [http://en.wikipedia.org/wiki/Greek_alphabet#Greek_and_Coptic]

With the solr/example/exampledocs/post.sh application, index this file:

{code:title=sigma.xml|borderStyle=solid}
<add>
  <doc>
    <field name="id">SP2514N</field>
    <field name="name">A greek letter: Σ should be a sigma</field>
  </doc>
</add>
{code}

Do a search with this command:

{code}
curl "http://localhost:8983/solr/select?q=%ce%a3&indent=on"
{code}

(Yes, it's CE and not 03: %ce%a3 is the UTF-8 byte sequence for U+03A3.)

Without the patch, search with this text string via solrj:

{code:title=search code snippet|borderStyle=solid}
String queryString = URLDecoder.decode("%ce%a3", "UTF-8");
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
SolrQuery query = new SolrQuery();
query.setQuery(queryString);
QueryResponse qr = server.query(query, SolrRequest.METHOD.GET);
{code}

This search will fail, because the HTTP server decodes the %xx characters via ISO-8859-1.

Now, change GET to POST. The code will work, because POST explicitly sets UTF-8. This patch makes GET queries use the same default.

As I said, seeing the right characters in all of the moving parts is tricky. Tracking all of this is easier with a tcp/ip monitor; I used apache's tcpmon.

> SolrJ GET operation does not send correct encoding
> --------------------------------------------------
>
>                 Key: SOLR-1959
>                 URL: https://issues.apache.org/jira/browse/SOLR-1959
>             Project: Solr
>          Issue Type: Bug
>          Components: clients - java
>    Affects Versions: 1.4.1, Next
>            Reporter: Lance Norskog
>         Attachments: SOLR-1959.patch
>
>
> The SolrJ query operation fails to set the character encoding when doing a
> GET. It works when doing a POST.
> The problem is that URLs are urlencoded with UTF-8 but the Content-type:
> header is not set.
> I tested it with "Content-Type:text/plain;charset=utf-8"
> and that worked. The Content-type header encoding defaults to ISO 8859-1.
> The result is that SolrJ queries fail for any search with a character above
> 127. The workaround is to use a POST query instead of a GET. I have not
> searched for other places. So, given a CommonsHttpSolrServer instance named
> server, change:
> {code}
> QueryResponse qr = server.query(query);
> {code}
> to:
> {code}
> QueryResponse qr = server.query(query, SolrRequest.METHOD.POST);
> {code}
> One quirk of this behavior is that url-bashing a query string with an ISO
> 8859-1 character (like an umlaut) works in a browser, but fails in a SolrJ
> request. It also searches correctly from the admin/index.jsp and
> admin/form.jsp pages, because they set the content-type in the FORM
> declaration.
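
For anyone who wants to see the two decodings side by side without a running Solr instance, here is a minimal, self-contained sketch. It uses only java.net.URLEncoder and java.net.URLDecoder from the JDK. The class name SigmaEncodingDemo and the helpers decodeAs and codePoints are made up for illustration and are not part of SolrJ or the attached patch. It also assumes the receiving side behaves like a plain ISO-8859-1 percent-decoder, which is the behavior described in the comment above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: not part of SolrJ or of SOLR-1959.patch.
class SigmaEncodingDemo {

    // Percent-decode a string using the named charset. The checked exception
    // is wrapped so callers do not have to declare it.
    static String decodeAs(String percentEncoded, String charsetName) {
        try {
            return URLDecoder.decode(percentEncoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Render a string as a list of code points, so the result can be read
    // even on a console that cannot display non-ASCII characters.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.append(String.format("U+%04X ", cp));
            i += Character.charCount(cp);
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // What a client that URL-encodes with UTF-8 puts on the wire for sigma.
        String onTheWire = URLEncoder.encode("\u03a3", "UTF-8");
        System.out.println("encoded:              " + onTheWire);

        // A receiver that also assumes UTF-8 recovers the single sigma character.
        System.out.println("decoded as UTF-8:     " + codePoints(decodeAs(onTheWire, "UTF-8")));

        // A receiver that assumes ISO-8859-1 turns each byte into its own
        // character, so the query no longer matches the indexed text.
        System.out.println("decoded as ISO-8859-1: " + codePoints(decodeAs(onTheWire, "ISO-8859-1")));
    }
}
```

The ISO-8859-1 result is two characters rather than one, which is why the GET query returns nothing for the sigma document while the POST query, decoded as UTF-8, finds it.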