Conrad Herrmann created SOLR-4227:
-------------------------------------
Summary: StreamingUpdateSolrServer does not buffer OutputStreamWriter with BufferedWriter, causing encoding explosion
Key: SOLR-4227
URL: https://issues.apache.org/jira/browse/SOLR-4227
Project: Solr
Issue Type: Improvement
Affects Versions: 3.2
Environment: Java 1.6, Linux. I am running Solr 3.2, but the code appears unchanged in 3.5.
Reporter: Conrad Herrmann
In org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer, line 112 is:
    OutputStreamWriter writer = new OutputStreamWriter(out, "UTF-8");
and the request is then serialized with
    req.writeXML( writer );
Because the writer is not buffered, the XML writer invokes the UTF-8 encoder for every small piece it writes, for example in org.apache.solr.common.util.XML.writeXML:
    out.write('<');
Each such call makes the stream encoder allocate a one-element char array to hold the character, and sun.nio.cs.StreamEncoder.implWrite then allocates a CharBuffer to wrap it. All of that for a single character.
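To make the per-character cost concrete, here is a minimal, self-contained sketch that counts how many write calls reach the OutputStreamWriter; each such call goes through the charset encoder. The names EncoderCallCount, CountingWriter and countEncoderCalls are hypothetical and not part of Solr, and counting Writer calls is only a proxy for encoder invocations, not a profile of StreamEncoder itself.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.FilterWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class EncoderCallCount {

    /** Hypothetical helper: counts every write call that reaches the wrapped Writer. */
    static class CountingWriter extends FilterWriter {
        int calls = 0;

        CountingWriter(Writer out) {
            super(out);
        }

        @Override
        public void write(int c) throws IOException {
            calls++;
            super.write(c); // delegates straight to the OutputStreamWriter
        }

        @Override
        public void write(char[] cbuf, int off, int len) throws IOException {
            calls++;
            super.write(cbuf, off, len);
        }

        @Override
        public void write(String str, int off, int len) throws IOException {
            calls++;
            super.write(str, off, len);
        }
    }

    /**
     * Writes n single characters, mimicking XML.writeXML's out.write('<') style,
     * and returns how many calls reached the OutputStreamWriter (and its encoder).
     */
    public static int countEncoderCalls(boolean buffered, int n) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            CountingWriter encoder = new CountingWriter(new OutputStreamWriter(bytes, "UTF-8"));
            // Optionally put a BufferedWriter in front, as the JavaDoc recommends.
            Writer writer = buffered ? new BufferedWriter(encoder) : encoder;
            for (int i = 0; i < n; i++) {
                writer.write('<');
            }
            writer.flush();
            return encoder.calls;
        } catch (IOException e) {
            // An in-memory stream with a built-in charset should not fail here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unbuffered: " + countEncoderCalls(false, 1000));
        System.out.println("buffered:   " + countEncoderCalls(true, 1000));
    }
}
```

Writing 1000 single characters reaches the encoder 1000 times without buffering, but only once, at flush(), with a BufferedWriter in front, because 1000 characters fit within its default buffer.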
This is a particular problem when many threads (on the order of 100) write to the Solr server: together they quickly consume all available CPU.
It would help to wrap the writer in a BufferedWriter, so that encoding happens only when the buffer fills or is flushed rather than once per character. The JavaDoc for OutputStreamWriter recommends this:
"For top efficiency, consider wrapping an OutputStreamWriter within a
BufferedWriter so as to avoid frequent converter invocations."
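A sketch of the suggested change to line 112, assuming the request only needs a Writer to serialize itself. XmlWritable and writeRequest are hypothetical stand-ins for the request object and the surrounding StreamingUpdateSolrServer code, not SolrJ API; the "UTF-8" string constructor is kept so the sketch stays compatible with Java 1.6.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class BufferedXmlWrite {

    /** Hypothetical stand-in for a request's writeXML(Writer) method; not the SolrJ API. */
    public interface XmlWritable {
        void writeXML(Writer writer) throws IOException;
    }

    /** Serializes the request to out through a buffered UTF-8 writer. */
    public static void writeRequest(OutputStream out, XmlWritable req) throws IOException {
        // Buffer characters so the UTF-8 encoder runs on large chunks,
        // not on every single-character write made by the XML writer.
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, "UTF-8"));
        req.writeXML(writer);
        // Push buffered characters through the encoder and into out.
        writer.flush();
    }
}
```

The flush() call is what pushes the buffered characters through the encoder and into out; without it, the tail of the XML document would stay in memory.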
--
This message is automatically generated by JIRA.