RE: Re: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

Fikavec F Sun, 05 Mar 2023 14:43:23 -0800

Thanks. In the coming days I will conduct testing and measurements on real hardware.
Unfortunately, my code is not ready to become part of the project directly. This is a very sensitive place for changes, I am not a Java developer, and I am not deeply familiar with Solr's internal mechanisms. The code has no tests, it does not support the modes and parameters of the original wt=json, and I have a number of questions about it myself. It would be great if a knowledgeable professional could review my code and prepare a high-quality patch, as Mikhail Khludnev did earlier when he helped me here with the modified-buffer patch. As before, I am happy to take part in testing such a patch if it appears. All I did was take the SmileResponseWriter source code and switch it to JsonFactory, as I wrote earlier. I'm not sure that reading my low-quality code will help professionals more than simply knowing which part of the code causes the 4x+ slowdown from the possible speed, so that it can be revised and improved.

I have prepared a repository and shared the code with the changes I made - https://github.com/Fikavec/NewAndModifiedSolrResponseWriters

The first commit contains the code of the original SmileResponseWriter, so it is easy to see the small changes I made. I placed all the jars from the bin folders in ... /solr-8.11.2/server/solr-webapp/webapp/WEB-INF/lib/* and registered the writers in the collection's solrconfig.xml:

<queryResponseWriter name="myfastjson" class="my.MyJacksonJsonResponseWriter"></queryResponseWriter>
<queryResponseWriter name="myfastcbor" class="my.MyJacksonCBORResponseWriter"></queryResponseWriter>

Then I created a collection and selected the writers with the wt=myfastjson and wt=myfastcbor query parameters.
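
As a rough illustration of selecting a writer by name, here is a minimal Python sketch that only builds the request URL. The host, port and the collection name "mycollection" are assumptions for the example; only the wt values come from the configuration above.

```python
from urllib.parse import urlencode


def select_url(base: str, collection: str, wt: str, **params: str) -> str:
    """Build a /select URL that asks Solr to use the response writer named by wt."""
    # q=*:* is just a placeholder query; extra parameters are passed through.
    query = urlencode({"q": "*:*", **params, "wt": wt})
    return f"{base}/{collection}/select?{query}"


# Hypothetical local Solr and collection name, used for illustration only.
url = select_url("http://localhost:8983/solr", "mycollection", "myfastjson", rows="10")
```

Switching to the CBOR writer would then only mean passing "myfastcbor" as wt and decoding the response body with a CBOR library instead of a JSON one.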

Please let me know if there are problems in my code. The handling of UTF-8 in particular raises a question, since I do not know in which encoding Solr passes data to the writers. Michael Gibney mentioned that it goes utf-16 -> utf-8 -> writer. In addition, Jackson has both writeString and writeRawUTF8String methods (https://fasterxml.github.io/jackson-core/javadoc/2.13/com/fasterxml/jackson/core/JsonGenerator.html). Which one is needed after Solr passes the data to the writer? The Javadoc for writeRawUTF8String says:

Method similar to writeString(String) but that takes as its input a UTF-8 encoded String that is to be output as-is, without additional escaping (type of which depends on data format; backslashes for JSON). However, quoting that data format requires (like double-quotes for JSON) will be added around the value if and as necessary.
Note that some backends may choose not to support this method: for example, if underlying destination is a Writer using this method would require UTF-8 decoding. If so, implementation may instead choose to throw a UnsupportedOperationException due to ineffectiveness of having to decode input.

I checked my code on various UTF-8 data and didn't find any problems, but perhaps I used the wrong function (writeString) and there are cases where the data will be corrupted...
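
To make the difference described in the quoted Javadoc concrete, here is a small Python analogue (not Jackson itself, and not the code from the repository). The two helper functions are hypothetical stand-ins: one escapes the text before writing it, the other only adds quotes around bytes that are assumed to be already escaped.

```python
import json


def write_string(value: str) -> bytes:
    """Analogue of writeString: escape the text, add quotes, encode as UTF-8."""
    return json.dumps(value, ensure_ascii=False).encode("utf-8")


def write_raw_utf8_string(raw: bytes) -> bytes:
    """Analogue of writeRawUTF8String: add quotes only, bytes go out as-is."""
    return b'"' + raw + b'"'


# Text with characters that JSON requires to be escaped, plus non-ASCII letters.
text = 'say "hi" \\ привет'

escaped = write_string(text)
raw = write_raw_utf8_string(text.encode("utf-8"))

# The escaped form round-trips; the raw form is broken by the inner quotes.
round_trips = json.loads(escaped) == text
try:
    json.loads(raw)
    raw_is_valid = True
except json.JSONDecodeError:
    raw_is_valid = False
```

Under these assumptions, the escaping variant is the safe one for ordinary text values, and the raw variant is only correct when the bytes were escaped for JSON beforehand.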

Speeding up the JSON output would be useful to many people, but I'm not sure about CBOR. It turned out that CBOR is easy to add as a ResponseWriter (like the other data formats from the FasterXML Jackson library, https://github.com/FasterXML/jackson#data-format-modules; it is possible that csv, xml... would also work faster with this library than with the current implementation). Python supports it well (cbor2 is fast), and a full data fetch with cursors is 10%-20% faster than fetching the data from Solr into Python in JSON format (* faster in comparison with the modified JSON serializer based on Jackson; ** in Python I use the orjson library, which is faster than the regular json library). I didn't find a very fast Smile deserializer for Python, but this does not mean that many people need CBOR.
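
For readers unfamiliar with cursor-based fetching, the loop looks roughly like the sketch below. It is not the benchmark code: fetch_page is a hypothetical callable that sends the parameters to Solr and returns the decoded response (from JSON, CBOR or any other format), and "id" is assumed to be the collection's uniqueKey field.

```python
from typing import Callable, Dict, Iterator


def iter_docs(
    fetch_page: Callable[[Dict[str, str]], dict],
    rows: int = 1000,
    sort: str = "id asc",  # cursors need a sort that includes the uniqueKey field
) -> Iterator[dict]:
    """Yield every document by following Solr's cursorMark until it stops changing."""
    cursor = "*"  # "*" starts a new cursor
    while True:
        params = {"q": "*:*", "rows": str(rows), "sort": sort, "cursorMark": cursor}
        page = fetch_page(params)
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:
            # Solr returns the same mark again once there are no more results.
            return
        cursor = next_cursor
```

Because decoding is hidden inside fetch_page, the same loop can be timed with different wt values and different Python decoders.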

At the moment, everything works for me on my collections and their data structures, and it works very fast. I was surprised that the speed of a regular JSON select with gzip almost doubled. This could potentially lead to higher RPS, since under full load individual server responses will be returned and completed sooner. I will try to check this too on real hardware using the wrk benchmarking tool.

Best Regards,
