> Again, it's worth being aware that what you are doing is very far afield
> from what a search engine is *for*.  So yeah... performance may not be so
> great.  Solr users want top-X documents sorted by something, and/or maybe
> some facets/stats summarizing fields.  Not all docs.

Optimizing known inefficiencies in one part might help speed up other
parts. For example, the JSON writer fix will help Fikavec's use case as well
as more regular Solr use cases.

Having said that, there are various situations when large result sets are
exported, and Fikavec's research on the performance of data fetching from
Solr might prove helpful in speeding up those cases.

On Mon, 13 Mar 2023 at 06:46, David Smiley <dsmi...@apache.org> wrote:

> There is compression of stored data; I don't think it makes sense to
> disable it.  The default compression is LZ4 which is the "BEST_SPEED"
> option offered by Lucene compared to others.  Back in 2015, the article you
> quoted, this faster option wasn't available.  I don't see a no-compression
> option:
>
> https://lucene.apache.org/core/9_3_0/core/org/apache/lucene/codecs/StoredFieldsFormat.html
>
> Make sure you're returning documents in the same order that Lucene/Solr has
> it internally.  By default, if you aren't specifying any sort options, I
> believe Solr will return the documents in this order, but it's worth
> double-checking. If you specify fl=[docid], check that the results show an
> increasing number for each document.
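(A quick way to script the [docid] check David describes — a sketch that
operates on an already-parsed response; the fetch itself and the collection
name are assumptions:)

```python
# Sketch: verify that the [docid] pseudo-field returned by Solr increases
# monotonically, i.e. documents come back in internal index order.
# Assumes you already fetched something like
#   /solr/<collection>/select?q=*:*&fl=[docid]&wt=json
# (hypothetical URL) and parsed response["response"]["docs"] into a list.

def docids_are_increasing(docs):
    """Return True if the [docid] values strictly increase across the results."""
    ids = [d["[docid]"] for d in docs]
    return all(a < b for a, b in zip(ids, ids[1:]))

# Example with a parsed response fragment:
docs = [{"[docid]": 0}, {"[docid]": 1}, {"[docid]": 7}]
print(docids_are_increasing(docs))  # True -> index order, no re-sorting
```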
>
> Again, it's worth being aware that what you are doing is very far afield
> from what a search engine is *for*.  So yeah... performance may not be so
> great.  Solr users want top-X documents sorted by something, and/or maybe
> some facets/stats summarizing fields.  Not all docs.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Sun, Mar 12, 2023 at 5:57 PM Fikavec F <fika...@yandex.ru> wrote:
>
> >    Continuing my research on the performance of data fetching from Solr,
> > I noticed a significant drop in the transfer rate when the size of stored
> > fields decreased. Below are the results of measuring the data transfer
> > rate (wt=javabin) from a collection of 10 gigabytes in size, but
> > consisting of different numbers of documents and sizes of the stored
> > text field (ram disk, one shard, the collection documents contain only
> > "id" and "text_sn" - a stored, unindexed field without docValues):
> >
> >    - 3.48 Gb/s (or       849 doc/s)  -  collection with         20 479
> >      documents of 512 KB each (512*1024 symbols each)
> >    - 2.22 Gb/s (or  17 340 doc/s)  -  collection with       654 043
> >      documents of 16 KB each (16*1024 symbols each); for a speed of
> >      3.48 Gb/s it should be 27 187 doc/s
> >    - 1.16 Gb/s (or  72 500 doc/s)  -  collection with    5 159 740
> >      documents of 2 KB each (2*1024 symbols each); for a speed of
> >      3.48 Gb/s it should be 217 500 doc/s
> >    - 212 Mb/s  (or 103 500 doc/s) -  collection with  37 153 697
> >      documents of 256 bytes each (256 symbols each); for a speed of
> >      3.48 Gb/s it should be 1 699 218 doc/s
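(An aside on what these numbers imply: if per-document time is modeled as
transfer time plus a fixed per-document cost, the two extreme data points
above suggest a roughly constant overhead of several microseconds per
document — a back-of-the-envelope sketch, not a measurement of Solr itself:)

```python
# Back-of-the-envelope model: time_per_doc = doc_size / bandwidth + overhead.
# Uses the quoted 512 KB peak rate (3.48 Gb/s) and the 256-byte measurement.

peak_bandwidth = 3.48e9 / 8              # 3.48 Gb/s expressed in bytes/second
small_doc_size = 256                     # bytes per document
small_doc_rate = 103_500                 # observed docs/second for 256-byte docs

time_per_small_doc = 1 / small_doc_rate                # ~9.7 microseconds
pure_transfer_time = small_doc_size / peak_bandwidth   # ~0.6 microseconds

overhead = time_per_small_doc - pure_transfer_time
print(f"implied fixed per-document overhead = {overhead * 1e6:.1f} microseconds")
```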
> >
> >    Since the disk or network is not a bottleneck and the CPU is also
> > quite fast (4.5 GHz), where can I look further for the cause of such a
> > drop in data transfer speed, and is there a chance to improve something
> > there?
> >    As far as I understand from the measurement results, per-document
> > overhead costs arise somewhere when traversing/iterating through the
> > list of documents passed to the javabin output writer, and since the
> > disk is in RAM, these overheads are not related to extracting data from
> > the disk itself (there may be some cost to reading data from the disk,
> > but it should not have such a large effect). I managed to find an
> > article from 2015 which mentions that the problem may be in stored field
> > compression and provides a way to disable it:
> > https://stegard.net/2015/05/performance-of-stored-field-compression-in-lucene-4-1/
> > Is it still relevant? (It seems that decompressing 10 GB of data in
> > larger or smaller documents should not affect the speed so
> > significantly, but if instances of the decompression class and some
> > other entities are created for each document without reuse, this is
> > quite possible.)
> > Best Regards,
>
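(The per-document decompression hypothesis in the quoted question can be
illustrated in isolation. This sketch uses Python's zlib purely as a
stand-in for Lucene's LZ4 codec — the codec, block sizes, and timings are
not Solr's, only the shape of the effect: many tiny decompression calls pay
a per-call setup cost that one large call does not:)

```python
import time
import zlib

# 1 MiB of stand-in "stored field" data.
payload = b"x" * (1 << 20)

# Compress it once as a single block, and again as 4096 blocks of 256 bytes,
# mimicking one large document vs. many small documents.
big = zlib.compress(payload)
small = [zlib.compress(payload[i:i + 256]) for i in range(0, len(payload), 256)]

t0 = time.perf_counter()
out_big = zlib.decompress(big)
t1 = time.perf_counter()
out_small = b"".join(zlib.decompress(c) for c in small)
t2 = time.perf_counter()

print(f"1 block    : {(t1 - t0) * 1e3:.2f} ms")
print(f"4096 blocks: {(t2 - t1) * 1e3:.2f} ms")  # usually far slower per byte
```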
