Continuing my research into the performance of fetching data from Solr, I noticed a significant drop in transfer rate as the size of the stored fields decreased. Below are the results of measuring the data transfer rate (wt=javabin) from a collection 10 GB in size, split into different numbers of documents with different stored-text-field sizes (RAM disk, one shard; the documents contain only "id" and "text_sn", a stored, unindexed field without docValues):
- 3.48 Gb/s (or 849 docs/s) - collection of 20 479 documents of 512 KB each (512*1024 characters each)
- 2.22 Gb/s (or 17 340 docs/s) - collection of 654 043 documents of 16 KB each (16*1024 characters each); at 3.48 Gb/s this would be 27 187 docs/s
- 1.16 Gb/s (or 72 500 docs/s) - collection of 5 159 740 documents of 2 KB each (2*1024 characters each); at 3.48 Gb/s this would be 217 500 docs/s
- 212 Mb/s (or 103 500 docs/s) - collection of 37 153 697 documents of 256 bytes each (256 characters each); at 3.48 Gb/s this would be 1 699 218 docs/s
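As a sanity check on these four measurements, I fitted them to a simple hypothetical cost model of my own, seconds_per_doc = fixed_overhead + doc_size / bandwidth (nothing Solr-specific, just ordinary least squares). The fit suggests a roughly constant per-document overhead on the order of ten microseconds on this hardware, which would explain the curve:

```python
# Fit the four measurements to a simple per-document cost model:
#   seconds_per_doc = fixed_overhead + doc_size_bytes / bandwidth
# This is only a back-of-the-envelope model for the numbers above.

def fit_overhead():
    sizes = [512 * 1024, 16 * 1024, 2 * 1024, 256]  # stored field size, bytes
    rates = [849, 17_340, 72_500, 103_500]          # measured docs/second
    times = [1.0 / r for r in rates]                # seconds per document

    n = len(sizes)
    mean_s = sum(sizes) / n
    mean_t = sum(times) / n
    # Ordinary least squares for t = a + c * s
    c = sum((s - mean_s) * (t - mean_t) for s, t in zip(sizes, times)) \
        / sum((s - mean_s) ** 2 for s in sizes)
    a = mean_t - c * mean_s
    return a, 1.0 / c  # (fixed overhead in s/doc, bulk bandwidth in bytes/s)

if __name__ == "__main__":
    overhead, bandwidth = fit_overhead()
    print(f"fixed per-doc overhead ~ {overhead * 1e6:.1f} us")
    print(f"bulk bandwidth         ~ {bandwidth * 8 / 1e9:.2f} Gbit/s")
```

The fitted bulk bandwidth comes out close to the 3.48 Gb/s observed for the largest documents, so a fixed per-document cost alone accounts for most of the slowdown at small document sizes.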
Since neither the disk nor the network is a bottleneck, and the CPU is fairly fast (4.5 GHz), where else can I look for the cause of this drop in transfer speed, and is there a chance to improve it?
As far as I understand from the measurements, a per-document overhead arises somewhere while traversing/iterating the list of documents passed to the javabin response writer. Since the index sits on a RAM disk, this overhead should not be related to reading the data from disk (there may be some cost to retrieving the data, but it should not have such a large effect). I found an article from 2015 suggesting the problem may be stored-field compression, along with a way to disable it: https://stegard.net/2015/05/performance-of-stored-field-compression-in-lucene-4-1/ - is that still relevant? It seems that decompressing 10 GB of data should cost about the same whether the documents are large or small, but if a decompressor instance and other objects are created per document without reuse, such a drop is quite possible.
Best Regards,