You can get parallelism by sharding -- use numShards=8 or whatever number of CPUs you have.
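A minimal sketch of the sharding suggestion, assuming a SolrCloud node at the hypothetical address http://localhost:8983/solr and a hypothetical collection name "demo". The helper name `build_create_url` is illustrative only. It builds a Collections API CREATE request with numShards defaulting to the local CPU count:

```python
import os
from urllib.parse import urlencode


def build_create_url(base_url, collection, num_shards=None):
    """Build a Collections API CREATE URL, one shard per CPU by default."""
    if num_shards is None:
        # os.cpu_count() may return None; fall back to a single shard.
        num_shards = os.cpu_count() or 1
    params = {
        "action": "CREATE",
        "name": collection,
        "numShards": num_shards,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical base URL and collection name; adjust for a real cluster.
    print(build_create_url("http://localhost:8983/solr", "demo", num_shards=8))
```

Whether one shard per CPU is the right number depends on the workload; the sketch only shows where the parameter goes.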

In your performance analysis, you did not speak of memory; you spoke as if there are only two factors at play -- CPU and Disk. Compression of data on disk is used in order to allow the OS's disk cache to cache as much as possible. In primary search use-cases (but not return-all-data), RAM matters because the OS uses it to cache disk. Primary search use-cases are very random-IO oriented, not sequential (unlike read-all-data). If you were to manage to turn off stored value compression, I'm sure you would accomplish your current goal but then, I suspect, hurt a primary use-case.

To pursue disabling compression further: you could create a Lucene CompressionMode subclass that doesn't actually do any compression. This is actually rather easy to do; I've subclassed this before so I'm familiar with the mechanics of it. There is more integration work to do of course. Ultimately Solr would have to be configured to use a custom Codec. At work, I like to say "we've plugged all the pluggables and created our own" LOL.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

On Tue, Mar 14, 2023 at 4:54 PM Fikavec F <fika...@yandex.ru> wrote:

> Thank you for working on the Solr performance issues raised here.
>
> LZ4 is a great solution, but let's look at how things are today. As far
> as I understand, uncompressed fields have been abandoned since version
> 4.1 (early 2013). At that time, 15,000 RPM SAS disks delivered 350 MB/s
> or 150-180 IOPS and had a capacity of about 300 GB to 600 GB. Modern
> server PCI-E 5.0 NVMe disks promise 13,000 MB/s (37 times faster than
> SAS) or 2,500,000 random read IOPS (13,888 times faster than SAS) and a
> capacity of up to 15.36 TB (25 times more than SAS). However, the clock
> frequency of server processors has not increased (compare, for example,
> the Xeon® E3-1245 v3 from 2013); it has remained at 2.2-3.40 GHz
> (although processors have become multicore and more efficient).
> In htop I did not see all cores loaded during data transfer, which
> means Solr does not decompress LZ4 data in multiple threads. That in
> turn means LZ4 in Solr runs at about the same speed in 2023 as it did
> on one 3.40 GHz processor core in 2013, while the disks for which LZ4
> was introduced have become 37 times faster in throughput, 13,888 times
> faster in random read IOPS, and 25 times larger in capacity. In this
> scenario, can't LZ4 turn out to be a serious bottleneck for Solr
> performance these days? If everything is tested well, maybe it's time
> to turn LZ4 off again (or even disable it by default), or to add a
> parallel mode that runs LZ4 on several cores.
>
> Unfortunately, I couldn't reproduce the code from the article and
> create a working collection with compression disabled, so I couldn't
> test anything. It is also interesting that the fields are stored in
> LZ4, then decompressed, recompressed with gzip, and sent to the client.
> Perhaps in some Solr use cases they could be stored in gzip and served
> to the client directly from disk.
>
> The second interesting question: I continued to study the effect of
> field size on the data transfer rate. I created a collection and filled
> it with "for i in range(40,000,000) insert {'id': str(i), 'text_s':
> str(i)}" documents and found that on my 4.5 GHz CPU Solr can process
> (return) only about 115,000 documents per second. That doesn't look
> fast. Could there be a bottleneck in iterating over the set of
> documents returned from Solr (for example, some expensive
> resource/class being created for each returned document instead of
> being reused)?
>
> It seems to me that such discussions are beneficial regardless of my
> own scenario of using Solr.
> The equipment has become faster and its working principles have
> changed. Given the age of the Solr project, it may well turn out that
> some basic things should be reconsidered and rethought, made
> multithreaded, or otherwise changed. I opened this discussion because
> my Solr installation performed almost the same (about 90%) on very old
> hardware as on new hardware that is 10 times more powerful. The results
> achieved so far are already remarkable, but I think there is still
> potential for improvement. For example, a simple code review of the old
> SmileResponseWriter code made it possible to identify a bug and find a
> place that potentially hurts performance; this may be useful in the
> future.
>
> Best Regards,
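
For anyone wanting to reproduce the test collection described in the quoted message, here is a minimal sketch of the document generation and the throughput arithmetic. The function names (`make_docs`, `batches`, `docs_per_second`) and the batch size are assumptions for illustration, not taken from the original test. Actually sending the batches to Solr is left out; each batch is a JSON array of the kind that could be posted to a collection's /update handler.

```python
import itertools
import json


def make_docs(count):
    """Yield documents shaped like the ones in the quoted test."""
    for i in range(count):
        yield {"id": str(i), "text_s": str(i)}


def batches(docs, size):
    """Group an iterable of documents into lists of at most `size` items."""
    it = iter(docs)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


def docs_per_second(doc_count, elapsed_seconds):
    """Throughput for a timed retrieval of `doc_count` documents."""
    return doc_count / elapsed_seconds


if __name__ == "__main__":
    # Small count for demonstration; the quoted test used 40,000,000.
    for batch in batches(make_docs(5), size=2):
        print(json.dumps(batch))
```

At the reported rate of about 115,000 documents per second, returning all 40,000,000 documents would take roughly 348 seconds (40,000,000 / 115,000).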