David Smiley, sorry for my terminology: I'm used to calling a full fetch of a DB table (collection) in small parts "scrolling". Of course, cursors (cursorMark) are designed for this in Solr, and I use them. The large "rows" values in my examples (measurements) are only there to show the speed at which data from a 10 GB test collection is transmitted from Solr. When I use cursors to pass through a 250+ gigabyte collection, data is transmitted at the same speed as with a single /select call, but with delays between scrolls.
In practice, I noticed that regardless of the hardware performance (SAS disk vs RAM disk; 1 Gbps vs 10 Gbps network; a CPU with a 30% higher core frequency), the full data fetching time does not change and the data transfer speed stays around 350 Mb/s. At the same time, if instead of a select from Solr I download a file via Solr's Jetty from the folder ".../solr-webapp/webapp/test.bin", the speed quickly goes beyond a gigabit. It also doesn't matter whether we take the top X and how big this X is: the transfer is still 4-8x slower than what Jetty and Solr are capable of, and that's not good.
The slowdown seems to affect every response writer I've tested (json, xml, csv, python) except javabin, which suggests that the bottleneck may not be deep inside Solr but somewhere at the level of data transformation and transfer in the response writers (other than javabin). I don't know where to look for the problem beyond the FastWriter output buffer, but I hope the specialists will succeed.
In response to Michael Gibney's remark, I tested the speed of Solr 9.1.1. Strangely, it turned out to be much slower than Solr 8.11.2, even when using /export, DocValues and the javabin response writer:
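For readers unfamiliar with the cursor-based fetching described above, here is a minimal sketch of such a loop. It is only an illustration, not my actual client: the page size of 1000 and the helper names http_fetch and scroll are made up for the example, the URL is the test host from my measurements, and it assumes "id" is the collection's uniqueKey (cursors require the sort to include it).

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint (the test host used in the measurements); adjust as needed.
SOLR_SELECT_URL = "http://192.168.220.135:8983/solr/test_collection/select"


def http_fetch(params):
    """POST the query parameters to Solr and return the decoded JSON response."""
    body = urllib.parse.urlencode(params).encode("utf-8")
    with urllib.request.urlopen(SOLR_SELECT_URL, data=body) as resp:
        return json.load(resp)


def scroll(fetch=http_fetch, rows=1000, query="*:*", sort="id asc"):
    """Yield every matching document, one cursor page at a time.

    `fetch` is injectable so the loop can be exercised without a live Solr.
    """
    cursor = "*"  # documented starting value for cursorMark
    while True:
        page = fetch({
            "q": query,
            "rows": rows,
            "sort": sort,  # must include the uniqueKey field (assumed to be "id")
            "wt": "json",
            "cursorMark": cursor,
        })
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:
            # Solr returns the same mark once there is nothing left to fetch.
            break
        cursor = next_cursor
```

Each iteration is a separate request, which is where the delays between scrolls come from; the per-request transfer itself goes through the same response writer as a single large /select.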
- 1.50 Gb/s - HANDLER /select; wt=javabin; with stored=true docValues=false field
- 489 Mb/s - HANDLER /select; wt=csv; with stored=true docValues=false field
- 459 Mb/s - HANDLER /select; wt=json; with stored=true docValues=false field
- 433 Mb/s - HANDLER /export; wt=javabin; works only on docValues=true field
- 194 Mb/s - HANDLER /select; wt=json; with stored=false docValues=true field
All conditions are the same as before, except that Solr 9.1.1 is installed on the RAM disk (java --version: OpenJDK 64-Bit Server VM, build 11.0.17+8-post-Ubuntu-1ubuntu220.04; OS: Ubuntu 20.04.3 LTS; SOLR_JAVA_MEM="-Xms8g -Xmx8g"; running in cloud mode; everything else at defaults). It is not difficult to create a RAM disk and repeat the above commands on 127.0.0.1, or at least on a gigabit network, to see how far the speed falls short of a gigabit even when neither the disk nor the network is a bottleneck.
All measurement results:
<<<<----- SOLR 9.1.1 tests ----->>>>

---[ /select HANDLER ] ---

/select HANDLER wt=json
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' --post-data 'indent=false&wt=json&q=*%3A*&rows=1000000&sort=id%20asc' http://192.168.220.135:8983/solr/test_collection/select
10.03G in 3m 8s
2023-02-26 19:09:45 (459 Mb/s) - ‘/dev/null’ saved [10772687921]

/select HANDLER wt=javabin
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' --post-data 'indent=false&wt=javabin&q=*%3A*&rows=1000000&sort=id%20asc' http://192.168.220.135:8983/solr/test_collection/select
10.01G in 57s
2023-02-26 19:26:55 (1.50 Gb/s) - ‘/dev/null’ saved [10749142324]

/select HANDLER wt=csv
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' --post-data 'indent=false&wt=csv&q=*%3A*&rows=1000000&sort=id%20asc' http://192.168.220.135:8983/solr/test_collection/select
10.01G in 2m 56s
2023-02-26 19:30:06 (489 Mb/s) - ‘/dev/null’ saved [10751204971]

# 2. Experiments with docValues=true stored=false (for testing /export HANDLER)

---[ /select HANDLER ] ---

/select HANDLER wt=json
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' --post-data 'indent=false&wt=json&q=*%3A*&rows=1000000&sort=id%20asc' http://192.168.220.135:8983/solr/test_collection/select
10.03G in 7m 24s
2023-02-26 20:35:55 (194 Mb/s) - ‘/dev/null’ saved [10772687921]

/select HANDLER wt=javabin
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' --post-data 'indent=false&wt=javabin&q=*%3A*&rows=1000000&sort=id%20asc' http://192.168.220.135:8983/solr/test_collection/select
10.01G in 4m 13s
2023-02-26 20:45:00 (340 Mb/s) - ‘/dev/null’ saved [10749142324]

/select HANDLER wt=csv
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' --post-data 'indent=false&wt=csv&q=*%3A*&rows=1000000&sort=id%20asc' http://192.168.220.135:8983/solr/test_collection/select
10.01G in 7m 5s
2023-02-26 21:25:35 (202 Mb/s) - ‘/dev/null’ saved [10751204971]

---[ /export HANDLER ] ---

/export HANDLER wt=json
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' "http://192.168.220.135:8983/solr/test_collection/export?indent=false&wt=json&q=*%3A*&sort=id%20desc&fl=id,text_sn"
10.01G in 4m 7s
2023-02-26 21:32:40 (349 Mb/s) - ‘/dev/null’ saved [10751758398]

/export HANDLER wt=javabin
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' "http://192.168.220.135:8983/solr/test_collection/export?indent=false&wt=javabin&q=*%3A*&sort=id%20desc&fl=id,text_sn"
10.00G in 3m 18s
2023-02-26 21:37:38 (433 Mb/s) - ‘/dev/null’ saved [10742601804]

/export HANDLER wt=csv
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' "http://192.168.220.135:8983/solr/test_collection/export?indent=false&wt=csv&q=*%3A*&sort=id%20desc&fl=id,text_sn"
10.01G in 3m 59s
2023-02-26 21:42:33 (360 Mb/s) - ‘/dev/null’ saved [10751758398]
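The rates wget reports in the measurements above can be cross-checked from the saved byte counts and the elapsed times. The sketch below does this for the three /select runs on the stored=true field. The helper names are made up for the example, and it assumes decimal megabits (1 Mb = 10^6 bits); because wget prints the elapsed time rounded to whole seconds, the recomputed values can differ from the reported ones by a fraction of a percent.

```python
def parse_elapsed(text):
    """Convert a wget-style duration such as '3m 8s' or '57s' to seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(part[:-1]) * units[part[-1]] for part in text.split())


def megabits_per_second(byte_count, elapsed):
    """Average transfer rate in decimal megabits per second (assumed unit)."""
    return byte_count * 8 / parse_elapsed(elapsed) / 1e6


# Byte counts and elapsed times copied from the /select runs (stored=true field).
runs = {
    "json": (10772687921, "3m 8s"),
    "javabin": (10749142324, "57s"),
    "csv": (10751204971, "2m 56s"),
}

if __name__ == "__main__":
    json_rate = megabits_per_second(*runs["json"])
    for name, (size, elapsed) in runs.items():
        rate = megabits_per_second(size, elapsed)
        # Ratio relative to wt=json shows how far javabin is ahead.
        print(f"wt={name}: {rate:.0f} Mb/s ({rate / json_rate:.1f}x json)")
```

The recomputed figures agree with the rates wget printed, and they show javabin moving essentially the same amount of data more than three times faster than json or csv.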
Best Regards,