David Smiley, sorry for my terminology: I am used to calling a full fetch of a DB table (collection) in small parts "scrolling". Of course, cursors (cursorMark) are designed for this in Solr, and I use them. The large "rows" values in my examples (measurements) are only there to show the speed at which data from a 10 GB test collection is transmitted by Solr. When I use cursors to pass through a 250+ gigabyte collection, data is transmitted at the same speed as with a single /select call, but with delays between scrolls.

In practice, I noticed that regardless of the hardware (SAS disk vs. RAM disk; 1 Gbps vs. 10 Gbps network; a CPU with a 30% higher core frequency), the full fetch time does not change and the transfer speed stays at around 350 megabits per second. At the same time, if instead of a select from Solr I download a file through Solr's Jetty from the folder ".../solr-webapp/webapp/test.bin", the speed quickly goes beyond a gigabit per second. It also does not matter whether we take a top-X and how big X is: it is still 4-8x slower than what Jetty and Solr are capable of, and that is not good.

The slowdown seems to affect every response writer I have tested (json, xml, csv, python) except javabin. This suggests that the bottleneck may be not deep inside Solr but somewhere at the level of data transformation and transfer in the response writers (other than javabin). I don't know where to look further than the FastWriter output buffer, but I hope the specialists will succeed.
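
To make the cursor-based fetching mentioned above concrete, here is a minimal Python sketch. It is an illustration only, not the client code used in these tests: the function names (fetch_page, scroll), the page size and the injectable "fetch" parameter are assumptions made for the example. The cursorMark / nextCursorMark parameters, the "*" start value and the requirement that the sort include the uniqueKey field follow Solr's documented cursor API; the uniqueKey is assumed to be "id", as in the sort used in the measurements.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(select_url, params):
    """POST the query to a Solr /select handler and decode the JSON response."""
    data = urllib.parse.urlencode(params).encode("utf-8")
    with urllib.request.urlopen(select_url, data=data) as resp:
        return json.load(resp)


def scroll(select_url, rows=1000, fetch=fetch_page):
    """Yield every document of the collection, one cursor page at a time.

    `fetch` is a hypothetical hook so the loop can be tried without a server.
    """
    cursor = "*"  # documented starting value for cursorMark
    while True:
        params = {
            "q": "*:*",
            "wt": "json",
            "rows": rows,
            "sort": "id asc",  # cursors need a sort that includes the uniqueKey
            "cursorMark": cursor,
        }
        page = fetch(select_url, params)
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:
            # An unchanged cursor means the result set is exhausted.
            break
        cursor = next_cursor
```

Each page is a separate HTTP request, which is where the delays between scrolls come from; the per-page transfer itself goes through the same response writer as a single large /select call.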
 
To Michael Gibney's remark: I tested the speed of Solr 9.1.1. Strangely, it turned out to be much slower than Solr 8.11.2, even when using /export, docValues and the javabin response writer:
  • 1.50 Gb/s - HANDLER /select;  wt=javabin; with stored=true docValues=false field
  • 489 Mb/s  - HANDLER /select;  wt=csv;     with stored=true docValues=false field
  • 459 Mb/s  - HANDLER /select;  wt=json;    with stored=true docValues=false field
  • 433 Mb/s  - HANDLER /export;  wt=javabin; works only on docValues=true field
  • 194 Mb/s  - HANDLER /select;  wt=json;    with stored=false docValues=true field
All conditions are the same as before, except that Solr 9.1.1 is installed on the RAM disk (java --version: OpenJDK 64-Bit Server VM (build 11.0.17+8-post-Ubuntu-1ubuntu220.04); OS: Ubuntu 20.04.3 LTS; SOLR_JAVA_MEM="-Xms8g -Xmx8g"; running in cloud mode; everything else at defaults). It is not difficult to create a RAM disk and repeat the above commands on 127.0.0.1, or at least on a gigabit network, to see how far the speed stays from a gigabit even when neither the disk nor the network is a bottleneck.
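
For anyone who prefers to script the repetition, the following Python sketch measures how fast a response body can be read and discarded, similar in spirit to wget writing to /dev/null. It is only an assumption-laden illustration: measure_stream and measure_url are hypothetical helper names, the 1 MiB chunk size is arbitrary, and "Accept-Encoding: identity" is used here to ask for an uncompressed body. A Python client can itself become the limiting factor, so wget remains the reference for the numbers in this message.

```python
import time
import urllib.request


def measure_stream(stream, chunk_size=1 << 20, clock=time.perf_counter):
    """Read a stream to the end, discard the data, return (bytes, seconds)."""
    start = clock()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)  # count the bytes, do not keep them
    return total, clock() - start


def measure_url(url, post_data=None):
    """Fetch `url` (POST if post_data is given) and time the body transfer."""
    request = urllib.request.Request(
        url,
        data=post_data,
        headers={"Accept-Encoding": "identity"},  # request an uncompressed body
    )
    with urllib.request.urlopen(request) as response:
        return measure_stream(response)
```

The returned pair can be turned into a rate as bytes * 8 / seconds, and the same function can be pointed both at a /select URL and at the static test.bin file to compare the two paths.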
 
All measurement results:
 
<<<<----- SOLR 9.1.1 tests ----->>>>
---[ /select HANDLER ] ---
/select HANDLER wt=json
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' --post-data 'indent=false&wt=json&q=*%3A*&rows=1000000&sort=id%20asc' http://192.168.220.135:8983/solr/test_collection/select
10.03G in 3m 8s
2023-02-26 19:09:45 (459 Mb/s) - ‘/dev/null’ saved [10772687921]
 
/select HANDLER wt=javabin
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' --post-data 'indent=false&wt=javabin&q=*%3A*&rows=1000000&sort=id%20asc' http://192.168.220.135:8983/solr/test_collection/select
10.01G in 57s
2023-02-26 19:26:55 (1.50 Gb/s) - ‘/dev/null’ saved [10749142324]
 
/select HANDLER wt=csv
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' --post-data 'indent=false&wt=csv&q=*%3A*&rows=1000000&sort=id%20asc' http://192.168.220.135:8983/solr/test_collection/select
10.01G in 2m 56s
2023-02-26 19:30:06 (489 Mb/s) - ‘/dev/null’ saved [10751204971]
 
# 2. Experiments with docValues=true stored=false (for testing /export HANDLER)
---[ /select HANDLER ] ---
/select HANDLER wt=json
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' --post-data 'indent=false&wt=json&q=*%3A*&rows=1000000&sort=id%20asc' http://192.168.220.135:8983/solr/test_collection/select
10.03G in 7m 24s
2023-02-26 20:35:55 (194 Mb/s) - ‘/dev/null’ saved [10772687921]
 
/select HANDLER wt=javabin
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' --post-data 'indent=false&wt=javabin&q=*%3A*&rows=1000000&sort=id%20asc' http://192.168.220.135:8983/solr/test_collection/select
10.01G in 4m 13s
2023-02-26 20:45:00 (340 Mb/s) - ‘/dev/null’ saved [10749142324]
 
/select HANDLER wt=csv
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' --post-data 'indent=false&wt=csv&q=*%3A*&rows=1000000&sort=id%20asc' http://192.168.220.135:8983/solr/test_collection/select
10.01G in 7m 5s
2023-02-26 21:25:35 (202 Mb/s) - ‘/dev/null’ saved [10751204971]
 
---[ /export HANDLER ] ---
/export HANDLER wt=json
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' "http://192.168.220.135:8983/solr/test_collection/export?indent=false&wt=json&q=*%3A*&sort=id%20desc&fl=id,text_sn"
10.01G in 4m 7s
2023-02-26 21:32:40 (349 Mb/s) - ‘/dev/null’ saved [10751758398]
 
/export HANDLER wt=javabin
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' "http://192.168.220.135:8983/solr/test_collection/export?indent=false&wt=javabin&q=*%3A*&sort=id%20desc&fl=id,text_sn"
10.00G in 3m 18s
2023-02-26 21:37:38 (433 Mb/s) - ‘/dev/null’ saved [10742601804]
 
/export HANDLER wt=csv
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' "http://192.168.220.135:8983/solr/test_collection/export?indent=false&wt=csv&q=*%3A*&sort=id%20desc&fl=id,text_sn"
10.01G in 3m 59s
2023-02-26 21:42:33 (360 Mb/s) - ‘/dev/null’ saved [10751758398]
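
The rates wget prints above can be cross-checked from the byte counts and durations in the same summary lines. The sketch below does that arithmetic in Python for three of the runs; the helper names are hypothetical, and it assumes decimal megabits (1 Mb = 1,000,000 bits). Because wget rounds the displayed duration to whole seconds, the recomputed values can differ from the printed ones by a percent or so.

```python
import re


def parse_duration(text):
    """Convert a wget-style duration such as '3m 8s' or '57s' to seconds."""
    unit_seconds = {"h": 3600, "m": 60, "s": 1}
    parts = re.findall(r"([\d.]+)\s*([hms])", text)
    return sum(float(value) * unit_seconds[unit] for value, unit in parts)


def megabits_per_second(num_bytes, seconds):
    """Average rate in decimal megabits per second."""
    return num_bytes * 8 / seconds / 1_000_000


# (label, bytes saved, duration) copied from the wget summaries above
runs = [
    ("/select wt=json, stored field", 10772687921, "3m 8s"),
    ("/select wt=javabin, stored field", 10749142324, "57s"),
    ("/export wt=javabin, docValues field", 10742601804, "3m 18s"),
]

for label, num_bytes, duration in runs:
    rate = megabits_per_second(num_bytes, parse_duration(duration))
    print(f"{label}: {rate:.0f} Mb/s")
```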
 
Best Regards,
 
