Thank you for your help with slow single-threaded data retrieval from Solr. Today I was able to reach speeds above 3 Gbit/s, and I got results that may be useful in the future.
It turned out that I was wrong to assume the main problem is the FastWriter output buffer, but it was the most obvious candidate I found while studying the source code of the various Solr response writers (they inherit from it). I can't determine exactly where the problem is, but my experiments suggest it is not very deep; it lies somewhere in the final stage of transforming the data and sending it to the client. I would suspect the library/code that converts data to the output format, but it is very strange that the speed barely depends on the selected format (csv, json, xml, python...): the differences are marginal, not orders of magnitude, whether it is the fastest format, csv, or the slowest, xml. Perhaps the data arrives slowly at these functions, or is slowly converted/deserialized for them from the internal format; either way, the text formats end up as much as 8 times slower than they could be.
Just in case, I also checked the performance of Streaming Expressions (slower than /select, with a very long wait for the server's response before the data transfer begins, and huge memory consumption) and of Exporting Result Sets. However, Exporting Result Sets only works with DocValues fields and is not suitable for general use, since DocValues fields come with several restrictions (for example, you cannot use DocValues on solr.TextField fields, only on StrField).
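For reference, the two field setups compared below (stored=true docValues=false for /select, docValues=true stored=false for /export) can be switched through the Schema API. This is a minimal sketch that only builds the JSON command; it assumes a managed schema and that `text_sn` is declared with the `string` (solr.StrField) type, which is an assumption, not something confirmed above. `replace_field_command` is a hypothetical helper name.

```python
import json


def replace_field_command(name: str, field_type: str, *, for_export: bool) -> str:
    """Build a Schema API 'replace-field' command for one of the two tested setups.

    for_export=True  -> docValues=true,  stored=false (what /export needs)
    for_export=False -> docValues=false, stored=true  (plain /select retrieval)

    Assumption: field_type names a docValues-capable type such as "string"
    (solr.StrField); solr.TextField types cannot use docValues.
    """
    field = {
        "name": name,
        "type": field_type,
        "stored": not for_export,
        "docValues": for_export,
    }
    return json.dumps({"replace-field": field})
```

The resulting JSON would be POSTed with `Content-Type: application/json` to the collection's `/schema` endpoint (for example `http://192.168.220.135:8983/solr/test_collection/schema`). Changing `docValues` or `stored` only affects newly indexed documents, so the collection has to be reindexed before re-running the benchmark.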
Here's what I got:
- 3.66 Gb/s - HANDLER /export; wt=javabin; works only on docValues=true fields
- 2.95 Gb/s - HANDLER /select; wt=javabin; with a stored=true docValues=false field
- 1.46 Gb/s - HANDLER /export; wt=json; works only on docValues=true fields
- 455 Mb/s - HANDLER /select; wt=csv; with a stored=true docValues=false field
- 361 Mb/s - HANDLER /select; wt=json; with a stored=true docValues=false field
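The gap between the writers can be double-checked from the raw wget output rather than its rounded speed figures. A small sketch that recomputes the /select rates of experiment 1 from the saved byte counts and elapsed times; it assumes decimal units (1 Mb = 10^6 bits), and since wget times the transfer itself, small deviations from its printed numbers are expected.

```python
def throughput_bits_per_s(n_bytes: int, seconds: float) -> float:
    """Average transfer rate in bits per second."""
    return n_bytes * 8 / seconds


# (label, bytes saved, elapsed seconds) copied from the /select runs of experiment 1
runs = [
    ("select wt=json", 10772687921, 3 * 60 + 59),
    ("select wt=csv", 10751204971, 3 * 60 + 9),
    ("select wt=javabin", 10749142324, 29),
]

rates = {label: throughput_bits_per_s(b, s) for label, b, s in runs}
for label, rate in rates.items():
    print(f"{label:18s} {rate / 1e6:8.0f} Mb/s")

# How many times faster the javabin writer is than the json writer
speedup = rates["select wt=javabin"] / rates["select wt=json"]
print(f"javabin / json speedup: {speedup:.1f}x")
```

This gives roughly 361, 455 and 2965 Mb/s, i.e. javabin is about 8.2 times faster than json on the same data. The javabin figure comes out slightly above wget's 2.95 Gb/s, probably because the "29s" elapsed time is rounded.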
In any case, the difference between javabin and the other response writers is enormous. This gives me hope that the problem is not deeply internal and can be fixed in the future. Solr itself can deliver data at over 3 Gbit/s, so Jetty is not the limiting factor, and there is a lot of room for improvement.
Is there a deserializer for Solr's javabin format in Python (or a javabin-to-JSON converter)? Since wt=json is so slow, the client could receive the data quickly as javabin and convert it into whatever is needed, the way the MongoDB PyMongo client does with the BSON (Binary JSON) format.
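Before choosing a client-side decoder, it is worth checking that a Python client can download javabin as fast as wget does. A minimal standard-library sketch that repeats the /select requests from Python and discards the body, so it measures transfer only, not javabin or JSON decoding. `drain` and `fetch_select` are hypothetical helper names, and the host and collection are assumed to be the same as in the wget runs.

```python
import time
import urllib.parse
import urllib.request
from typing import BinaryIO, Tuple


def drain(stream: BinaryIO, chunk_size: int = 1 << 20) -> Tuple[int, float]:
    """Read a response body to the end without keeping it; return (bytes, seconds)."""
    total = 0
    start = time.perf_counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    return total, time.perf_counter() - start


def fetch_select(base_url: str, wt: str, rows: int = 1000000) -> Tuple[int, float]:
    """POST a /select query with the same parameters as the wget runs and time the download."""
    body = urllib.parse.urlencode({
        "indent": "false",
        "wt": wt,
        "q": "*:*",
        "rows": rows,
        "sort": "id asc",
    }).encode()
    # Passing data makes urllib send a form-encoded POST, like wget --post-data
    req = urllib.request.Request(base_url + "/select", data=body)
    with urllib.request.urlopen(req) as resp:
        return drain(resp)
```

For example, `fetch_select("http://192.168.220.135:8983/solr/test_collection", "javabin")` and the same call with `"json"` return the byte count and elapsed seconds for each writer. http.client sends `Accept-Encoding: identity` by default, so, as in the wget runs, the response is not compressed.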
All measurement results:
# 1. Experiments with stored=true docValues=false
---[ /select HANDLER ] ---

/select HANDLER wt=json
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' --post-data 'indent=false&wt=json&q=*%3A*&rows=1000000&sort=id%20asc' http://192.168.220.135:8983/solr/test_collection/select
10.03G in 3m 59s
2023-02-26 15:05:45 (361 Mb/s) - ‘/dev/null’ saved [10772687921]

/select HANDLER wt=javabin
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' --post-data 'indent=false&wt=javabin&q=*%3A*&rows=1000000&sort=id%20asc' http://192.168.220.135:8983/solr/test_collection/select
10.01G in 29s
2023-02-26 15:08:59 (2.95 Gb/s) - ‘/dev/null’ saved [10749142324]

/select HANDLER wt=csv
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' --post-data 'indent=false&wt=csv&q=*%3A*&rows=1000000&sort=id%20asc' http://192.168.220.135:8983/solr/test_collection/select
10.01G in 3m 9s
2023-02-26 15:14:24 (455 Mb/s) - ‘/dev/null’ saved [10751204971]

# 2. Experiments with docValues=true stored=false (for testing the /export HANDLER)
---[ /select HANDLER ] ---

/select HANDLER wt=json
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' --post-data 'indent=false&wt=json&q=*%3A*&rows=1000000&sort=id%20asc' http://192.168.220.135:8983/solr/test_collection/select
10.03G in 4m 2s
2023-02-26 14:24:21 (357 Mb/s) - ‘/dev/null’ saved [10772687921]

/select HANDLER wt=javabin
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' --post-data 'indent=false&wt=javabin&q=*%3A*&rows=1000000&sort=id%20asc' http://192.168.220.135:8983/solr/test_collection/select
10.01G in 71s
2023-02-26 14:27:15 (1.21 Gb/s) - ‘/dev/null’ saved [10749142324]

/select HANDLER wt=csv
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' --post-data 'indent=false&wt=csv&q=*%3A*&rows=1000000&sort=id%20asc' http://192.168.220.135:8983/solr/test_collection/select
10.01G in 3m 20s
2023-02-26 14:32:39 (430 Mb/s) - ‘/dev/null’ saved [10751204971]

---[ /export HANDLER ] ---

/export HANDLER wt=json
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' "http://192.168.220.135:8983/solr/test_collection/export?indent=false&wt=json&q=*%3A*&sort=id%20desc&fl=id,text_sn"
10.01G in 59s
2023-02-26 14:34:35 (1.46 Gb/s) - ‘/dev/null’ saved [10751758398]

/export HANDLER wt=javabin
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' "http://192.168.220.135:8983/solr/test_collection/export?indent=false&wt=javabin&q=*%3A*&sort=id%20desc&fl=id,text_sn"
10.00G in 23s
2023-02-26 14:37:07 (3.66 Gb/s) - ‘/dev/null’ saved [10742601804]

/export HANDLER wt=csv
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' "http://192.168.220.135:8983/solr/test_collection/export?indent=false&wt=csv&q=*%3A*&sort=id%20desc&fl=id,text_sn"
10.01G in 60s
2023-02-26 14:39:13 (1.43 Gb/s) - ‘/dev/null’ saved [10751758398]

---[ /stream HANDLER ] ---

/stream HANDLER wt=json
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' --header='Content-Type: application/x-www-form-urlencoded' --post-data 'expr=search(test_collection,q="*:*",fl="id, text_sn",sort="id asc",rows=1000000)' http://192.168.220.135:8983/solr/test_collection/stream?wt=json
10.03G in 4m 14s (not counting the very long wait for the response; huge memory consumption)
2023-02-26 16:56:41 (339 Mb/s) - ‘/dev/null’ saved [10768109490]

/stream HANDLER wt=javabin
wget --report-speed=bits --server-response -O /dev/null --header='Accept-Encoding: ' --header='Content-Type: application/x-www-form-urlencoded' --post-data 'expr=search(test_collection,q="*:*",fl="id, text_sn",sort="id asc",rows=1000000)' http://192.168.220.135:8983/solr/test_collection/stream?wt=javabin
10.00G in 20s (not counting the very long wait for the response; measured on the third run, the first run took 27s at 3.15 Gb/s; huge memory consumption)
2023-02-26 17:06:08 (4.20 Gb/s) - ‘/dev/null’ saved [10742601786]
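When repeating these runs, the final wget summary lines can be collected and compared automatically instead of by hand. A small sketch that parses lines in the format shown in the log above; it assumes wget prints the rate as "(<number> Kb/s|Mb/s|Gb/s)" with decimal multipliers, which matches the lines here but is not guaranteed for every wget version or locale. `parse_summary` and `Result` are hypothetical names.

```python
import re
from typing import NamedTuple

# Matches summary lines such as:
# 2023-02-26 15:05:45 (361 Mb/s) - ‘/dev/null’ saved [10772687921]
SUMMARY = re.compile(
    r"\((?P<rate>[\d.]+) (?P<unit>[KMG])b/s\).*saved \[(?P<bytes>\d+)\]"
)

# Assumption: decimal units, as produced by --report-speed=bits in these logs
UNIT = {"K": 1e3, "M": 1e6, "G": 1e9}


class Result(NamedTuple):
    bits_per_s: float
    n_bytes: int


def parse_summary(line: str) -> Result:
    """Extract the transfer rate (bits/s) and byte count from one wget summary line."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError(f"not a wget summary line: {line!r}")
    rate = float(m["rate"]) * UNIT[m["unit"]]
    return Result(rate, int(m["bytes"]))
```

Since wget writes its progress and summary to stderr, redirecting it (for example `2>&1 | tail -n 1`) is one way to capture that last line for each run.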
Best Regards,