> > Isn't there a deserializer of Solr javabin format in python/json

Well, you can try to marry
https://solr.apache.org/guide/solr/latest/query-guide/response-writers.html#smile-response-writer
+ https://github.com/jhosmer/PySmile
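A rough sketch of what that pairing might look like on the client side, in case it helps. It only builds a /select URL with wt=smile and hands the raw response bytes to whatever decoder you pass in. The helper names (build_select_url, fetch_decoded) are made up for illustration, and I am assuming PySmile exposes a decode function that takes bytes; please check its README before relying on that.

```python
import urllib.parse
import urllib.request


def build_select_url(base_url, collection, params):
    """Build a /select URL that asks Solr for a Smile-encoded response."""
    # wt=smile selects Solr's Smile response writer; other params pass through.
    query = urllib.parse.urlencode({**params, "wt": "smile"})
    return f"{base_url}/solr/{collection}/select?{query}"


def _http_get(url):
    """Fetch the raw response body as bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def fetch_decoded(url, decode, read_bytes=None):
    """Fetch the binary response and decode it with the supplied function.

    `decode` is expected to turn Smile bytes into Python objects
    (assumption: something like pysmile.decode). `read_bytes` is a
    hypothetical hook so the transport can be swapped out or stubbed.
    """
    if read_bytes is None:
        read_bytes = _http_get
    return decode(read_bytes(url))
```

With PySmile installed, usage would presumably be along the lines of fetch_decoded(build_select_url("http://192.168.220.135:8983", "test_collection", {"q": "*:*", "rows": 10}), pysmile.decode). Whether Smile gets anywhere near the javabin numbers is something only a measurement on your setup can tell.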
On Mon, Feb 27, 2023 at 12:46 AM Fikavec F <fika...@yandex.ru> wrote:

> Thank you for your help with slow single-threaded data receiving from
> Solr. Today I was able to reach a speed of 3 Gigabit+ and got results
> that may be useful in the future.
>
> I turned out to be wrong in assuming that the main problem is in the
> FastWriter output buffer, but this was the most obvious thing I found
> while studying the source code of the different Solr Response Writers
> (they inherit from it). I can't determine exactly where the problem is,
> but according to my experiments it doesn't seem to be too deep; it is
> somewhere in the final stage of transforming the data and sending it to
> the user. I would think that the problem is in the library/code that
> converts data to the output format, but it is very strange that the
> speed is almost independent of the selected format (csv, json, xml,
> python...): the difference between the fastest, csv, and the slowest,
> xml, is not significant, let alone orders of magnitude. Perhaps the data
> is slow to arrive at these functions, or is slowly
> converted/deserialized for them from the internal format, but the result
> is up to 8 times slower than it could be.
>
> Just in case, I checked the performance of Streaming Expressions
> <https://svn.apache.org/repos/infra/sites/solr/guide/8_11/streaming-expressions.html>
> (slower than /select, with a very long wait for a response from the
> server before the data transfer begins and huge memory consumption) and
> Exporting Result Sets
> <https://svn.apache.org/repos/infra/sites/solr/guide/8_11/exporting-result-sets.html>
> too, but Exporting Result Sets only works with DocValues
> <https://svn.apache.org/repos/infra/sites/solr/guide/8_11/exporting-result-sets.html#field-requirements>
> fields, and is not suitable for general use, since multiple restrictions
> <https://svn.apache.org/repos/infra/sites/solr/guide/8_11/docvalues.html#enabling-docvalues>
> are imposed on DocValues fields (for example, you cannot use DocValues
> on solr.TextField fields, only StrField).
>
> Here's what I got:
>
>    - 3.66 Gb/s - HANDLER /export; wt=javabin; works only on docValues=true field
>    - *2.95 Gb/s* - *HANDLER /select; wt=javabin; with stored=true docValues=false field*
>    - 1.46 Gb/s - HANDLER /export; wt=json; works only on docValues=true field
>    - 455 Mb/s - HANDLER /select; wt=csv; with stored=true docValues=false field
>    - *361 Mb/s* *- HANDLER /select; wt=json; with stored=true docValues=false field*
>
> In any case, the difference between javabin and the other Response
> Writers is enormous. This gives me hope that the problem is not deep in
> the internals and can be fixed in the future. Solr itself can work at
> 3 Gigabit speeds and this is not the limit of its Jetty, i.e. there is a
> lot to improve.
> Isn't there a deserializer of Solr javabin format in python/json? While
> wt=json is so slow, the client could receive the data quickly as javabin
> and convert it into whatever is needed, as the MongoDB PyMongo client
> does with the BSON (Binary JSON) format
> <https://pymongo.readthedocs.io/en/stable/api/index.html>.
>
> All measurement results:
>
>
> # 1. Experiments with stored=true docValues=false
> ---[ /select HANDLER ] ---
> /select HANDLER wt=json
> wget --report-speed=bits --server-response -O /dev/null
> --header='Accept-Encoding: ' --post-data
> 'indent=false&wt=json&q=*%3A*&rows=1000000&sort=id%20asc'
> http://192.168.220.135:8983/solr/test_collection/select
> 10.03G in *3m 59s*
> 2023-02-26 15:05:45 (*361 Mb/s*) - ‘/dev/null’ saved [10772687921]
>
> /select HANDLER wt=javabin
> wget --report-speed=bits --server-response -O /dev/null
> --header='Accept-Encoding: ' --post-data
> 'indent=false&wt=javabin&q=*%3A*&rows=1000000&sort=id%20asc'
> http://192.168.220.135:8983/solr/test_collection/select
> 10.01G in *29s*
> 2023-02-26 15:08:59 (*2.95 Gb/s*) - ‘/dev/null’ saved [10749142324]
>
> /select HANDLER wt=csv
> wget --report-speed=bits --server-response -O /dev/null
> --header='Accept-Encoding: ' --post-data
> 'indent=false&wt=csv&q=*%3A*&rows=1000000&sort=id%20asc'
> http://192.168.220.135:8983/solr/test_collection/select
> 10.01G in *3m 9s*
> 2023-02-26 15:14:24 (*455 Mb/s*) - ‘/dev/null’ saved [10751204971]
>
> # 2. Experiments with docValues=true stored=false (for testing /export
> HANDLER)
> ---[ /select HANDLER ] ---
> /select HANDLER wt=json
> wget --report-speed=bits --server-response -O /dev/null
> --header='Accept-Encoding: ' --post-data
> 'indent=false&wt=json&q=*%3A*&rows=1000000&sort=id%20asc'
> http://192.168.220.135:8983/solr/test_collection/select
> 10.03G in *4m 2s*
> 2023-02-26 14:24:21 (*357 Mb/s*) - ‘/dev/null’ saved [10772687921]
>
> /select HANDLER wt=javabin
> wget --report-speed=bits --server-response -O /dev/null
> --header='Accept-Encoding: ' --post-data
> 'indent=false&wt=javabin&q=*%3A*&rows=1000000&sort=id%20asc'
> http://192.168.220.135:8983/solr/test_collection/select
> 10.01G in *71s*
> 2023-02-26 14:27:15 (*1.21 Gb/s*) - ‘/dev/null’ saved [10749142324]
>
> /select HANDLER wt=csv
> wget --report-speed=bits --server-response -O /dev/null
> --header='Accept-Encoding: ' --post-data
> 'indent=false&wt=csv&q=*%3A*&rows=1000000&sort=id%20asc'
> http://192.168.220.135:8983/solr/test_collection/select
> 10.01G in *3m 20s*
> 2023-02-26 14:32:39 (*430 Mb/s*) - ‘/dev/null’ saved [10751204971]
>
> ---[ /export HANDLER ] ---
> /export HANDLER wt=json
> wget --report-speed=bits --server-response -O /dev/null
> --header='Accept-Encoding: '
> "http://192.168.220.135:8983/solr/test_collection/export?indent=false&wt=json&q=*%3A*&sort=id%20desc&fl=id,text_sn"
> 10.01G in *59s*
> 2023-02-26 14:34:35 (*1.46 Gb/s*) - ‘/dev/null’ saved [10751758398]
>
> /export HANDLER wt=javabin
> wget --report-speed=bits --server-response -O /dev/null
> --header='Accept-Encoding: '
> "http://192.168.220.135:8983/solr/test_collection/export?indent=false&wt=javabin&q=*%3A*&sort=id%20desc&fl=id,text_sn"
> 10.00G in *23s*
> 2023-02-26 14:37:07 (*3.66 Gb/s*) - ‘/dev/null’ saved [10742601804]
>
> /export HANDLER wt=csv
> wget --report-speed=bits --server-response -O /dev/null
> --header='Accept-Encoding: '
> "http://192.168.220.135:8983/solr/test_collection/export?indent=false&wt=csv&q=*%3A*&sort=id%20desc&fl=id,text_sn"
> 10.01G in *60s*
> 2023-02-26 14:39:13 (*1.43 Gb/s*) - ‘/dev/null’ saved [10751758398]
>
> ---[ /stream HANDLER ] ---
> /stream HANDLER wt=json
> wget --report-speed=bits --server-response -O /dev/null
> --header='Accept-Encoding: '
> --header='Content-Type: application/x-www-form-urlencoded' --post-data
> 'expr=search(test_collection,q="*:*",fl="id, text_sn",sort="id asc",rows=1000000)'
> http://192.168.220.135:8983/solr/test_collection/stream?wt=json
> 10.03G in *4m 14s* (not counting the very long wait for the response;
> huge memory consumption)
> 2023-02-26 16:56:41 (*339 Mb/s*) - ‘/dev/null’ saved [10768109490]
>
> /stream HANDLER wt=javabin
> wget --report-speed=bits --server-response -O /dev/null
> --header='Accept-Encoding: '
> --header='Content-Type: application/x-www-form-urlencoded' --post-data
> 'expr=search(test_collection,q="*:*",fl="id, text_sn",sort="id asc",rows=1000000)'
> http://192.168.220.135:8983/solr/test_collection/stream?wt=javabin
> 10.00G in *20s* (not counting the very long wait for the response, and
> after three runs; the first run took 27s, 3.15 Gb/s; huge memory
> consumption)
> 2023-02-26 17:06:08 (*4.20 Gb/s*) - ‘/dev/null’ saved [10742601786]
>
>
> Best Regards,
>

--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!