>
> Isn't there a deserializer of Solr javabin format in python/json

Well, you can try to marry the Smile response writer
https://solr.apache.org/guide/solr/latest/query-guide/response-writers.html#smile-response-writer
with PySmile: https://github.com/jhosmer/PySmile
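A minimal sketch of the Smile route in Python. It assumes Solr's `wt=smile` writer emits the standard Smile header (the bytes `:)\n` followed by a version/flags byte, which Jackson writes by default). The helper names are hypothetical, and the actual decoding is left to PySmile, whose exact API is not shown here.

```python
import urllib.parse

# Smile payloads start with the 3-byte signature ":)\n", then a version/flags byte
# (assumption: Solr's writer keeps Jackson's default header).
SMILE_SIGNATURE = b":)\n"


def smile_select_url(base_url: str, collection: str, rows: int) -> str:
    """Hypothetical helper: a /select URL asking for the Smile response writer."""
    params = {"q": "*:*", "wt": "smile", "rows": str(rows), "sort": "id asc"}
    return f"{base_url}/solr/{collection}/select?{urllib.parse.urlencode(params)}"


def looks_like_smile(payload: bytes) -> bool:
    """Cheap sanity check before handing the bytes to a Smile decoder."""
    return len(payload) >= 4 and payload[:3] == SMILE_SIGNATURE
```

The bytes fetched from that URL (e.g. with `urllib.request.urlopen`) would then be passed to PySmile's decoder.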

On Mon, Feb 27, 2023 at 12:46 AM Fikavec F <fika...@yandex.ru> wrote:

> Thank you for your help with the slow single-threaded retrieval of data
> from Solr. Today I managed to reach 3+ Gbit/s and got results that may be
> useful in the future.
>
> It turned out I was wrong to assume that the main problem is the
> FastWriter output buffer, although it was the most obvious candidate I
> found while studying the source code of the various Solr response writers
> (they all build on it). I can't pinpoint exactly where the problem is, but
> my experiments suggest it is not very deep; it seems to be somewhere in the
> final stage of converting the data and sending it to the user. I would
> suspect the library/code that converts data to the output format, but it is
> very strange that the speed barely depends on the selected format (csv,
> json, xml, python...): even the fastest (csv) and the slowest (xml) differ
> only modestly, not by orders of magnitude. Perhaps the data reaches these
> functions slowly, or is slowly converted/deserialized into them from the
> internal format, but either way these formats are up to 8 times slower
> than they could be.
>
> Just in case, I also checked the performance of Streaming Expressions
> <https://svn.apache.org/repos/infra/sites/solr/guide/8_11/streaming-expressions.html>
> (slower than /select, with a very long wait before the server starts
> sending data and huge memory consumption) and of Exporting Result Sets
> <https://svn.apache.org/repos/infra/sites/solr/guide/8_11/exporting-result-sets.html>.
> However, Exporting Result Sets only works with DocValues
> <https://svn.apache.org/repos/infra/sites/solr/guide/8_11/exporting-result-sets.html#field-requirements>
> fields, which makes it unsuitable for general use, since several restrictions
> <https://svn.apache.org/repos/infra/sites/solr/guide/8_11/docvalues.html#enabling-docvalues>
> apply to DocValues fields (for example, you cannot use DocValues on
> solr.TextField fields, but you can on solr.StrField).
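For reference, a docValues-only field like the one used in the /export runs could be added through the Schema API. This sketch only builds the JSON command; the field name is hypothetical, and the `string` type assumes the StrField type defined in Solr's default configset.

```python
import json


def docvalues_field_command(name: str) -> str:
    """Schema API 'add-field' command for a docValues-only string field."""
    command = {
        "add-field": {
            "name": name,          # hypothetical field name
            "type": "string",      # StrField: docValues allowed, unlike solr.TextField
            "stored": False,
            "docValues": True,     # required for fields returned by /export
        }
    }
    return json.dumps(command)
```

The resulting JSON would be POSTed to `/solr/<collection>/schema` with `Content-Type: application/json`.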
>
> Here's what I got:
>
>    - 3.66 Gb/s - HANDLER /export; wt=javabin; works only on
>    docValues=true field
>    - *2.95 Gb/s* - *HANDLER /select; wt=javabin; with stored=true
>    docValues=false field*
>    - 1.46 Gb/s - HANDLER /export; wt=json; works only on docValues=true
>    field
>    - 455 Mb/s - HANDLER /select; wt=csv; with stored=true
>    docValues=false field
>    - *361 Mb/s* - *HANDLER /select; wt=json; with stored=true
>    docValues=false field*
>
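To put the summary in proportion, this sketch divides each figure by the `/select` wt=json baseline. The numbers are copied from the list above; treating 1 Gb/s as 1000 Mb/s is an assumption about wget's units.

```python
# Throughputs from the summary list, in Mbit/s (assumes 1 Gb/s = 1000 Mb/s).
RESULTS_MBPS = {
    "/export javabin (docValues)": 3660,
    "/select javabin (stored)": 2950,
    "/export json (docValues)": 1460,
    "/select csv (stored)": 455,
    "/select json (stored)": 361,
}


def speedups(results, baseline="/select json (stored)"):
    """Each throughput divided by the baseline, rounded to one decimal."""
    base = results[baseline]
    return {name: round(mbps / base, 1) for name, mbps in results.items()}


for name, factor in speedups(RESULTS_MBPS).items():
    print(f"{factor:>5}x  {name}")
```

With these inputs, `/select` javabin comes out at about 8.2x the json rate, which is where the "up to 8 times" estimate comes from.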
> In any case, the difference between javabin and the other response writers
> is enormous. This gives me hope that the problem is not buried deep inside
> Solr and can be fixed in the future. Solr itself can serve data at
> 3 Gbit/s, and that is not the limit of its Jetty, i.e. there is a lot of
> room for improvement.
> Isn't there a deserializer of the Solr javabin format for Python/JSON? Since
> wt=json is so slow, the client could quickly receive the data as javabin and
> convert it into whatever is needed, as the MongoDB PyMongo client does with
> the BSON (Binary JSON) format
> <https://pymongo.readthedocs.io/en/stable/api/index.html>.
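To show what a Python javabin reader involves, here is a minimal sketch that decodes only a small subset of the format. The tag values and the size/varint encoding are my reading of Solr's `JavaBinCodec` and should be treated as assumptions to verify against the source of your Solr version. Document lists (`SOLRDOCLST`), documents, dates, floats and doubles are deliberately not handled, so a real `/select` response would stop with `NotImplementedError` at its `response` entry.

```python
import struct

# Tag constants as read from JavaBinCodec (assumption; verify for your version).
VERSION = 2
NULL, BOOL_TRUE, BOOL_FALSE, INT, LONG = 0, 1, 2, 6, 7
STR, SINT, ARR = 1 << 5, 2 << 5, 4 << 5
ORDERED_MAP, NAMED_LST, EXTERN_STRING = 5 << 5, 6 << 5, 7 << 5


class JavabinSubsetReader:
    """Decodes a small subset of javabin; other tags raise NotImplementedError."""

    def __init__(self, data: bytes):
        self.data, self.pos, self.extern = data, 0, []

    def byte(self) -> int:
        b = self.data[self.pos]
        self.pos += 1
        return b

    def vint(self) -> int:
        # 7 bits per byte, least significant group first; high bit = more follows.
        result, shift = 0, 0
        while True:
            b = self.byte()
            result |= (b & 0x7F) << shift
            if not b & 0x80:
                return result
            shift += 7

    def size(self, tag: int) -> int:
        # Low 5 bits hold the size; 0x1f means "add the following vint".
        sz = tag & 0x1F
        return sz + self.vint() if sz == 0x1F else sz

    def unpack(self, fmt: str, n: int):
        (v,) = struct.unpack_from(fmt, self.data, self.pos)
        self.pos += n
        return v

    def decode(self):
        if self.byte() != VERSION:
            raise ValueError("unexpected javabin version byte")
        return self.value()

    def value(self):
        tag = self.byte()
        kind = tag & 0xE0  # upper 3 bits select the combined tag+size types
        if kind == STR:
            n = self.size(tag)  # number of UTF-8 bytes
            raw = self.data[self.pos:self.pos + n]
            self.pos += n
            return raw.decode("utf-8")
        if kind == SINT:
            v = tag & 0x0F
            return (self.vint() << 4) | v if tag & 0x10 else v
        if kind == ARR:
            return [self.value() for _ in range(self.size(tag))]
        if kind in (ORDERED_MAP, NAMED_LST):
            # NamedList: ordered (name, value) pairs, names may repeat.
            return [(self.value(), self.value()) for _ in range(self.size(tag))]
        if kind == EXTERN_STRING:
            idx = self.size(tag)
            if idx:
                return self.extern[idx - 1]  # back-reference to an earlier name
            s = self.value()  # first occurrence: the string follows inline
            self.extern.append(s)
            return s
        if tag == NULL:
            return None
        if tag in (BOOL_TRUE, BOOL_FALSE):
            return tag == BOOL_TRUE
        if tag == INT:
            return self.unpack(">i", 4)
        if tag == LONG:
            return self.unpack(">q", 8)
        raise NotImplementedError(f"javabin tag {tag} not handled in this sketch")
```

A full client would add the document-list, date and floating-point tags in the same style.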
>
> All measurement results:
>
>
> # 1. Experiments with stored=true docValues=false
> ---[ /select HANDLER ] ---
> /select HANDLER wt=json
> wget --report-speed=bits --server-response -O /dev/null \
>   --header='Accept-Encoding: ' --post-data \
>   'indent=false&wt=json&q=*%3A*&rows=1000000&sort=id%20asc' \
>   http://192.168.220.135:8983/solr/test_collection/select
> 10.03G in *3m 59s*
> 2023-02-26 15:05:45 (*361 Mb/s*) - ‘/dev/null’ saved [10772687921]
>
> /select HANDLER wt=javabin
> wget --report-speed=bits --server-response -O /dev/null \
>   --header='Accept-Encoding: ' --post-data \
>   'indent=false&wt=javabin&q=*%3A*&rows=1000000&sort=id%20asc' \
>   http://192.168.220.135:8983/solr/test_collection/select
> 10.01G in *29s*
> 2023-02-26 15:08:59 (*2.95 Gb/s*) - ‘/dev/null’ saved [10749142324]
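wget's rate can be cross-checked from its own byte count and elapsed time. A small sketch, assuming `--report-speed=bits` uses decimal prefixes (the figures above are consistent with that) and using wget's whole-second durations, so results differ slightly from the reported rates:

```python
def mbit_per_s(num_bytes: int, seconds: float) -> float:
    """Average throughput in decimal megabits per second."""
    return num_bytes * 8 / seconds / 1_000_000


def gib(num_bytes: int) -> float:
    """Size in GiB, matching wget's '10.03G' style summary."""
    return num_bytes / 1024 ** 3


# Byte counts and whole-second durations from the two /select runs above.
json_rate = mbit_per_s(10772687921, 3 * 60 + 59)
javabin_rate = mbit_per_s(10749142324, 29)
print(f"json: {json_rate:.0f} Mb/s, javabin: {javabin_rate / 1000:.2f} Gb/s, "
      f"ratio {javabin_rate / json_rate:.1f}x")
```

This gives about 361 Mb/s and 2.97 Gb/s (wget reports 2.95 Gb/s, timing more precisely), a ratio of roughly 8.2x.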
>
> /select HANDLER wt=csv
> wget --report-speed=bits --server-response -O /dev/null \
>   --header='Accept-Encoding: ' --post-data \
>   'indent=false&wt=csv&q=*%3A*&rows=1000000&sort=id%20asc' \
>   http://192.168.220.135:8983/solr/test_collection/select
> 10.01G in *3m 9s*
> 2023-02-26 15:14:24 (*455 Mb/s*) - ‘/dev/null’ saved [10751204971]
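The same /select downloads can be reproduced from Python with only the standard library. A sketch assuming Python 3.8+; `select_request` and `count_bytes` are hypothetical helper names, and the host is the one from the runs above.

```python
import urllib.parse
import urllib.request

SOLR = "http://192.168.220.135:8983/solr/test_collection"  # host from the runs above


def select_request(wt: str, rows: int = 1_000_000) -> urllib.request.Request:
    """POST the same form parameters the wget commands send to /select."""
    body = urllib.parse.urlencode(
        {"indent": "false", "wt": wt, "q": "*:*", "rows": rows, "sort": "id asc"}
    ).encode("ascii")
    # http.client (used by urllib) sends "Accept-Encoding: identity" by default,
    # so the response is uncompressed, as with wget's --header='Accept-Encoding: '.
    return urllib.request.Request(
        f"{SOLR}/select",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def count_bytes(req: urllib.request.Request, chunk: int = 1 << 20) -> int:
    """Read and discard the response (like -O /dev/null); return bytes read."""
    total = 0
    with urllib.request.urlopen(req) as resp:
        while data := resp.read(chunk):
            total += len(data)
    return total
```

Timing `count_bytes(select_request("javabin"))` against `select_request("json")` would show whether the client side changes the picture.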
>
> # 2. Experiments with docValues=true stored=false (for testing the /export HANDLER)
> ---[ /select HANDLER ] ---
> /select HANDLER wt=json
> wget --report-speed=bits --server-response -O /dev/null \
>   --header='Accept-Encoding: ' --post-data \
>   'indent=false&wt=json&q=*%3A*&rows=1000000&sort=id%20asc' \
>   http://192.168.220.135:8983/solr/test_collection/select
> 10.03G in *4m 2s*
> 2023-02-26 14:24:21 (*357 Mb/s*) - ‘/dev/null’ saved [10772687921]
>
> /select HANDLER wt=javabin
> wget --report-speed=bits --server-response -O /dev/null \
>   --header='Accept-Encoding: ' --post-data \
>   'indent=false&wt=javabin&q=*%3A*&rows=1000000&sort=id%20asc' \
>   http://192.168.220.135:8983/solr/test_collection/select
> 10.01G in *71s*
> 2023-02-26 14:27:15 (*1.21 Gb/s*) - ‘/dev/null’ saved [10749142324]
>
> /select HANDLER wt=csv
> wget --report-speed=bits --server-response -O /dev/null \
>   --header='Accept-Encoding: ' --post-data \
>   'indent=false&wt=csv&q=*%3A*&rows=1000000&sort=id%20asc' \
>   http://192.168.220.135:8983/solr/test_collection/select
> 10.01G in *3m 20s*
> 2023-02-26 14:32:39 (*430 Mb/s*) - ‘/dev/null’ saved [10751204971]
>
> ---[ /export HANDLER ] ---
> /export HANDLER wt=json
> wget --report-speed=bits --server-response -O /dev/null \
>   --header='Accept-Encoding: ' \
>   "http://192.168.220.135:8983/solr/test_collection/export?indent=false&wt=json&q=*%3A*&sort=id%20desc&fl=id,text_sn"
> 10.01G in *59s*
> 2023-02-26 14:34:35 (*1.46 Gb/s*) - ‘/dev/null’ saved [10751758398]
>
> /export HANDLER wt=javabin
> wget --report-speed=bits --server-response -O /dev/null \
>   --header='Accept-Encoding: ' \
>   "http://192.168.220.135:8983/solr/test_collection/export?indent=false&wt=javabin&q=*%3A*&sort=id%20desc&fl=id,text_sn"
> 10.00G in *23s*
> 2023-02-26 14:37:07 (*3.66 Gb/s*) - ‘/dev/null’ saved [10742601804]
>
> /export HANDLER wt=csv
> wget --report-speed=bits --server-response -O /dev/null \
>   --header='Accept-Encoding: ' \
>   "http://192.168.220.135:8983/solr/test_collection/export?indent=false&wt=csv&q=*%3A*&sort=id%20desc&fl=id,text_sn"
> 10.01G in *60s*
> 2023-02-26 14:39:13 (*1.43 Gb/s*) - ‘/dev/null’ saved [10751758398]
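The /export URLs can also be built programmatically instead of hand-encoding them. A sketch; `export_url` is a hypothetical helper, and the field list and sort default to the values used in the runs above.

```python
import urllib.parse

SOLR = "http://192.168.220.135:8983/solr/test_collection"  # host from the runs above


def export_url(wt: str, fields=("id", "text_sn"), sort: str = "id desc") -> str:
    """GET URL equivalent to the /export commands.

    /export streams every matching document, so there is no rows parameter,
    but it requires sort and fl, and every fl/sort field must have docValues.
    """
    params = {"indent": "false", "wt": wt, "q": "*:*", "sort": sort,
              "fl": ",".join(fields)}
    return f"{SOLR}/export?{urllib.parse.urlencode(params)}"
```

`urlencode` escapes `*:*` and the comma in `fl`, which Solr decodes to the same values as the hand-written URLs.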
>
> ---[ /stream HANDLER ] ---
> /stream HANDLER wt=json
> wget --report-speed=bits --server-response -O /dev/null \
>   --header='Accept-Encoding: ' \
>   --header='Content-Type: application/x-www-form-urlencoded' \
>   --post-data 'expr=search(test_collection,q="*:*",fl="id, text_sn",sort="id asc",rows=1000000)' \
>   http://192.168.220.135:8983/solr/test_collection/stream?wt=json
> 10.03G in *4m 14s* (not counting the very long wait for the response; huge
> memory consumption)
> 2023-02-26 16:56:41 (*339 Mb/s*) - ‘/dev/null’ saved [10768109490]
>
> /stream HANDLER wt=javabin
> wget --report-speed=bits --server-response -O /dev/null \
>   --header='Accept-Encoding: ' \
>   --header='Content-Type: application/x-www-form-urlencoded' \
>   --post-data 'expr=search(test_collection,q="*:*",fl="id, text_sn",sort="id asc",rows=1000000)' \
>   http://192.168.220.135:8983/solr/test_collection/stream?wt=javabin
> 10.00G in *20s* (not counting the very long wait for the response; measured
> after three runs, the first run took 27s at 3.15 Gb/s; huge memory
> consumption)
> 2023-02-26 17:06:08 (*4.20 Gb/s*) - ‘/dev/null’ saved [10742601786]
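The /stream request body can be built with proper form encoding. A sketch; `search_expr` and `stream_body` are hypothetical helpers that reproduce the expression used in the runs above.

```python
import urllib.parse


def search_expr(collection: str, fields: str, sort: str, rows: int) -> str:
    """The search(...) streaming expression used in the /stream runs."""
    return f'search({collection},q="*:*",fl="{fields}",sort="{sort}",rows={rows})'


def stream_body(expr: str) -> bytes:
    # Form-encode the expression. The wget runs sent it raw via --post-data,
    # which works here but would break if the expression contained '&' or '+'.
    return urllib.parse.urlencode({"expr": expr}).encode("ascii")
```

The body would be POSTed to `.../stream?wt=javabin` with `Content-Type: application/x-www-form-urlencoded`.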
>
>
> Best Regards,
>


-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
