> I'm not sure there's a shortcut bypassing ordering results through heap
To expand on this a bit: the behavior Mikhail describes changes as of Solr 9.1 (https://issues.apache.org/jira/browse/SOLR-14765), which introduces exactly the proposed bypass. The extra overhead (pre-9.1) scales linearly with the number of docs in the overall result set (in the *:* case, the number of docs in the index). This could be a contributing factor to the latency you're observing; unfortunately, short of upgrading to 9.1 (or backporting the related patch?), I'm not sure there's any other workaround, because of how deep in the code the result set sorting is.

The `sort=_docid_ asc` hack is interesting ... tbh I'm surprised this makes a difference vs constant-score: having been pretty deep in the sorting code, I know of no special-casing of that case, and I'd think it should behave similarly to constant score. I suspect it would make a difference relative to main queries that are _not_ constant-score, for cases where you don't care about scoring -- but then I'd expect equivalent behavior to be achievable by wrapping the main query as a constant score (e.g., `q=(some query)^=1.0`).

I'm curious (and don't know or don't recall) how this all plays out with the export handler -- I'll take a deeper look at that when I get a chance. (+1 to the export handler being much better for the "bulk export" use case.)

On Sun, Feb 26, 2023 at 2:07 AM Mikhail Khludnev <m...@apache.org> wrote:
>
> As was said above, the speed of streaming a file to a socket via OS
> internals is hardly achievable with Java code crunching bytes through heap.
> Using RamDirectory might push on GC; it's better to stick with the
> default one and leave enough RAM for the file cache.
> Regarding the actual params: q=*%3A*&rows=100000 I'm not sure there's a
> shortcut bypassing ordering results through heap (even *:* score is
> constant), quite often sort=_docid_ asc allows to avoid sorting, try it.
> Also, there is https://solr.apache.org/guide/8_6/exporting-result-sets.html
> which might be better suited for straightforward download, don't forget
> about sort=_docid_ asc.
>
> On Sun, Feb 26, 2023 at 1:58 AM Fikavec F <fika...@yandex.ru> wrote:
>
> > Thanks for the patch for testing. I could not see significant improvements
> > on virtual machines; I will try again this week on servers.
> > I tried the following values for buffers: 65536 (64 KB), 262144 (256 KB),
> > 524288 (512 KB), 1048576 (1 MB), 4194304 (4 MB), 16777216 (16 MB),
> > 33554432 (32 MB), 67108864 (64 MB), 134217728 (128 MB). I changed the
> > buffer size of both Solr and Jetty. It was visible in the logs:
> >
> > 2023-02-25 19:01:42.201 DEBUG (qtp1812823171-22) [ ]
> > o.a.s.c.u.FastWriter checking OS env for BUFSIZE =>
> > java.lang.NumberFormatException: null
> > 2023-02-25 19:01:42.201 INFO (qtp1812823171-22) [ ]
> > o.a.s.c.u.FastWriter FastWriter.BUFSIZE=4194304
> >
> > I noticed that increasing the buffer reduces %wait on the core down to 0,
> > and with a 100% loaded core the speed sometimes increased to 520
> > megabits (I haven't seen such numbers before, but it's still far from
> > Gigabit+). Adding indent=false and/or wt=csv increases the speed a bit more
> > (+30/50 Mbit, while wt=xml slows it down by 80 Mbit).
> >
> > What else in the data chain can be a bottleneck? OS and network (network
> > interface and kernel tuned for 10-Gigabit, tested by iperf - ok), disk
> > (ramdisk), processor (except that a 4.3 GHz core is not enough to transfer
> > data from Solr in a single thread faster than 0.5 Gigabit) are not a
> > bottleneck. Jetty is able to distribute a file at high speeds; with large
> > buffers I have now received by wget: 2023-02-25 21:51:37 (6.25 Gb/s).
> > FastWriter.BUFSIZE is now big too, so what is the next possible bottleneck
> > in Solr software architecture to explore and search further?
> >
> > Thank you for your help. I hope that, if these are not natural algorithmic
> > limitations, we will be able to figure this out and make Solr even better,
> > especially since with the advent of PCIe 5.0, NVMe, DDR5 and Wi-Fi 7, speeds
> > close to 10 Gigabit are already commonplace, but many end-user needs still
> > depend on single-threaded/core performance and do not get significant
> > benefits from new hardware speeds...
> >
> > Best Regards,
> >
>
> --
> Sincerely yours
> Mikhail Khludnev
> https://t.me/MUST_SEARCH
> A caveat: Cyrillic!