I usually look for low-effort changes to test assumptions and tug on the
tangle from different directions when I'm stuck. In that spirit:
> The queries gathering documents from the source are faster with a filter in
> place, so they feed data to the queue faster. I think this is probably
> because it is only sorting a few million documents for each document batch
> instead of the full 30 million.
If you can turn off the sorting for a test run, that would confirm or
rule out that assumption.

I just looked up what SolrJ does - no insights, but now I'm curious how
well the parallel indexing performs compared to Postgres'
single-threaded index builds. I wonder if it could speed up your bigger
query's sort time.
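
On memory: a back-of-envelope from the heap and queue sizes you quoted
below, in case it helps pick the next queue size. This is only a sketch.
The class and method names are made up, I'm assuming "GB" means GiB as
with -Xmx1g, and it pretends the queue is the only thing on the heap,
which it certainly isn't.

```java
// Rough estimate only: treats the whole heap as queue space, so the
// per-document figures are ceilings, not measurements.
public class QueueHeapEstimate {

    static final long GIB = 1024L * 1024L * 1024L;

    // Heap bytes available per queued document if the queue were the only
    // thing on the heap. Assumes "GB" in the thread means GiB.
    static long heapBytesPerDoc(long heapGiB, long queueSize) {
        return heapGiB * GIB / queueSize;
    }

    public static void main(String[] args) {
        // {heap in GiB, queue size}, taken from the quoted message.
        long[][] runs = {
            {1, 150_000},  // worked
            {1, 200_000},  // OOME
            {2, 200_000},  // worked
            {5, 500_000},  // current setting
        };
        for (long[] r : runs) {
            System.out.printf("heap=%dGiB queue=%d -> %d bytes/doc%n",
                    r[0], r[1], heapBytesPerDoc(r[0], r[1]));
        }
    }
}
```

By that arithmetic 1GB was enough at roughly 7 KB per queued document
and not enough at roughly 5.4 KB, while the 2GB and 5GB settings both
allow about 10.7 KB. So the later settings look like they carry some
headroom, though ZGC and everything else on the heap need room too.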
Bill
--
Phobrain.com
On 2023-05-31 19:58, Shawn Heisey wrote:
> On 5/31/23 17:48, Bill Ross wrote:
>
>> Can you swap in another httpclient to test? I assume swapping jetty server
>> would be too much, given something works. :-)
>
> I can't do anything about the Jetty server without upgrading Solr. I really
> want to get them upgraded, but it's not up to me.
>
> I tried to use the legacy SolrJ clients that utilize Apache HttpClient 4.x,
> but for an unknown reason I was not able to get those clients to work. I am
> not using any http client directly, I use SolrJ. Layers upon layers. I am
> completely shielded from any direct interaction with the Jetty client by
> SolrJ.
>
>> From faster result on smaller batch size: are you monitoring memory use? I'd
>> try even smaller, looking at the perf profile for clues.
>
> The queries gathering documents from the source are faster with a filter in
> place, so they feed data to the queue faster. I think this is probably
> because it is only sorting a few million documents for each document batch
> instead of the full 30 million.
>
> I was running my program with a 1GB heap. With a queue size of 100000 or
> 150000, that worked well.
>
> I later bumped the queue size to 200000 and had to bump the heap because I
> got OOME. The space is consumed by the SolrInputDocument objects on the
> queue. I set the heap to 2GB for the 200K queue size. Now the max queue
> size is 500K and the heap is 5GB. A larger queue evens out the transfer of
> data from the query thread to the indexing threads and keeps the migration
> from stalling.
>
> I'm using ZGC to optimize for latency. The code is compiled for Java 11.
>
> Thanks,
> Shawn
> _______________________________________________
> jetty-users mailing list
> jetty-users@eclipse.org
> To unsubscribe from this list, visit
> https://www.eclipse.org/mailman/listinfo/jetty-users