I usually look for low-effort changes to test assumptions and tug on the
tangle from different directions when I'm stuck. In that spirit: 

> The queries gathering documents from the source are faster with a filter in 
> place, so they feed data to the queue faster.  I think this is probably 
> because it is only sorting a few million documents for each document batch 
> instead of the full 30 million.

If you can turn off the sorting for a test run, that would confirm or
rule out the assumption.
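
A minimal sketch of the kind of timing comparison I mean, in plain Java. The
class name SortProbe, the helper medianNanos, and the random stand-in data are
all hypothetical; in the real program the two tasks would presumably be the
source query run once with its sort clause and once without, so treat this
only as the shape of the test, not as anything SolrJ-specific.

```java
import java.util.Arrays;
import java.util.Random;

class SortProbe {
    // Written to by the tasks so the JIT cannot discard their work.
    static volatile long sink;

    /** Runs the task `runs` times (runs > 0) and returns the median elapsed time in nanoseconds. */
    static long medianNanos(Runnable task, int runs) {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Stand-in data (assumption); the real test would run the source query instead.
        long[] docs = new Random(42).longs(1_000_000).toArray();

        // Variant A: fetch plus sort.
        Runnable withSort = () -> {
            long[] copy = docs.clone();
            Arrays.sort(copy);
            sink = copy[0];
        };
        // Variant B: same fetch, sort switched off.
        Runnable withoutSort = () -> {
            long[] copy = docs.clone();
            sink = copy[0];
        };

        System.out.printf("with sort:    %d ms%n", medianNanos(withSort, 5) / 1_000_000);
        System.out.printf("without sort: %d ms%n", medianNanos(withoutSort, 5) / 1_000_000);
    }
}
```

Taking the median of a few runs keeps one slow outlier (cache warm-up, a GC
pause) from deciding the result. If the two numbers come out close, the sort is
probably not what makes the unfiltered query slow.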

I just looked up what SolrJ does - no insights, but now I'm curious how
well the parallel indexing performs compared to postgres'
single-threaded index builds. I wonder if it could speed up your bigger
query's sort time. 

Bill 

--

Phobrain.com 

On 2023-05-31 19:58, Shawn Heisey wrote:

> On 5/31/23 17:48, Bill Ross wrote: 
> 
>> Can you swap in another httpclient to test? I assume swapping jetty server 
>> would be too much, given something works. :-)
> 
> I can't do anything about the Jetty server without upgrading Solr.  I really 
> want to get them upgraded, but it's not up to me.
> 
> I tried to use the legacy SolrJ clients that utilize Apache HttpClient 4.x, 
> but for an unknown reason I was not able to get those clients to work.  I am 
> not using any http client directly, I use SolrJ.  Layers upon layers.  I am 
> completely shielded from any direct interaction with the Jetty client by 
> SolrJ.
> 
>> From faster result on smaller batch size: are you monitoring memory use? I'd 
>> try even smaller, looking at the perf profile for clues.
> 
> The queries gathering documents from the source are faster with a filter in 
> place, so they feed data to the queue faster.  I think this is probably 
> because it is only sorting a few million documents for each document batch 
> instead of the full 30 million.
> 
> I was running my program with a 1GB heap.  With a queue size of 100000 or 
> 150000, that worked well.
> 
> I later bumped the queue size to 200000 and had to bump the heap because I 
> got OOME.  The space is consumed by the SolrInputDocument objects on the 
> queue.  I set the heap to 2GB for the 200K queue size.  Now the max queue 
> size is 500K and the heap is 5GB.  A larger queue evens out the transfer of 
> data from the query thread to the indexing threads and keeps the migration 
> from stalling.
> 
> I'm using ZGC to optimize for latency.  The code is compiled for Java 11.
> 
> Thanks,
> Shawn
> _______________________________________________
> jetty-users mailing list
> jetty-users@eclipse.org
> To unsubscribe from this list, visit 
> https://www.eclipse.org/mailman/listinfo/jetty-users