[
https://issues.apache.org/jira/browse/SOLR-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878869#comment-17878869
]
Joel Bernstein edited comment on SOLR-17430 at 9/3/24 12:42 PM:
----------------------------------------------------------------
One thought I have on the design is to use a blocking queue with a global
thread pool of output threads waiting on the queue. In this situation the
output/writer threads block indefinitely. The reader thread, which is the main
execution thread for the query, simply pushes batches of materialized docs to
the queue.
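A minimal sketch of that shape (the class and method names, pool size, and queue capacity below are illustrative only, not actual Solr APIs): a bounded {{BlockingQueue}} carries batches of materialized docs, a small global pool of writer threads blocks on {{take()}}, and the reader thread just calls {{put()}} and only blocks while the queue is full.
{code:java}
// Minimal sketch only -- ExportQueueSketch and DocBatch are illustrative names,
// not actual Solr classes.
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExportQueueSketch {

  /** A batch of materialized documents produced by the reader thread. */
  public static final class DocBatch {
    final List<Object> docs;
    DocBatch(List<Object> docs) { this.docs = docs; }
  }

  // Bounded queue: the reader blocks on put() when the writers fall behind,
  // rather than timing out on a fixed buffer-swap deadline.
  private final BlockingQueue<DocBatch> queue = new ArrayBlockingQueue<>(4);

  // Global pool of output/writer threads that wait on the queue indefinitely.
  private final ExecutorService writerPool = Executors.newFixedThreadPool(4);

  public void startWriters() {
    for (int i = 0; i < 4; i++) {
      writerPool.submit(() -> {
        try {
          while (true) {
            DocBatch batch = queue.take(); // blocks until a batch is available
            write(batch);
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // allow clean shutdown
        }
      });
    }
  }

  /** Called from the reader thread (the main execution thread for the query). */
  public void push(DocBatch batch) throws InterruptedException {
    queue.put(batch); // blocks only while the queue is full
  }

  private void write(DocBatch batch) {
    // emit the batch to the downstream consumer; omitted here
  }
}
{code}
With a queue like this the writer side can start emitting as soon as the first batch is queued, and a batch becomes garbage-collectable as soon as it has been written, rather than lingering until the next buffer swap.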
> Redesign ExportWriter / ExportBuffers to work better with large batchSizes
> and slow consumption
> -----------------------------------------------------------------------------------------------
>
> Key: SOLR-17430
> URL: https://issues.apache.org/jira/browse/SOLR-17430
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Chris M. Hostetter
> Priority: Major
>
> As mentioned in SOLR-17416, the design of the {{ExportBuffers}} class used by
> the {{ExportHandler}} is brittle, and the absolute time limit on how long
> the buffer-swapping threads will wait for each other isn't suitable for very
> long-running streaming expressions...
> {quote}The problem however is that this 600 second timeout may not be enough
> to account for really slow downstream consumption of the data. With really
> large collections, and really complicated streaming expressions, this can
> happen even with well-behaved clients that are actively trying to consume
> data.
> {quote}
> ...but another sub-optimal aspect of this buffer-swapping design is that the
> "writer" thread is initially completely blocked, and can't write out a single
> document, until the "filler" thread has read the full {{batchSize}} of
> documents into its buffer and opted to swap. Likewise, after buffer
> swapping has occurred at least once, any document in the {{outputBuffer}} that
> the writer has already processed hangs around, taking up RAM, until the next
> swap, while one of the threads is idle. If {{batchSize=30000}}, and the
> "filler" thread is ready to go with a full {{fillBuffer}} while the "writer"
> has only been able to emit 29999 of the documents in its {{outputBuffer}}
> (because the downstream consumer of the output bytes has blocked it) and is
> forced to wait before it can emit the last document in its batch, then both
> the "writer" thread and the "filler" thread are stalled, taking up 2x the
> batchSize of RAM, even though half of that is data that is no longer needed.
> The bigger the {{batchSize}}, the worse the initial delay (and steady-state
> wasted RAM) becomes.
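For contrast, here is a minimal sketch of the buffer-swap pattern described above. It uses {{java.util.concurrent.Exchanger}} purely for illustration; the actual {{ExportBuffers}} exchange mechanism is different, and the names below are made up.
{code:java}
// Minimal sketch of the swap pattern described above, for illustration only --
// this is NOT the actual ExportBuffers implementation.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Exchanger;
import java.util.concurrent.TimeUnit;

public class BufferSwapSketch {
  static final int BATCH_SIZE = 30000;
  final Exchanger<List<Object>> exchanger = new Exchanger<>();

  // "Filler" thread: must read a full batch before the first swap can happen,
  // so the writer emits nothing until BATCH_SIZE docs are materialized.
  void fillerLoop() throws Exception {
    List<Object> fillBuffer = new ArrayList<>(BATCH_SIZE);
    while (true) {
      fillBuffer.clear();
      while (fillBuffer.size() < BATCH_SIZE) {
        fillBuffer.add(readNextDoc());
      }
      // Blocks here until the writer also arrives; gives up after 600 seconds.
      fillBuffer = exchanger.exchange(fillBuffer, 600, TimeUnit.SECONDS);
    }
  }

  // "Writer" thread: already-written docs stay in outputBuffer until the next
  // swap, so a slow downstream consumer stalls both threads while roughly
  // 2x BATCH_SIZE docs sit in memory.
  void writerLoop() throws Exception {
    List<Object> outputBuffer = new ArrayList<>(BATCH_SIZE);
    while (true) {
      outputBuffer = exchanger.exchange(outputBuffer, 600, TimeUnit.SECONDS);
      for (Object doc : outputBuffer) {
        writeDownstream(doc); // may block on a slow consumer
      }
    }
  }

  Object readNextDoc() { return new Object(); }   // placeholder
  void writeDownstream(Object doc) { /* placeholder */ }
}
{code}
The key point is that both loops must meet at the exchange, so a writer stuck on a slow downstream consumer stalls the filler too, while roughly 2x {{batchSize}} documents are held in memory.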