[
https://issues.apache.org/jira/browse/SOLR-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878869#comment-17878869
]
Joel Bernstein edited comment on SOLR-17430 at 9/3/24 12:42 PM:
----------------------------------------------------------------
One thought I have on the design is to use a blocking queue with a global
thread pool of output threads waiting on the queue. In this situation the
output/writer threads block indefinitely. The reader thread, which is the main
execution thread for the query, simply pushes batches of materialized docs to
the queue.
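A minimal sketch of that shape (the class and method names, pool size, and queue capacity below are illustrative only, not actual Solr APIs): a bounded {{BlockingQueue}} carries batches of materialized docs, a small global pool of writer threads blocks on {{take()}}, and the reader thread just calls {{put()}} and only blocks while the queue is full.
{code:java}
// Minimal sketch only -- ExportQueueSketch and DocBatch are illustrative names,
// not actual Solr classes.
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExportQueueSketch {

  /** A batch of materialized documents produced by the reader thread. */
  public static final class DocBatch {
    final List<Object> docs;
    DocBatch(List<Object> docs) { this.docs = docs; }
  }

  // Bounded queue: the reader blocks on put() when the writers fall behind,
  // rather than timing out on a fixed buffer-swap deadline.
  private final BlockingQueue<DocBatch> queue = new ArrayBlockingQueue<>(4);

  // Global pool of output/writer threads that wait on the queue indefinitely.
  private final ExecutorService writerPool = Executors.newFixedThreadPool(4);

  public void startWriters() {
    for (int i = 0; i < 4; i++) {
      writerPool.submit(() -> {
        try {
          while (true) {
            DocBatch batch = queue.take(); // blocks until a batch is available
            write(batch);
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // allow clean shutdown
        }
      });
    }
  }

  /** Called from the reader thread (the main execution thread for the query). */
  public void push(DocBatch batch) throws InterruptedException {
    queue.put(batch); // blocks only while the queue is full
  }

  private void write(DocBatch batch) {
    // emit the batch to the downstream consumer; omitted here
  }
}
{code}
With a queue like this the writer side can start emitting as soon as the first batch is queued, and a batch becomes garbage-collectable as soon as it has been written, rather than lingering until the next buffer swap.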
> Redesign ExportWriter / ExportBuffers to work better with large batchSizes
> and slow consumption
> -----------------------------------------------------------------------------------------------
>
> Key: SOLR-17430
> URL: https://issues.apache.org/jira/browse/SOLR-17430
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Chris M. Hostetter
> Priority: Major
>
> As mentioned in SOLR-17416, the design of the {{ExportBuffers}} class used by
> the {{ExportHandler}} is brittle, and the absolute time limit on how long
> the buffer-swapping threads will wait for each other isn't suitable for very
> long-running streaming expressions...
> {quote}The problem however is that this 600 second timeout may not be enough
> to account for really slow downstream consumption of the data. With really
> large collections, and really complicated streaming expressions, this can
> happen even with well-behaved clients that are actively trying to consume
> data.
> {quote}
> ...but another sub-optimal aspect of this buffer-swapping design is that the
> "writer" thread is initially completely blocked, and can't write out a single
> document, until the "filler" thread has read the full {{batchSize}} of
> documents into its buffer and opted to swap. Likewise, after buffer
> swapping has occurred at least once, any document in the {{outputBuffer}} that
> the writer has already processed hangs around, taking up RAM, until the next
> swap, while one of the threads is idle. If {{batchSize=30000}}, and the
> "filler" thread is ready to go with a full {{fillBuffer}} while the "writer"
> has only been able to emit 29999 of the documents in its {{outputBuffer}}
> (because the downstream consumer of the output bytes has blocked it) and is
> forced to wait before it can emit the last document in its batch, then both
> the "writer" thread and the "filler" thread are stalled, taking up 2x the
> batchSize of RAM, even though half of that is data that is no longer needed.
> The bigger the {{batchSize}}, the worse the initial delay (and steady-state
> wasted RAM) becomes.
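For contrast, here is a minimal sketch of the buffer-swap pattern described above. It uses {{java.util.concurrent.Exchanger}} purely for illustration; the actual {{ExportBuffers}} exchange mechanism is different, and the names below are made up.
{code:java}
// Minimal sketch of the swap pattern described above, for illustration only --
// this is NOT the actual ExportBuffers implementation.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Exchanger;
import java.util.concurrent.TimeUnit;

public class BufferSwapSketch {
  static final int BATCH_SIZE = 30000;
  final Exchanger<List<Object>> exchanger = new Exchanger<>();

  // "Filler" thread: must read a full batch before the first swap can happen,
  // so the writer emits nothing until BATCH_SIZE docs are materialized.
  void fillerLoop() throws Exception {
    List<Object> fillBuffer = new ArrayList<>(BATCH_SIZE);
    while (true) {
      fillBuffer.clear();
      while (fillBuffer.size() < BATCH_SIZE) {
        fillBuffer.add(readNextDoc());
      }
      // Blocks here until the writer also arrives; gives up after 600 seconds.
      fillBuffer = exchanger.exchange(fillBuffer, 600, TimeUnit.SECONDS);
    }
  }

  // "Writer" thread: already-written docs stay in outputBuffer until the next
  // swap, so a slow downstream consumer stalls both threads while roughly
  // 2x BATCH_SIZE docs sit in memory.
  void writerLoop() throws Exception {
    List<Object> outputBuffer = new ArrayList<>(BATCH_SIZE);
    while (true) {
      outputBuffer = exchanger.exchange(outputBuffer, 600, TimeUnit.SECONDS);
      for (Object doc : outputBuffer) {
        writeDownstream(doc); // may block on a slow consumer
      }
    }
  }

  Object readNextDoc() { return new Object(); }   // placeholder
  void writeDownstream(Object doc) { /* placeholder */ }
}
{code}
The key point is that both loops must meet at the exchange, so a writer stuck on a slow downstream consumer stalls the filler too, while roughly 2x {{batchSize}} documents are held in memory.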