Chris M. Hostetter created SOLR-17430:
-----------------------------------------
Summary: Redesign ExportWriter / ExportBuffers to work better with
large batchSizes and slow consumption
Key: SOLR-17430
URL: https://issues.apache.org/jira/browse/SOLR-17430
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Chris M. Hostetter
As mentioned in SOLR-17416, the design of the {{ExportBuffers}} class used by
the {{ExportHandler}} is brittle, and the absolute time limit on how long the
buffer swapping threads will wait for each other isn't suitable for very long
running streaming expressions...
{quote}The problem however is that this 600 second timeout may not be enough to
account for really slow downstream consumption of the data. With really large
collections, and really complicated streaming expressions, this can happen even
when well behaved clients that are actively trying to consume data.
{quote}
...but another sub-optimal aspect of this buffer swapping design is that the
"writer" thread is initially completely blocked, and can't write out a single
document, until the "filler" thread has read the full {{batchSize}} of
documents into its buffer and opted to swap. Likewise, after buffer swapping
has occurred at least once, any document in the {{outputBuffer}} that the writer
has already processed hangs around, taking up RAM, until the next swap, while
one of the threads is idle. If {{batchSize=30000}}, and the "filler"
thread is ready to go with a full {{fillBuffer}} while the "writer" has only
been able to emit 29999 of the documents in its {{outputBuffer}}
before being blocked and forced to wait (due to the downstream consumer of the
output bytes) before it can emit the last document in its batch, that means
both the "writer" thread and the "filler" thread are stalled, taking up 2x the
batchSize of RAM, even though half of that is data that is no longer needed.
The bigger the {{batchSize}} the worse the initial delay (and steady state
wasted RAM) is.
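
To make the numbers in the example above concrete, here is a rough sketch of the accounting. It is a toy model only, not Solr code: the class {{DoubleBufferModel}} and its method names are hypothetical, and it assumes both buffers hold exactly {{batchSize}} documents and that nothing is released until the next swap.

```java
/** Toy model of the two-buffer hand-off described above. Hypothetical; not Solr code. */
final class DoubleBufferModel {

    /** Documents held in memory when the fill buffer is full and the output buffer has not been swapped. */
    static long residentDocs(int batchSize) {
        return 2L * batchSize;
    }

    /** Documents the writer has already emitted but that stay in the output buffer until the next swap. */
    static long staleDocs(int emittedFromOutputBuffer) {
        return emittedFromOutputBuffer;
    }

    /** Documents that are resident and still needed (not yet emitted). */
    static long stillNeededDocs(int batchSize, int emittedFromOutputBuffer) {
        return residentDocs(batchSize) - staleDocs(emittedFromOutputBuffer);
    }

    /** Documents the filler must read before the writer can emit its first document. */
    static int docsReadBeforeFirstWrite(int batchSize) {
        return batchSize;
    }

    public static void main(String[] args) {
        int batchSize = 30000; // value used in the example above
        int emitted = 29999;   // writer is blocked on the last document of its batch

        System.out.println("resident docs:           " + residentDocs(batchSize));
        System.out.println("stale (already emitted): " + staleDocs(emitted));
        System.out.println("still needed:            " + stillNeededDocs(batchSize, emitted));
        System.out.println("docs read before 1st write: " + docsReadBeforeFirstWrite(batchSize));
    }
}
```

Under these assumptions, 60000 documents are resident while only 30001 of them are still needed, and both the stale count and the initial delay grow linearly with {{batchSize}}.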