paul-rogers commented on a change in pull request #2000: DRILL-7607: support dynamic credit based flow control
URL: https://github.com/apache/drill/pull/2000#discussion_r386149418
##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/work/batch/UnlimitedRawBatchBuffer.java
##########

```diff
@@ -90,14 +100,39 @@ public boolean isEmpty() {

   @Override
   public void add(RawFragmentBatch batch) {
+    int recordCount = batch.getHeader().getDef().getRecordCount();
+    long batchByteSize = batch.getByteCount();
+    if (recordCount != 0) {
+      // skip the first header batch
+      totalBatchSize += batchByteSize;
+      sampleTimes++;
+    }
+    if (sampleTimes == maxSampleTimes) {
+      long averageBatchSize = totalBatchSize / sampleTimes;
+      // make a decision
+      long limit = context.getAllocator().getLimit();
```

Review comment:

As it turns out, it is VERY hard to compute the memory used by a batch. The "batch sizer" classes use a mechanism that dives below the level of vectors into the ledgers; see `ValueVector`'s `void collectLedgers(Set<BufferLedger> ledgers)`. This turned out to be the only workable solution because, as I recall, deserialized data shares buffers, so simply summing buffer sizes gives the wrong answer. (It's been a couple of years since we added that code, so I can't remember the details.) A sketch of this ledger walk appears at the end of this comment.

There are two better solutions. One would be to carry the data size along with each batch, but even that has the problems described in my earlier note. The best solution would be to have a resource management (RM) solution set a target batch size, with each operator working to produce batches of that size. (The scan can do so using the Result Set Loader; other operators can use the "batch sizer" method.) That way, memory calculations become easier not just for this feature, but also for sorts and other buffering operators.

Since adding a target batch size is beyond the scope of this PR, I'd suggest adding some config options instead (see the second sketch below):

* Enable/disable this feature
* Target memory limit for buffering

That way, until we can control batch sizes, we can at least adjust the buffering here to deal with different workloads. It is probably fine to make these config (boot) options rather than system options.

Finally, my (limited) understanding is that spooling for the receiver exists, but has never been used. First, it is very complex. Second, it could actually slow query execution because of the need to write to, then read from, disk. Third, we'd need a way to manage the maximum disk space used for spooling. So, given that the code has (I believe) never been used, it is not surprising that it contains bugs. Thanks for finding them. Last I checked (my knowledge may be old), only the sort, join, and hash aggregate operators spill to disk well.
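For concreteness, here is a minimal sketch of the ledger-based sizing described above. It assumes Drill's `Iterable<VectorWrapper<?>>` contract on `RecordBatch` and the `BufferLedger.getAccountedSize()` accessor; the helper method name is mine, not an existing Drill API:

```java
import java.util.HashSet;
import java.util.Set;

import org.apache.drill.exec.memory.AllocationManager.BufferLedger;
import org.apache.drill.exec.record.RecordBatch;
import org.apache.drill.exec.record.VectorWrapper;

public class BatchMemorySketch {

  // Estimate the memory actually held by a batch. Collecting the distinct
  // ledgers (a Set deduplicates them) avoids double-counting buffers that
  // deserialized vectors share; naively summing per-buffer sizes over-counts.
  static long ledgerBasedSize(RecordBatch batch) {
    Set<BufferLedger> ledgers = new HashSet<>();
    for (VectorWrapper<?> wrapper : batch) {
      wrapper.getValueVector().collectLedgers(ledgers);
    }
    long total = 0;
    for (BufferLedger ledger : ledgers) {
      total += ledger.getAccountedSize();
    }
    return total;
  }
}
```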
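And a sketch of the two suggested boot options. The property names below are hypothetical, chosen only for illustration; real Drill boot options get defaults in `drill-module.conf` and overrides in `drill-override.conf`:

```java
import org.apache.drill.common.config.DrillConfig;

public class FlowControlConfigSketch {

  // Hypothetical boot-option keys; names are illustrative only.
  static final String ENABLED_KEY =
      "drill.exec.buffer.flow_control.enabled";
  static final String MEMORY_LIMIT_KEY =
      "drill.exec.buffer.flow_control.memory_limit";

  final boolean enabled;    // turn the credit-based flow control on or off
  final long memoryLimit;   // target memory budget for buffered batches

  FlowControlConfigSketch(DrillConfig config) {
    enabled = config.getBoolean(ENABLED_KEY);
    memoryLimit = config.getBytes(MEMORY_LIMIT_KEY);
  }
}
```

With options like these in place, the sampling logic in the diff above could skip its bookkeeping entirely when the feature is disabled, and compare the observed average batch size against the configured buffering limit rather than the allocator limit.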