paul-rogers commented on a change in pull request #2000: DRILL-7607: support dynamic credit based flow control
URL: https://github.com/apache/drill/pull/2000#discussion_r385998974
##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/work/batch/UnlimitedRawBatchBuffer.java
##########
@@ -90,14 +100,39 @@ public boolean isEmpty() {
   @Override
   public void add(RawFragmentBatch batch) {
+    int recordCount = batch.getHeader().getDef().getRecordCount();
+    long bathByteSize = batch.getByteCount();
+    if (recordCount != 0) {
+      //skip first header batch
+      totalBatchSize += bathByteSize;
+      sampleTimes++;
+    }
+    if (sampleTimes == maxSampleTimes) {
+      long averageBathSize = totalBatchSize / sampleTimes;
+      //make a decision
+      long limit = context.getAllocator().getLimit();

Review comment:
Another issue is how many of these receivers exist per Drillbit. I don't know the answer. If I have 5 minor fragments on this Drillbit, will all 5 do their own flow-control calculations? Will I have 5 fragments each trying to use 50% of 10 GB, for a total of 25 GB of buffering? Will this be a problem?

Also, how can this algorithm go wrong? Suppose I have a set of files organized by time, and I run a time-range query. The first few batches might have very few rows because the filter is picking up just a few early arrivals. We see, say, three batches of a few dozen rows each, then decide we can hold many batches. Later, the scan reaches the bulk of my time range, the filter removes far fewer rows, and the batches become much larger. Suddenly, we need far more memory to buffer the same number of batches. Do we need a safety valve that backs off if we suddenly see large batches?
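One possible shape for such a safety valve is the minimal Java sketch below. Everything in it is hypothetical: `AdaptiveCreditEstimator`, `onBatch`, `perFragmentBudget`, and the `backOffFactor` threshold are illustrative names, not Drill's API. It assumes credits are simply the receiver's memory budget divided by the estimated batch size. Like the PR, it samples the first few non-empty batches. Unlike the PR, it re-sizes the credit count whenever a later batch is much larger than the estimate. It also assumes the receiver's budget is a share of a node-wide budget, so that five fragments on one Drillbit do not each plan around 50% of the full 10 GB.

```java
/**
 * Hypothetical sketch only; these names are not part of Drill's API.
 * Samples the first few non-empty batches to size the receive window in
 * credits, then backs off if a later batch is much larger than the estimate.
 */
final class AdaptiveCreditEstimator {
  private final long memoryBudget;    // bytes this receiver may spend on buffered batches
  private final int maxSampleTimes;   // non-empty batches to sample before the first decision
  private final double backOffFactor; // re-size when a batch exceeds the estimate by this factor
  private final int minCredits;       // never grant fewer than this many credits

  private long totalBatchSize;
  private int sampleTimes;
  private boolean decided;
  private long estimatedBatchSize;
  private int credits;

  AdaptiveCreditEstimator(long memoryBudget, int maxSampleTimes,
                          double backOffFactor, int minCredits) {
    this.memoryBudget = memoryBudget;
    this.maxSampleTimes = maxSampleTimes;
    this.backOffFactor = backOffFactor;
    this.minCredits = minCredits;
    this.credits = minCredits; // conservative until the sample is complete
  }

  /** Splits a node-wide budget so N fragments on one Drillbit cannot each claim all of it. */
  static long perFragmentBudget(long nodeBudget, int fragmentsOnNode) {
    return nodeBudget / Math.max(1, fragmentsOnNode);
  }

  /** Records one arriving batch and returns the number of credits to grant the sender. */
  int onBatch(int recordCount, long byteCount) {
    if (recordCount == 0) {
      return credits; // leave empty batches out of the sample, as the PR's code does
    }
    if (!decided) {
      totalBatchSize += byteCount;
      sampleTimes++;
      if (sampleTimes == maxSampleTimes) {
        estimatedBatchSize = totalBatchSize / sampleTimes;
        credits = creditsFor(estimatedBatchSize);
        decided = true;
      }
    } else if (byteCount > estimatedBatchSize * backOffFactor) {
      // Safety valve: batches have grown well past the sampled average
      // (e.g. the filter now passes far more rows), so re-size the window
      // for the larger batch instead of trusting the stale estimate.
      estimatedBatchSize = byteCount;
      credits = creditsFor(byteCount);
    }
    return credits;
  }

  /** How many batches of the given size fit in the budget, clamped to [minCredits, Integer.MAX_VALUE]. */
  private int creditsFor(long batchBytes) {
    long fit = memoryBudget / Math.max(1L, batchBytes);
    return (int) Math.max(minCredits, Math.min(Integer.MAX_VALUE, fit));
  }
}
```

With a 10,000-byte budget, three sampled 100-byte batches yield 100 credits. A later 1,000-byte batch trips the valve and drops the grant to 10. With a back-off factor of at least 1, credits in this sketch only ever shrink after the first decision. Whether to grow the window again when batches get smaller is a separate policy choice.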