paul-rogers commented on a change in pull request #2000: DRILL-7607: support 
dynamic credit based flow control
URL: https://github.com/apache/drill/pull/2000#discussion_r385998974
 
 

 ##########
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/batch/UnlimitedRawBatchBuffer.java
 ##########
 @@ -90,14 +100,39 @@ public boolean isEmpty() {
 
     @Override
     public void add(RawFragmentBatch batch) {
+      int recordCount = batch.getHeader().getDef().getRecordCount();
+      long bathByteSize = batch.getByteCount();
+      if (recordCount != 0) {
+        //skip first header batch
+        totalBatchSize += bathByteSize;
+        sampleTimes++;
+      }
+      if (sampleTimes == maxSampleTimes) {
+        long averageBathSize = totalBatchSize / sampleTimes;
+        //make a decision
+        long limit = context.getAllocator().getLimit();
 
 Review comment:
   Another issue: how many of these receivers exist per Drillbit? I don't know 
the answer. If I have 5 minor fragments on this Drillbit, will all 5 do their 
own flow-control calculations? Will I have 5 fragments each trying to use 50% 
of 10GB, for a total of 25GB of buffering? Will this be a problem?
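
   To make the arithmetic concrete, here is a rough sketch. It assumes each 
receiver sizes its buffer independently against the same allocator limit, which 
I have not verified. `ReceiverBudget` and its methods are hypothetical names 
for illustration, not Drill APIs.

```java
public final class ReceiverBudget {
  private ReceiverBudget() {}

  /**
   * Worst-case total bytes buffered if every receiver independently claims
   * the same fraction of one shared limit (the assumption being questioned).
   */
  public static long worstCaseBytes(int receivers, long limitBytes, double fraction) {
    return (long) (receivers * (limitBytes * fraction));
  }

  /**
   * Per-receiver budget if that same fraction of the limit is instead
   * divided evenly among all receivers on the Drillbit.
   */
  public static long sharedBudgetBytes(int receivers, long limitBytes, double fraction) {
    if (receivers <= 0) {
      throw new IllegalArgumentException("receivers must be positive");
    }
    return (long) (limitBytes * fraction) / receivers;
  }
}
```

   If the worst-case total can exceed the limit itself, the per-receiver 
calculation probably needs to know how many receivers share the allocator.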
   
   Also, how can this algorithm go wrong? Suppose I have a set of files 
organized by time. I do a time range query. The first few batches might have 
very few rows because the filter is picking up just a few early arrivals. We 
see three batches, say, of a few dozen rows each because the filter passed 
almost nothing, then decide we can hold many batches.
   
   Later, the scan hits the bulk of my time range and the batches have far 
fewer rows filtered out. Suddenly, we need far more memory for these much 
larger batches.
   
   Do we need a safety valve that says that we will back off if we suddenly see 
large batches?
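
   One possible shape for such a safety valve, as a minimal sketch only: 
after the initial sampling, keep watching batch sizes and shrink the advertised 
credit whenever a batch is larger than the size we planned for. 
`CreditSafetyValve`, its fields, and the notion of a fixed `memoryBudgetBytes` 
are assumptions for illustration; none of this is taken from the PR.

```java
public final class CreditSafetyValve {
  private final long memoryBudgetBytes;
  private final int maxCredit;
  private long assumedBatchBytes;
  private int credit;

  public CreditSafetyValve(long memoryBudgetBytes, long sampledAverageBatchBytes, int maxCredit) {
    if (memoryBudgetBytes <= 0 || maxCredit <= 0) {
      throw new IllegalArgumentException("budget and maxCredit must be positive");
    }
    this.memoryBudgetBytes = memoryBudgetBytes;
    this.maxCredit = maxCredit;
    // Guard against a zero average so the division below is always safe.
    this.assumedBatchBytes = Math.max(1, sampledAverageBatchBytes);
    this.credit = creditFor(this.assumedBatchBytes);
  }

  /** Number of batches of the given size that fit in the budget, clamped to [1, maxCredit]. */
  private int creditFor(long batchBytes) {
    long fit = memoryBudgetBytes / batchBytes;
    return (int) Math.max(1, Math.min(maxCredit, fit));
  }

  /**
   * Call for every arriving batch. If the batch is larger than the size the
   * current credit was planned for, recompute the credit from the new size.
   * This sketch only ever backs off; it never raises the credit again.
   */
  public int onBatch(long batchBytes) {
    if (batchBytes > assumedBatchBytes) {
      assumedBatchBytes = batchBytes;
      credit = creditFor(assumedBatchBytes);
    }
    return credit;
  }
}
```

   Backing off on the largest batch seen is deliberately conservative. A 
moving average would recover credit after a burst, at the cost of reacting 
more slowly to a sudden jump in batch size.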
