GitHub user jaltekruse commented on a diff in the pull request:
https://github.com/apache/drill/pull/381#discussion_r54018324
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinProbeTemplate.java
---
@@ -98,7 +102,9 @@ public void setupHashJoinProbe(FragmentContext context, VectorContainer buildBat
     }

   public void executeProjectRightPhase() {
-    while (outputRecords < TARGET_RECORDS_PER_BATCH && recordsProcessed < recordsToProcess) {
+    while (outputRecords < targetRecordsPerBatch
+        && recordsProcessed < recordsToProcess
+        && (!adjustTargetRecordsPerBatch || outgoingJoinBatch.getMemoryUsed() < TARGET_BATCH_SIZE_IN_BYTES)) {
--- End diff ---
It seems like the condition we are testing here isn't directly related to the
problem we are trying to avoid. The total memory consumed when outputting
records is a function of both the size of the values and the number of
columns. I think this is a reasonable approach for now, but we should open a
follow-up JIRA to look at where things will break as we encounter datasets
with many wide columns. A rough sketch of the kind of column-aware estimate
that follow-up might consider is below.
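A minimal, hypothetical sketch (the class and method names below are invented
for illustration, not Drill's actual code) of deriving a per-batch record
target from the number of output columns and an assumed average value width,
so that a byte budget like TARGET_BATCH_SIZE_IN_BYTES holds even for wide
rows:

    public class BatchSizeEstimator {
      // Assumed byte budget per outgoing batch (value is illustrative).
      static final long TARGET_BATCH_SIZE_IN_BYTES = 16 * 1024 * 1024;

      // columnWidths: assumed average encoded width, in bytes, of each
      // output column. Summing them approximates bytes per output record.
      static int estimateTargetRecordsPerBatch(int[] columnWidths) {
        long bytesPerRecord = 0;
        for (int width : columnWidths) {
          bytesPerRecord += width;
        }
        if (bytesPerRecord == 0) {
          // No width information; a caller would fall back to a fixed
          // record-count cap such as TARGET_RECORDS_PER_BATCH.
          return Integer.MAX_VALUE;
        }
        // Largest record count whose estimated footprint stays in budget.
        return (int) Math.max(1, TARGET_BATCH_SIZE_IN_BYTES / bytesPerRecord);
      }

      public static void main(String[] args) {
        int[] fewNarrowColumns = {8, 8, 16};
        int[] manyWideColumns = new int[200];
        java.util.Arrays.fill(manyWideColumns, 256);
        // Many wide columns drive the record target down sharply.
        System.out.println(estimateTargetRecordsPerBatch(fewNarrowColumns));
        System.out.println(estimateTargetRecordsPerBatch(manyWideColumns));
      }
    }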