[jira] [Commented] (DRILL-4411) HashJoin should not only depend on number of records, but also on size

ASF GitHub Bot (JIRA) Wed, 24 Feb 2016 14:30:54 -0800

    [ https://issues.apache.org/jira/browse/DRILL-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15164491#comment-15164491 ]


ASF GitHub Bot commented on DRILL-4411:
---------------------------------------

Github user jaltekruse commented on a diff in the pull request:

    https://github.com/apache/drill/pull/381#discussion_r54018324
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinProbeTemplate.java ---
    @@ -98,7 +102,9 @@ public void setupHashJoinProbe(FragmentContext context, VectorContainer buildBat
       }
     
       public void executeProjectRightPhase() {
    -    while (outputRecords < TARGET_RECORDS_PER_BATCH && recordsProcessed < recordsToProcess) {
    +    while (outputRecords < targetRecordsPerBatch
    +            && recordsProcessed < recordsToProcess
    +            && (!adjustTargetRecordsPerBatch || outgoingJoinBatch.getMemoryUsed() < TARGET_BATCH_SIZE_IN_BYTES)) {
    --- End diff --
    
    It seems like the condition we are testing here isn't directly related
    to the problem we are trying to avoid. The overall memory consumed when
    outputting records is a function of both the size of the values and the
    number of columns. I think this is a reasonable approach for now, but we
    should open a follow-up JIRA to look at where things break when a
    dataset has many wide columns.
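
    To make the point about value size and column count concrete, here is a
    rough back-of-the-envelope sketch. It is not Drill code: the class name,
    the `estimateBatchBytes` helper, and the column widths are hypothetical,
    and it assumes each column's values have a fixed average width. Only the
    4000-record cap comes from the issue description.

```java
import java.util.Arrays;

public class BatchSizeEstimate {
    // Hypothetical helper: approximate bytes held by a batch, assuming each
    // column contributes a fixed average number of bytes per value.
    static long estimateBatchBytes(int recordCount, int[] avgBytesPerValue) {
        long bytesPerRecord = 0;
        for (int width : avgBytesPerValue) {
            bytesPerRecord += width;
        }
        return bytesPerRecord * recordCount;
    }

    public static void main(String[] args) {
        int records = 4000;            // record cap quoted in the issue description

        int[] narrow = {8, 8, 8};      // three 8-byte columns (illustrative)
        int[] wide = new int[50];      // fifty columns of ~1 KiB values (illustrative)
        Arrays.fill(wide, 1024);

        System.out.println(estimateBatchBytes(records, narrow)); // 96000
        System.out.println(estimateBatchBytes(records, wide));   // 204800000
    }
}
```

    With the same record cap, the batch grows from roughly 96 KB to roughly
    200 MB once the rows are wide, which is why a record count alone says
    little about memory use.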


> HashJoin should not only depend on number of records, but also on size
> ----------------------------------------------------------------------
>
>                 Key: DRILL-4411
>                 URL: https://issues.apache.org/jira/browse/DRILL-4411
>             Project: Apache Drill
>          Issue Type: Bug
>          Components:  Server
>            Reporter: MinJi Kim
>            Assignee: MinJi Kim
>
> In HashJoinProbeTemplate, each batch is limited to TARGET_RECORDS_PER_BATCH 
> (4000).  But we should not only depend on the number of records, but also 
> size (in case of extremely large records).
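
The idea in the description, capping a batch by both record count and size, can be sketched in isolation as below. This is a simplified illustration, not the HashJoinProbeTemplate implementation: `DualLimitBatcher`, `fillBatch`, and the 1 MiB byte budget are assumptions made for the example; only the 4000-record cap is taken from the issue text.

```java
public class DualLimitBatcher {
    static final int MAX_RECORDS = 4000;     // record cap from the issue text
    static final long MAX_BYTES = 1L << 20;  // assumed 1 MiB budget, illustrative only

    // Returns how many records, starting at index 'start', go into one batch.
    // The byte check runs before each append, so a batch may exceed the
    // budget by at most one record.
    static int fillBatch(int[] recordSizes, int start) {
        int count = 0;
        long bytes = 0;
        while (count < MAX_RECORDS
                && start + count < recordSizes.length
                && bytes < MAX_BYTES) {
            bytes += recordSizes[start + count];
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] small = new int[10_000];
        java.util.Arrays.fill(small, 10);          // 10-byte records
        int[] large = new int[10];
        java.util.Arrays.fill(large, 512 * 1024);  // 512 KiB records

        System.out.println(fillBatch(small, 0)); // 4000: the record cap applies
        System.out.println(fillBatch(large, 0)); // 2: the byte budget applies
    }
}
```

For small records the record cap ends the batch; for very large records the byte budget ends it after two records, long before 4000 are emitted.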



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
