[
https://issues.apache.org/jira/browse/DRILL-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15164491#comment-15164491
]
ASF GitHub Bot commented on DRILL-4411:
---------------------------------------
Github user jaltekruse commented on a diff in the pull request:
https://github.com/apache/drill/pull/381#discussion_r54018324
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinProbeTemplate.java ---
@@ -98,7 +102,9 @@ public void setupHashJoinProbe(FragmentContext context, VectorContainer buildBat
}
public void executeProjectRightPhase() {
- while (outputRecords < TARGET_RECORDS_PER_BATCH && recordsProcessed < recordsToProcess) {
+ while (outputRecords < targetRecordsPerBatch
+ && recordsProcessed < recordsToProcess
+ && (!adjustTargetRecordsPerBatch || outgoingJoinBatch.getMemoryUsed() < TARGET_BATCH_SIZE_IN_BYTES)) {
--- End diff --
It seems like what we are testing for here isn't directly related to the
condition we are trying to avoid. The overall memory consumed when outputting
records is a function of both the size of the values and the number of
columns. I think this is a reasonable approach for now, but we should open a
follow-up JIRA to look at where things break when a dataset has many wide
columns.
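To make the point about value size and column count concrete, here is a
rough sketch of how a byte budget could be turned into a per-batch record
count. It is illustrative only: the class, the method names, the average
column widths, and the 16 MiB budget are all assumptions and are not taken
from Drill or from this pull request. Only the 4000-record cap comes from the
issue description.

```java
// Hypothetical sketch; none of these names exist in Drill.
final class BatchSizeSketch {
    // Record-count cap quoted in the issue description.
    static final int TARGET_RECORDS_PER_BATCH = 4000;

    /** Sum of the assumed average widths (in bytes) of all output columns. */
    static long rowWidth(int[] avgColumnWidths) {
        long width = 0;
        for (int w : avgColumnWidths) {
            width += w;
        }
        return width;
    }

    /** Estimated bytes for a batch: row count times estimated row width. */
    static long estimateBatchBytes(int rows, int[] avgColumnWidths) {
        return rows * rowWidth(avgColumnWidths);
    }

    /**
     * Largest record count whose estimated size fits the byte budget,
     * capped at TARGET_RECORDS_PER_BATCH and never below 1.
     */
    static int recordsForBudget(long budgetBytes, int[] avgColumnWidths) {
        long width = rowWidth(avgColumnWidths);
        if (width <= 0) {
            return TARGET_RECORDS_PER_BATCH;
        }
        long fit = budgetBytes / width;
        return (int) Math.max(1, Math.min(TARGET_RECORDS_PER_BATCH, fit));
    }

    public static void main(String[] args) {
        int[] narrow = {8, 8};            // two 8-byte columns
        int[] wide = new int[50];         // fifty columns of ~1 KiB values
        java.util.Arrays.fill(wide, 1024);
        long budget = 16L * 1024 * 1024;  // assumed 16 MiB budget

        System.out.println(estimateBatchBytes(TARGET_RECORDS_PER_BATCH, narrow));
        System.out.println(estimateBatchBytes(TARGET_RECORDS_PER_BATCH, wide));
        System.out.println(recordsForBudget(budget, narrow));
        System.out.println(recordsForBudget(budget, wide));
    }
}
```

Under these assumed widths, a 4000-record batch of the wide rows is estimated
at roughly 200 MB, while the same byte budget would admit only a few hundred
such rows. That is the kind of case the follow-up JIRA would need to cover.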
> HashJoin should not only depend on number of records, but also on size
> ----------------------------------------------------------------------
>
> Key: DRILL-4411
> URL: https://issues.apache.org/jira/browse/DRILL-4411
> Project: Apache Drill
> Issue Type: Bug
> Components: Server
> Reporter: MinJi Kim
> Assignee: MinJi Kim
>
> In HashJoinProbeTemplate, each batch is limited to TARGET_RECORDS_PER_BATCH
> (4000) records. The limit should depend not only on the number of records
> but also on their size (in case of extremely large records).