GitHub user jaltekruse commented on a diff in the pull request:
https://github.com/apache/drill/pull/381#discussion_r54018324
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinProbeTemplate.java
---
@@ -98,7 +102,9 @@ public void setupHashJoinProbe(FragmentContext context, VectorContainer buildBat
     }

   public void executeProjectRightPhase() {
-    while (outputRecords < TARGET_RECORDS_PER_BATCH && recordsProcessed < recordsToProcess) {
+    while (outputRecords < targetRecordsPerBatch
+        && recordsProcessed < recordsToProcess
+        && (!adjustTargetRecordsPerBatch || outgoingJoinBatch.getMemoryUsed() < TARGET_BATCH_SIZE_IN_BYTES)) {
--- End diff ---
It seems like the condition we are testing here isn't directly related to the
problem we are trying to avoid. The total memory consumed when outputting
records is a function of both the size of the values and the number of
columns. I think this is a reasonable approach for now, but we should open a
follow-up JIRA to look at where things will break as we encounter datasets
with many wide columns. A rough sketch of the kind of column-aware estimate
that follow-up might consider is below.
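A minimal, hypothetical sketch (the class and method names below are invented
for illustration, not Drill's actual code) of deriving a per-batch record
target from the number of output columns and an assumed average value width,
so that a byte budget like TARGET_BATCH_SIZE_IN_BYTES holds even for wide
rows:

    public class BatchSizeEstimator {
      // Assumed byte budget per outgoing batch (value is illustrative).
      static final long TARGET_BATCH_SIZE_IN_BYTES = 16 * 1024 * 1024;

      // columnWidths: assumed average encoded width, in bytes, of each
      // output column. Summing them approximates bytes per output record.
      static int estimateTargetRecordsPerBatch(int[] columnWidths) {
        long bytesPerRecord = 0;
        for (int width : columnWidths) {
          bytesPerRecord += width;
        }
        if (bytesPerRecord == 0) {
          // No width information; a caller would fall back to a fixed
          // record-count cap such as TARGET_RECORDS_PER_BATCH.
          return Integer.MAX_VALUE;
        }
        // Largest record count whose estimated footprint stays in budget.
        return (int) Math.max(1, TARGET_BATCH_SIZE_IN_BYTES / bytesPerRecord);
      }

      public static void main(String[] args) {
        int[] fewNarrowColumns = {8, 8, 16};
        int[] manyWideColumns = new int[200];
        java.util.Arrays.fill(manyWideColumns, 256);
        // Many wide columns drive the record target down sharply.
        System.out.println(estimateTargetRecordsPerBatch(fewNarrowColumns));
        System.out.println(estimateTargetRecordsPerBatch(manyWideColumns));
      }
    }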