[
https://issues.apache.org/jira/browse/DRILL-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623866#comment-16623866
]
ASF GitHub Bot commented on DRILL-6747:
---------------------------------------
ilooner commented on a change in pull request #1472: DRILL-6747: Fixed IOB in
HashJoin for inner and right joins on empty probe side
URL: https://github.com/apache/drill/pull/1472#discussion_r219561941
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java
##########
@@ -115,6 +115,14 @@
* This value will be returned only after {@link #OK_NEW_SCHEMA} has been
* returned at least once (not necessarily <em>immediately</em> after).
* </p>
+ * <p>
+ * Additionally, after a RecordBatch returns NONE, it should:
+ * <ul>
+ * <li>Contain the last valid schema seen by the operator.</li>
+ * <li>Contain a VectorContainer with empty columns corresponding to the last valid schema.</li>
+ * <li>Return a record count of 0.</li>
+ * </ul>
+ * </p>
*/
NONE(false),
Review comment:
HashJoin must first sniff the first data-holding batch from the probe
side in order to compute the stats used by the memory calculations for
spilling. If the probe side is empty we hit NONE. This is unavoidable, and
it is where the problem begins.
In the case of a right join, HashJoin still builds hash tables, and the
HashTable requires its build and probe batches to be updated together before
it is built. When the build and probe sides are updated, the HashTable
expects a vector container with valid columns. If the columns are not
present, an IOB is thrown.
Later, the probing logic in HashJoinProbeTemplate is invoked; it expects
the upstream probeBatch to return a valid record count.
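The failure mode can be illustrated with a toy model in plain Java. This is a sketch only: the `probeContainer` list is a hypothetical stand-in for Drill's VectorContainer, and clearing it mimics what the receiver does on NONE; none of this is actual Drill code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the failure. A list of column lists stands in for
// Drill's VectorContainer; names here are hypothetical.
public class EmptyProbeIob {
    public static void main(String[] args) {
        // Probe side container with two columns, as announced by OK_NEW_SCHEMA.
        List<List<Object>> probeContainer = new ArrayList<>();
        probeContainer.add(new ArrayList<>()); // column 0
        probeContainer.add(new ArrayList<>()); // column 1

        // On NONE, the receiver clears the container, removing the columns
        // themselves (not just their rows).
        probeContainer.clear();

        // HashTable setup later indexes columns it expects to still exist.
        try {
            probeContainer.get(1);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IOB accessing column 1 of "
                + probeContainer.size() + " columns");
        }
    }
}
```

The point of the sketch is that clearing drops the column slots entirely, so any later index into the container fails, matching the `Index: 1, Size: 0` in the stack trace below.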
There are a few ways to fix this problem:
1. Refactor the HashTable to update its probe and build sides
independently. This has corner cases, though, and the HashTable has no unit
tests, so it is risky. Alternatively, we could remove the code generation
from the HashTable, which would also be time consuming, or change the join
logic to skip building the HashTable in this case, which is another major
change. We would also need to refactor HashJoinProbeTemplate to handle an
empty probe side gracefully, but HashJoinProbeTemplate is not unit tested
either, so this would be time consuming as well.
2. Leave the HashTable and HashJoinProbeTemplate as is, but use the
last schema received from the probe side to build dummy empty vector
containers for building the HashTable and for probing in
HashJoinProbeTemplate. IMO this is a hack. There are also corner cases,
because some Drill operators violate the contract that they must first send
OK_NEW_SCHEMA and then NONE: in some unit tests I have observed readers skip
straight to NONE without OK_NEW_SCHEMA when there is no data.
3. Define the state of a RecordBatch after it returns NONE: a record
count of zero, the last valid schema, and an empty VectorContainer. This
state would be guaranteed should a downstream operator choose to query the
batch. It resolves the issue in HashJoin without hacks or risky refactoring
done in a rush. It also defines state that was previously undefined, which
IMO is a good thing. I would also do follow-up PRs to make all operators
obey this behavior, since that will be trivial to do.
- Of the three options, option 1 is the strictly correct thing to do if
you don't consider time costs. However, due to the technical debt we've
accumulated, it would be time consuming.
- Option 2 is a hack IMO and just makes things more confusing for the
next generation of Drillers.
- Option 3 is a win-win. We get a clean resolution to the issue, and we
define the state of operators after they return NONE. Defining state that
was previously undefined is always a good thing, especially when it
simplifies the logic of the system. I can also go through the operators and
make sure they do this; it should be a trivial change, involving zeroing
VectorContainers before returning NONE and updating record counts to 0.
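A minimal, self-contained sketch of the option 3 contract, in plain Java. `SimpleBatch` and its members are hypothetical stand-ins for RecordBatch/VectorContainer (a list of column lists instead of value vectors); it is not Drill code, just an illustration of the proposed post-NONE state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Drill's IterOutcome and RecordBatch.
enum IterOutcome { OK_NEW_SCHEMA, OK, NONE }

class SimpleBatch {
    private final List<String> schema = new ArrayList<>();          // last valid schema
    private final List<List<Object>> container = new ArrayList<>(); // one list per column
    private int recordCount;
    private boolean exhausted;

    SimpleBatch(List<String> columns) {
        schema.addAll(columns);
        for (int i = 0; i < columns.size(); i++) {
            container.add(new ArrayList<>());
        }
    }

    // Simulates next() on an empty probe side: announce the schema once,
    // then return NONE while honoring the proposed contract.
    IterOutcome next() {
        if (!exhausted) {
            exhausted = true;
            return IterOutcome.OK_NEW_SCHEMA; // schema announced, no data
        }
        // Proposed contract: zero the columns instead of clearing the
        // container, and report a record count of 0.
        for (List<Object> column : container) {
            column.clear();
        }
        recordCount = 0;
        return IterOutcome.NONE;
    }

    int getRecordCount() { return recordCount; }
    int columnCount() { return container.size(); }
}

public class NoneContractDemo {
    public static void main(String[] args) {
        SimpleBatch probe = new SimpleBatch(List.of("a", "b"));
        probe.next();                    // OK_NEW_SCHEMA
        IterOutcome last = probe.next(); // NONE
        // A downstream operator (e.g. HashJoin) can still query the state:
        System.out.println(last + " recordCount=" + probe.getRecordCount()
            + " columns=" + probe.columnCount());
    }
}
```

Because the columns survive NONE, a downstream HashTable setup that indexes them would no longer hit an IOB.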
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Empty Probe Side in Right or Inner Join Causes IOB.
> ---------------------------------------------------
>
> Key: DRILL-6747
> URL: https://issues.apache.org/jira/browse/DRILL-6747
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Timothy Farkas
> Assignee: Timothy Farkas
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> When a right join is done with an empty probe side, which first returns
> OK_NEW_SCHEMA and then NONE, an IOB is triggered.
> This happens in the following scenario:
> 1. We finish reading the build side.
> 2. The upstream probe side operator is an UnorderedReceiver.
> 3. We sniff for probe data and find none, so the UnorderedReceiver returns
> NONE.
> 4. When the UnorderedReceiver returns NONE it clears the VectorContainer in
> its RecordBatchLoader, causing all the value vectors to be deleted.
> 5. We then try to build hash tables from the build side data.
> 6. The HashTable requires a probe side vector container from the
> UnorderedReceiver.
> 7. The UnorderedReceiver's vector container lives in the RecordBatchLoader,
> which was cleared when NONE was returned, so all of the container's columns
> were removed.
> 8. Building the hash table attempts to access the vector container's
> columns, and we get an IOB.
> There are a couple of ways to fix the issue.
> Currently, updating the value vectors in the HashTable for the build and
> probe sides is tightly coupled; they cannot be updated independently. So we
> could refactor the HashTable to update the build and probe sides
> independently in the case of an empty probe side. That would be an invasive
> change, though.
> The other solution, which I am going to implement here, is to add a
> requirement to the contract for IterOutcome.NONE that the VectorContainer
> columns are preserved, and fix UnorderedReceiver to obey that contract. It
> looks like all the other operators already do this, but it is not
> explicitly tested. I will create other jiras to explicitly test this
> behavior. Adding this requirement to the contract for IterOutcome.NONE has
> the benefit of making the state of a RecordBatch well defined after NONE,
> which it currently is not.
> {code}
> Fragment 3:3
> [Error Id: 061e7d71-e3c1-43bd-a6d5-9d588ecf6551 on perf109-52.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
> IndexOutOfBoundsException: Index: 1, Size: 0
> Fragment 3:3
> [Error Id: 061e7d71-e3c1-43bd-a6d5-9d588ecf6551 on perf109-52.qa.lab:31010]
> at
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
> ~[drill-common-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:360)
> [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:215)
> [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:326)
> [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> [drill-common-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [na:1.8.0_171]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [na:1.8.0_171]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171]
> Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[na:1.8.0_171]
> at java.util.ArrayList.get(ArrayList.java:433) ~[na:1.8.0_171]
> at
> org.apache.drill.exec.record.VectorContainer.getValueAccessorById(VectorContainer.java:317)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.RecordBatchLoader.getValueAccessorById(RecordBatchLoader.java:251)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.getValueAccessorById(UnorderedReceiverBatch.java:139)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.test.generated.HashTableGen783.doSetup(HashTableGen783.java:49)
> ~[na:na]
> at
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.updateBatches(HashTableTemplate.java:513)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.updateIncoming(HashTableTemplate.java:870)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.common.HashPartition.buildContainersHashTableAndHelper(HashPartition.java:510)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase(HashJoinBatch.java:973)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:436)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:175)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:122)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:112)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:143)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:175)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:122)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:112)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:143)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:175)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:103)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:152)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:293)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:280)
> ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> at java.security.AccessController.doPrivileged(Native Method)
> ~[na:1.8.0_171]
> at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_171]
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
> ~[hadoop-common-2.7.0-mapr-1707.jar:na]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:280)
> [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
> ... 4 common frames omitted
> 2018-09-18 15:32:59,977 [245e830b-9128-eaa9-44b3-907398183f09:frag:3:43]
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR:
> IndexOutOfBoundsException: Index: 1, Size: 0
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)