@sohami I agree that this guarantee was not provided before. However, I am 
proposing to change the contract in the way described in the comment. Note that 
no data would be associated with a record batch after it returns NONE. It would 
only have a VectorContainer with **EMPTY** columns corresponding to the last 
seen valid schema, and it would return a record count of 0. In practice, almost 
all operators already do this (except that they don't always zero their 
container).

The advantage of adding this new guarantee is that downstream operators do not 
have to do their own bookkeeping (such as remembering the last valid schema) 
when the upstream can do it for us very easily.
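To make the proposed contract concrete, here is a toy sketch. It does not use Drill's real `RecordBatch` or `VectorContainer` API: `Outcome`, `ToyContainer`, `ToyUpstream` and `ToyDownstream` are hypothetical stand-ins. When the upstream returns NONE it empties its columns but keeps the schema, so the downstream can read the last valid schema without tracking it itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified model of the proposed contract. These are NOT
// Drill's RecordBatch or VectorContainer classes; every name here is illustrative.
enum Outcome { OK_NEW_SCHEMA, OK, NONE }

/** Toy container: a schema (column names) plus one value list per column. */
final class ToyContainer {
  final List<String> schema = new ArrayList<>();
  final List<List<Integer>> columns = new ArrayList<>();

  void setSchema(List<String> names) {
    schema.clear();
    columns.clear();
    for (String name : names) {
      schema.add(name);
      columns.add(new ArrayList<>());
    }
  }

  /** Empties every column but keeps the schema: the proposed state after NONE. */
  void zeroVectors() {
    for (List<Integer> column : columns) {
      column.clear();
    }
  }

  int recordCount() {
    return columns.isEmpty() ? 0 : columns.get(0).size();
  }
}

/** Toy upstream operator: emits its batches (column-major), then NONE forever. */
final class ToyUpstream {
  final ToyContainer container = new ToyContainer();
  private final List<List<List<Integer>>> batches; // batch -> column -> values
  private int emitted = 0;

  ToyUpstream(List<String> schema, List<List<List<Integer>>> batches) {
    container.setSchema(schema);
    this.batches = batches;
  }

  Outcome next() {
    if (emitted == batches.size()) {
      // Proposed contract: last valid schema stays, data is gone, count is 0.
      container.zeroVectors();
      return Outcome.NONE;
    }
    List<List<Integer>> batch = batches.get(emitted);
    for (int i = 0; i < container.columns.size(); i++) {
      List<Integer> column = container.columns.get(i);
      column.clear();
      column.addAll(batch.get(i));
    }
    emitted++;
    return emitted == 1 ? Outcome.OK_NEW_SCHEMA : Outcome.OK;
  }
}

/** Toy downstream: needs the upstream schema even after the upstream is done. */
final class ToyDownstream {
  static List<String> drain(ToyUpstream upstream) {
    Outcome outcome;
    do {
      outcome = upstream.next();
    } while (outcome != Outcome.NONE);
    // No local copy of the "last seen schema" is kept: the container still has it.
    return new ArrayList<>(upstream.container.schema);
  }
}

public class NoneContractDemo {
  public static void main(String[] args) {
    ToyUpstream upstream = new ToyUpstream(
        List.of("id", "len"),
        List.of(List.of(List.of(1, 2), List.of(5, 3))));
    List<String> schema = ToyDownstream.drain(upstream);
    System.out.println(schema + " records=" + upstream.container.recordCount());
  }
}
```

Running `main` prints `[id, len] records=0`: the downstream gets the schema straight from the upstream's (now empty) container instead of recording it on an earlier iteration.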

The details of why this was done are in the JIRA. There were many ways to fix 
the issue in HashJoin, but this is by far the cleanest and the least risky, 
since most of the code in HashJoin has no unit tests. It also has the advantage 
of clarifying the behavior of RecordBatches for future operator 
implementations. So I felt this approach was a win-win and the best way to fix 
the issue. However, there should be a follow-up JIRA to validate the new 
contract I am proposing for the rest of the operators, not just the unordered 
receiver.

[ Full content available at: https://github.com/apache/drill/pull/1472 ]