@sachouche @vvysotskyi I don't agree this should be handled by the column sizes 
map. The issue is that operators expect a column named MYCOLUMN (the unquoted 
name provided by the planner), but the incoming column is named `MYCOLUMN`, 
with the backquotes included as part of the name. This can cause errors at many 
points in an operator's execution, not just within the RecordBatchSizer's 
columnSizes map. For example, in HashJoin the HashTable uses the unquoted 
column names provided by the planner to retrieve the key column from the 
incoming record batch (see ChainedHashTable.createAndSetupHashTable). So while 
this fix resolves the fatal exception in the batch sizer, it does not address 
functional correctness in other parts of the code, such as the HashTable, which 
may silently generate incorrect results.
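
To make the failure mode concrete, here is a minimal, self-contained sketch 
(plain Java, not Drill's actual classes; the map-based lookup and the vector 
labels are just stand-ins for the name resolution an operator performs) of what 
happens when an operator looks up a key column by the planner's unquoted name 
while the reader delivered it under the quoted name:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the name-mismatch failure mode: the planner hands the
// operator the unquoted name "MYCOLUMN", but the reader delivered the column
// under the quoted name "`MYCOLUMN`", so a lookup keyed on the planner's name
// comes back empty.
public class ColumnNameMismatchSketch {

    public static void main(String[] args) {
        // Stand-in for the incoming batch's schema: column name -> value vector (here just a label).
        Map<String, String> incomingColumns = new HashMap<>();
        incomingColumns.put("`MYCOLUMN`", "intVectorForMyColumn"); // name as produced by the reader

        // Stand-in for what HashJoin's hash table setup does: fetch the key column
        // by the unquoted name the planner provided.
        String plannerName = "MYCOLUMN";
        String keyVector = incomingColumns.get(plannerName);

        // The lookup misses. Depending on the code path this surfaces as a fatal
        // exception (as in the batch sizer) or as silently wrong results.
        System.out.println("Lookup for " + plannerName + " -> " + keyVector); // prints: Lookup for MYCOLUMN -> null
    }
}
```

The batch sizer happens to fail loudly on this mismatch; an operator that 
quietly gets nothing (or the wrong column) back for its key column is the 
scarier case.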

If we close this issue now with a temporary fix, some poor soul may spend weeks 
debugging strange and unexpected data correctness issues down the line. To 
avoid that scenario, and to keep up the urgency of fixing the root cause, I am 
actually thinking we should leave the bug unfixed until we have a permanent fix 
for the parquet reader. What are your thoughts?

[ Full content available at: https://github.com/apache/drill/pull/1445 ]