@sachouche @vvysotskyi I don't agree this should be handled by the column sizes 
map. The issue is that operators are expecting a column with the name of 
MYCOLUMN (because that is the name provided by the planner), but instead the 
input column has a name of `` `MYCOLUMN` ``. This can cause errors at many 
points in an operator's execution, not just within the RecordBatchSizer's 
columnSizes map. For example, in HashJoin the HashTable uses the unquoted 
column names provided by the planner to retrieve the key column from the 
incoming record batch (see ChainedHashTable.createAndSetupHashTable). So while this fix resolves the fatal exception in the batch sizer, it does not address functional correctness in other parts of the code, such as the HashTable, which may be silently generating incorrect results.
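To illustrate the concern, here is a minimal, hypothetical sketch (plain Java collections, not Drill's actual vector/batch API) of why a quoted column name breaks strict name-based lookups downstream even when a single lenient consumer has been patched:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: the incoming batch stores the column under its quoted
// name, while operators look it up by the unquoted name from the planner.
public class ColumnNameMismatchDemo {
    public static void main(String[] args) {
        // Column registered under the quoted name produced by the reader path.
        Map<String, int[]> incomingBatch = new HashMap<>();
        incomingBatch.put("`MYCOLUMN`", new int[] {1, 2, 3});

        // The planner hands operators the unquoted name.
        String plannerName = "MYCOLUMN";

        // A lookup patched to tolerate quoting (like the batch sizer fix)
        // still finds the column ...
        int[] viaLenientLookup = incomingBatch.get("`" + plannerName + "`");

        // ... but a strict name-based lookup (analogous to the HashTable's key
        // resolution) misses the column without throwing, which is how silent
        // correctness problems can arise.
        int[] viaStrictLookup = incomingBatch.get(plannerName);

        System.out.println("lenient lookup found column: " + (viaLenientLookup != null)); // true
        System.out.println("strict lookup found column:  " + (viaStrictLookup != null));  // false
    }
}
```

The class and lookup names above are illustrative only; the point is that any consumer keyed on the planner-provided name can quietly miss the column.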

If we close this issue now with a temporary fix, some poor soul may spend weeks 
debugging strange and unexpected data correctness issues down the line. In 
order to avoid that scenario and to increase the urgency of fixing the root 
cause, I am actually thinking that we should leave the bug unfixed until we 
have a permanent fix for the parquet reader. What are your guys thoughts?

[ Full content available at: https://github.com/apache/drill/pull/1445 ]