Quanlong Huang created IMPALA-10916:
---------------------------------------

             Summary: Pushdown Stats predicates with different argument types 
to the ORC reader
                 Key: IMPALA-10916
                 URL: https://issues.apache.org/jira/browse/IMPALA-10916
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Quanlong Huang
            Assignee: Quanlong Huang


IMPALA-6505 introduces min-max predicate (stats predicate) pushdown to the ORC 
reader. However, stats predicates with different argument types may not be 
pushed down. For example, for the query
{code:sql}
select count(*) from functional_orc_def.decimal_tbl where d1 > 132842;
{code}
the slot type of "d1" is DECIMAL(9,0), while the type of literal "132842" is 
INT.
 FE generates a stats predicate "d1 > 132842". After analyze, it becomes 
"CAST(d1 as INT) > CAST(132842 as INT)". The rhs is Literal(value=132842 
type=INT) in the BE, which is good. But the lhs is no longer a single slot ref, 
but a expression. So we cannot simply push it down.

The following queries have the same issue:
{code:sql}
select count(*) from functional_orc_def.decimal_tbl where d1 > cast(132842.0 as 
float);
select count(*) from functional_orc_def.decimal_tbl where d1 > cast(132842.0 as 
double);
{code}
Currently, a workaround is avoid comparing slot ref with other types. For 
decimal, this query works:
{code:sql}
select count(*) from functional_orc_def.decimal_tbl where d1 > 132842.0;
{code}
FE will analyze "132842.0" to be DECIMAL(10,1), so we are good.

A possible solution is detecting such simple CAST expressions, and generating 
orc::PredicateDataType using the slot type instead of the literal type. We will 
need handcraft casting logics when generating the orc::Literal, e.g. converting 
a impala::Literal(value=132842.0 type=FLOAT) to orc::Literal in DECIMAL type.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to