Quanlong Huang created IMPALA-10916:
---------------------------------------
Summary: Pushdown Stats predicates with different argument types
to the ORC reader
Key: IMPALA-10916
URL: https://issues.apache.org/jira/browse/IMPALA-10916
Project: IMPALA
Issue Type: Improvement
Components: Backend
Reporter: Quanlong Huang
Assignee: Quanlong Huang
IMPALA-6505 introduces min-max predicate (stats predicate) pushdown to the ORC
reader. However, stats predicates with different argument types may not be
pushed down. For example, for the query
{code:sql}
select count(*) from functional_orc_def.decimal_tbl where d1 > 132842;
{code}
the slot type of "d1" is DECIMAL(9,0), while the type of literal "132842" is
INT.
FE generates a stats predicate "d1 > 132842". After analyze, it becomes
"CAST(d1 as INT) > CAST(132842 as INT)". The rhs is Literal(value=132842
type=INT) in the BE, which is good. But the lhs is no longer a single slot ref,
but a expression. So we cannot simply push it down.
The following queries have the same issue:
{code:sql}
select count(*) from functional_orc_def.decimal_tbl where d1 > cast(132842.0 as
float);
select count(*) from functional_orc_def.decimal_tbl where d1 > cast(132842.0 as
double);
{code}
Currently, a workaround is avoid comparing slot ref with other types. For
decimal, this query works:
{code:sql}
select count(*) from functional_orc_def.decimal_tbl where d1 > 132842.0;
{code}
FE will analyze "132842.0" to be DECIMAL(10,1), so we are good.
A possible solution is detecting such simple CAST expressions, and generating
orc::PredicateDataType using the slot type instead of the literal type. We will
need handcraft casting logics when generating the orc::Literal, e.g. converting
a impala::Literal(value=132842.0 type=FLOAT) to orc::Literal in DECIMAL type.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)