Zoltán Borók-Nagy created IMPALA-15040:
------------------------------------------

             Summary: Predicate pull-up for Iceberg tables
                 Key: IMPALA-15040
                 URL: https://issues.apache.org/jira/browse/IMPALA-15040
             Project: IMPALA
          Issue Type: Improvement
            Reporter: Zoltán Borók-Nagy


Impala uses the following plan to scan Iceberg tables with position deletes / 
Deletion Vectors:
{noformat}
       UNION ALL
      /        \
     /          \
    /            \
   SCAN all  IcebergDeleteNode
   datafiles  /      \
   without   /        \
   deletes  SCAN      SCAN
            datafiles deletes
            with
            deletes
{noformat}
IcebergDeleteNode is highly efficient at eliminating deleted rows. We could 
therefore pull expensive conjuncts up from "SCAN datafiles with deletes" to 
IcebergDeleteNode, so that they are evaluated only on the rows that survive 
the deletes.
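
To illustrate the expected saving, here is a minimal Python sketch. It is a toy model, not Impala code: Row, CountingPredicate, and both plan functions are hypothetical names, and the delete step is modelled as a simple set lookup on row positions. It only counts how often an expensive conjunct runs in each plan shape.

```python
from dataclasses import dataclass


@dataclass
class Row:
    file_pos: int  # row position within the data file
    value: int


class CountingPredicate:
    """Stands in for an expensive conjunct and counts its evaluations."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, row):
        self.calls += 1
        return self.fn(row)


def scan_then_delete(rows, deleted_positions, pred):
    # Current plan shape: the conjunct runs in the scan, on every row,
    # including rows that are later dropped as deleted.
    scanned = [r for r in rows if pred(r)]
    return [r for r in scanned if r.file_pos not in deleted_positions]


def delete_then_filter(rows, deleted_positions, pred):
    # Pulled-up plan shape: deleted rows are dropped first, then the
    # conjunct runs only on the surviving rows.
    survivors = [r for r in rows if r.file_pos not in deleted_positions]
    return [r for r in survivors if pred(r)]


rows = [Row(i, i * 7 % 10) for i in range(1000)]
deleted = set(range(0, 1000, 2))  # assume half of the rows are deleted

pred_scan = CountingPredicate(lambda r: r.value > 4)
pred_pulled = CountingPredicate(lambda r: r.value > 4)

result_scan = scan_then_delete(rows, deleted, pred_scan)
result_pulled = delete_then_filter(rows, deleted, pred_pulled)

print(pred_scan.calls, pred_pulled.calls)
```

Both plan shapes return the same rows. The pulled-up variant skips the conjunct for every deleted row, so the saving grows with the fraction of deleted rows and with the per-row cost of the conjunct.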

{*}There are some caveats{*}:
 * If the conjuncts drive late materialization in the SCANs and are highly 
selective (they filter out most rows), then pulling them up might make 
performance worse
 * We probably shouldn't pull up predicates that can be evaluated by 
min/max/dictionary filters
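
The caveats above could be folded into a planner-side check along the following lines. This is only a sketch under assumptions: the Conjunct fields, the thresholds, and should_pull_up are hypothetical and do not correspond to existing Impala classes or options.

```python
from dataclasses import dataclass


@dataclass
class Conjunct:
    name: str
    cost_per_row: float                # hypothetical relative evaluation cost
    pass_fraction: float               # estimated fraction of rows that pass
    drives_late_materialization: bool  # scan uses it to skip other columns
    min_max_or_dict_eligible: bool     # min/max/dictionary filtering applies


# Hypothetical thresholds, chosen only for illustration.
EXPENSIVE_COST = 10.0
SELECTIVE_PASS_FRACTION = 0.1


def should_pull_up(c: Conjunct) -> bool:
    # Keep predicates in the scan when min/max/dictionary filtering can
    # evaluate them, since that can skip whole row groups or pages.
    if c.min_max_or_dict_eligible:
        return False
    # Keep highly selective predicates that drive late materialization in
    # the scan: pulling them up would force materializing every row.
    if c.drives_late_materialization and c.pass_fraction <= SELECTIVE_PASS_FRACTION:
        return False
    # Otherwise pull up only conjuncts that are expensive to evaluate.
    return c.cost_per_row >= EXPENSIVE_COST


candidates = [
    Conjunct("regexp_like(col, ...)", 50.0, 0.5, False, False),
    Conjunct("id = 42", 1.0, 0.001, True, True),
    Conjunct("expensive_udf(col) = 1", 80.0, 0.01, True, False),
]
decisions = {c.name: should_pull_up(c) for c in candidates}
```

A real implementation would presumably rely on the planner's own cost and selectivity estimates instead of fixed thresholds.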



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
