[ 
https://issues.apache.org/jira/browse/IMPALA-14737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18064359#comment-18064359
 ] 

ASF subversion and git services commented on IMPALA-14737:
----------------------------------------------------------

Commit c601f44281805e421d2ce401729a703e5b16345b in impala's branch 
refs/heads/master from Arnab Karmakar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c601f4428 ]

IMPALA-14737 Part2: Add relaxed predicate pushdown for LIKE patterns with suffix

Part 1 (committed as 540a3784e) added basic LIKE predicate pushdown
to Iceberg for simple prefix patterns ('abc%'), exact matches ('exact'),
and escaped wildcards ('asd\%'). Patterns with literal content after
wildcards (e.g., 'prefix%suffix', 'd%d') were rejected and not pushed
down at all.

This Part 2 patch enhances the implementation with "relaxed predicate
pushdown" for patterns with suffix. Instead of rejecting these patterns
completely, we now:

1. Push down a relaxed prefix predicate to Iceberg (e.g.,
   startsWith('prefix')) for partition/file pruning
2. Retain the full LIKE predicate (e.g., LIKE 'prefix%suffix') in the
   scan node for Impala to evaluate on the surviving rows

Additionally, this patch adds support for simple LIKE patterns in DROP
PARTITION and SHOW FILES operations. Previously, any LIKE predicate would
fail in these operations. Now:
- Simple prefix patterns (e.g., s LIKE 'd%') work correctly
- Patterns with suffix are rejected with clear error messages to prevent
  unintended data loss: these operations act on the pushed-down filter
  alone, so DROP PARTITION with 'd%d' would otherwise drop every
  partition starting with 'd'
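
The accept/reject rule for these metadata-only operations can be
sketched like this. It is a hypothetical Python illustration, not
Impala's code: `split_like`, `partition_filter`, the returned operation
names, and the error text are all assumptions made for the example.

```python
from typing import Tuple


def split_like(pattern: str, escape: str = "\\") -> Tuple[str, str]:
    """Return (literal prefix, remainder from the first unescaped wildcard)."""
    literal = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            literal.append(pattern[i + 1])  # escaped char is a literal
            i += 2
        elif ch in "%_":
            break
        else:
            literal.append(ch)
            i += 1
    return "".join(literal), pattern[i:]


def partition_filter(pattern: str) -> Tuple[str, str]:
    """Translate a LIKE pattern into an exact metadata filter, or refuse.

    There is no later row-level evaluation in DROP PARTITION or
    SHOW FILES, so a relaxed (over-matching) filter is not acceptable.
    """
    prefix, rest = split_like(pattern)
    if rest == "":
        return ("equals", prefix)
    if rest == "%":
        return ("starts_with", prefix)
    raise ValueError(
        f"LIKE pattern {pattern!r} cannot be expressed exactly as a "
        "partition filter"
    )
```

Here `partition_filter("d%")` yields a starts-with filter on 'd', while
`partition_filter("d%d")` raises instead of silently widening the match.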

This improves performance by letting Iceberg's metadata filtering prune
partitions and files before the scan, while Impala's evaluation of the
full LIKE predicate keeps query results correct.

Example behavior for `SELECT ... WHERE s LIKE 'd%d'`:
- Before: Pattern rejected, all 3/3 partitions scanned, no pruning
  benefit
- After: startsWith('d') pushed to Iceberg -> 1/3 partitions scanned;
  Impala evaluates the full LIKE 'd%d' on the surviving rows -> correct
  results
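
The two-stage behavior in this example can be imitated with a small
Python sketch. Everything in it is hypothetical: the three partitions,
their keys (modeled loosely on a truncate-style partitioning of `s`),
the row values, and the helper names `like_to_regex` and `scan`. It only
illustrates pruning on the relaxed prefix followed by full LIKE
evaluation, not how Iceberg or Impala implement either step.

```python
import re


def like_to_regex(pattern: str, escape: str = "\\") -> "re.Pattern[str]":
    """Convert a SQL LIKE pattern to an equivalent compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped literal
            i += 2
            continue
        if ch == "%":
            out.append(".*")   # any sequence of characters
        elif ch == "_":
            out.append(".")    # exactly one character
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.DOTALL)


def scan(partitions, relaxed_prefix, pattern):
    """Prune partitions by prefix, then apply the full LIKE to the rows."""
    regex = like_to_regex(pattern)
    scanned = 0
    rows = []
    for key, values in partitions.items():
        # Stage 1: keep a partition only if its key is compatible
        # with the pushed-down prefix.
        if not (key.startswith(relaxed_prefix)
                or relaxed_prefix.startswith(key)):
            continue
        scanned += 1
        # Stage 2: the retained LIKE removes the false positives.
        rows.extend(v for v in values if regex.fullmatch(v))
    return scanned, rows


# Hypothetical table: three partitions keyed by the first letter of s.
partitions = {
    "a": ["apple", "avocado"],
    "d": ["dad", "dog", "david"],
    "z": ["zed"],
}
scanned, rows = scan(partitions, relaxed_prefix="d", pattern="d%d")
```

For this made-up data, one of the three partitions is scanned and the
rows 'dad' and 'david' are returned; 'dog' survives pruning but is
filtered out by the full LIKE.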

Testing:
- Updated iceberg-like-pushdown.test with relaxed predicate tests
- Updated DROP PARTITION tests to include relaxed predicate tests
- Updated SHOW FILES tests to include relaxed predicate tests

Change-Id: I97c11362f098507fa440eafde3c35bbc6d7092b3
Reviewed-on: http://gerrit.cloudera.org:8080/24045
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Zoltan Borok-Nagy <[email protected]>


> Pushdown LIKE predicates to Iceberg
> -----------------------------------
>
>                 Key: IMPALA-14737
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14737
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Arnab Karmakar
>            Priority: Major
>              Labels: impala-iceberg, ramp-up
>
> Iceberg's Expressions class
> [https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/expressions/Expressions.java]
> supports more predicates than we currently use.
> The most important one is probably
>  * startsWith()
> I.e., when we have the following predicate: {{string_col LIKE 'asdf%xyz'}}
> We should push down:
>  * startsWith("string_col", "asdf")
> I.e., the non-wildcard prefix of the string.
> It should work for UTF-8 strings as well.


