[ 
https://issues.apache.org/jira/browse/SPARK-32284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-32284:
-----------------------------------
    Summary: Avoid expanding too many CNF predicates in partition pruning  
(was: Avoid pushing down too many CNF filters for partition pruning)

> Avoid expanding too many CNF predicates in partition pruning
> ------------------------------------------------------------
>
>                 Key: SPARK-32284
>                 URL: https://issues.apache.org/jira/browse/SPARK-32284
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>
> After https://github.com/apache/spark/pull/28805, the pushed down predicates 
> for partition pruning can be very long.
> For example, the following partition filter:
> {code:java}
> (p0 = '1' AND p1 = '1') OR (p0 = '2' AND p1 = '2') OR (p0 = '3' AND p1 = '3') 
> OR (p0 = '4' AND p1 = '4') OR (p0 = '5' AND p1 = '5') OR (p0 = '6' AND p1 = 
> '6') OR (p0 = '7' AND p1 = '7') OR (p0 = '8' AND p1 = '8') OR (p0 = '9' AND 
> p1 = '9') OR (p0 = '10' AND p1 = '10') OR (p0 = '11' AND p1 = '11') OR (p0 = 
> '12' AND p1 = '12') OR (p0 = '13' AND p1 = '13') OR (p0 = '14' AND p1 = '14') 
> OR (p0 = '15' AND p1 = '15') OR (p0 = '16' AND p1 = '16') OR (p0 = '17' AND 
> p1 = '17') OR (p0 = '18' AND p1 = '18') OR (p0 = '19' AND p1 = '19') OR (p0 = 
> '20' AND p1 = '20')
> {code}
> will be converted into a 130K long query in Hive metastore, and there will be 
> error:
> {code:java}
> javax.jdo.JDOException: Exception thrown when executing query : SELECT 
> DISTINCT 'org.apache.hadoop.hive.metastore.model.MPartition' AS 
> NUCLEUS_TYPE,A0.CREATE_TIME,A0.LAST_ACCESS_TIME,A0.PART_NAME,A0.PART_ID,A0.PART_NAME
>  AS NUCORDER0 FROM PARTITIONS A0 LEFT OUTER JOIN TBLS B0 ON A0.TBL_ID = 
> B0.TBL_ID LEFT OUTER JOIN DBS C0 ON B0.DB_ID = C0.DB_ID WHERE B0.TBL_NAME = ? 
> AND C0."NAME" = ? AND ((((((A0.PART_NAME LIKE '%/p1=1' ESCAPE '\' ) OR 
> (A0.PART_NAME LIKE '%/p1=2' ESCAPE '\' )) OR (A0.PART_NAME LIKE '%/p1=3' 
> ESCAPE '\' )) OR ((A0.PART_NAME LIKE '%/p1=4' ESCAPE '\' ) O ...
> {code}
> To avoid it:
> 1. We should push down the convertible original queries as they are, instead 
> of converting all predicates into CNF
> 2. We can skip grouping expression so that we can stop the CNF conversion 
> when the predicates becoming too long.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to