[
https://issues.apache.org/jira/browse/SPARK-40610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614323#comment-17614323
]
icyjhl commented on SPARK-40610:
--------------------------------
Hi [~Zing], this works fine in Spark 2 and Spark 3.0.1; did anything change in
3.2.1?
> Spark falls back to getPartitions instead of getPartitionsByFilter when the
> date_add function is used in the where clause
> -------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-40610
> URL: https://issues.apache.org/jira/browse/SPARK-40610
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.1
> Environment: edw.tmp_test_metastore_usage_source is a big table with
> 1000 partitions and hundreds of columns
> Reporter: icyjhl
> Priority: Major
> Attachments: spark_error.log, spark_sql.sql, sql_in_mysql.sql
>
>
> When I run an insert overwrite statement, I get an error saying:
>
> {code:java}
> MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 1s.
> listPartitions {code}
>
> This is odd, as I only selected about 3 partitions, so I reran the SQL and
> checked the metastore, and found that it is fetching all columns in all
> partitions:
>
> {code:java}
> select "CD_ID", "COMMENT", "COLUMN_NAME", "TYPE_NAME" from "COLUMNS_V2" where
> "CD_ID"
> in
> (675384,675393,675385,675394,675396,675397,675395,675398,675399,675401,675402,675400,675406……){code}
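>
> As a rough cross-check, the metastore backend itself can be asked how many
> partitions the source table has, to compare with the number of CD_IDs
> requested above (a hedged sketch against the standard Hive metastore schema;
> identifier quoting may need adjusting for the backing database):
>
> {code:sql}
> -- Hedged sketch: count the partitions of the source table in the Hive
> -- metastore (standard PARTITIONS/TBLS/DBS schema; quoting style depends on
> -- whether the metastore runs on MySQL or PostgreSQL).
> SELECT COUNT(*) AS partition_count
> FROM "PARTITIONS" p
> JOIN "TBLS" t ON p."TBL_ID" = t."TBL_ID"
> JOIN "DBS" d ON t."DB_ID" = d."DB_ID"
> WHERE d."NAME" = 'edw'
>   AND t."TBL_NAME" = 'tmp_test_metastore_usage_source';
> {code}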
>
>
> After testing, I found the problem is the date_add function in the where
> clause: if I remove it, the SQL works fine; otherwise the metastore fetches
> all columns in all partitions.
>
>
> {code:java}
> insert overwrite table test.tmp_test_metastore_usage
> SELECT userid
> ,SUBSTR(sendtime,1,10) AS creation_date
> ,cast(json_bh_esdate_deltadays_max as DECIMAL(38,2)) AS bh_esdate_deltadays_max
> ,json_bh_qiye_industryphyname AS bh_qiye_industryphyname
> ,cast(json_bh_esdate_deltadays_min as DECIMAL(38,2)) AS bh_esdate_deltadays_min
> ,cast(json_bh_subconam_min as DECIMAL(38,2)) AS bh_subconam_min
> ,cast(json_bh_qiye_regcap_min as DECIMAL(38,2)) AS bh_qiye_regcap_min
> ,json_bh_industryphyname AS bh_industryphyname
> ,cast(json_bh_subconam_mean as DECIMAL(38,2)) AS bh_subconam_mean
> ,cast(json_bh_industryphyname_nunique as DECIMAL(38,2)) AS bh_industryphyname_nunique
> ,cast(current_timestamp() as string) as dw_cre_date
> ,cast(current_timestamp() as string) as dw_upd_date
> FROM (
> SELECT userid
> ,sendtime
> ,json_bh_esdate_deltadays_max
> ,json_bh_qiye_industryphyname
> ,json_bh_esdate_deltadays_min
> ,json_bh_subconam_min
> ,json_bh_qiye_regcap_min
> ,json_bh_industryphyname
> ,json_bh_subconam_mean
> ,json_bh_industryphyname_nunique
> ,row_number() OVER (
> PARTITION BY userid,dt ORDER BY sendtime DESC
> ) rn
> FROM edw.tmp_test_metastore_usage_source
> WHERE dt >= date_add('2022-09-22', -3)
> AND json_bizid IN ('6101')
> AND json_dingid IN ('611')
> ) t
> WHERE rn = 1 {code}
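>
> A possible workaround (a hedged sketch, untested here): pre-compute the date
> arithmetic and compare the partition column against a plain literal, so the
> predicate on dt no longer contains the date_add call:
>
> {code:sql}
> -- Hedged workaround sketch: the cutoff date is evaluated up front instead of
> -- calling date_add() inside the partition filter.
> SELECT userid, sendtime
> FROM edw.tmp_test_metastore_usage_source
> WHERE dt >= '2022-09-19'  -- date_add('2022-09-22', -3) computed by hand
>   AND json_bizid IN ('6101')
>   AND json_dingid IN ('611');
> {code}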
>
> By the way, 2.4.7 works fine.
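>
> A quick way to compare the two predicate forms without running the full
> insert is to EXPLAIN the source select with and without the date_add call and
> look at the partition filter shown on the Hive table scan (the exact plan
> text varies across Spark versions, so this only indicates where to look):
>
> {code:sql}
> -- Hedged sketch: compare the plans for the two filter forms.
> EXPLAIN
> SELECT count(*) FROM edw.tmp_test_metastore_usage_source
> WHERE dt >= date_add('2022-09-22', -3);
>
> EXPLAIN
> SELECT count(*) FROM edw.tmp_test_metastore_usage_source
> WHERE dt >= '2022-09-19';
> {code}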
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)