icyjhl created SPARK-40610:
------------------------------
Summary: Spark fall back to use getPartitions instead of
getPartitionsByFilter when date_add functions used in where clause
Key: SPARK-40610
URL: https://issues.apache.org/jira/browse/SPARK-40610
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.2.1
Environment: edw.tmp_test_metastore_usage_source is a big table with
1000 partitions and hundreds of columns
Reporter: icyjhl
When I run a insert overwrite statement, I got error saying:
{code:java}
MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 1s.
listPartitions {code}
It's weird as I only selected for about 3 partitions, so I rerun the sql and
checked the metastore, then I found it's fetching all columns in all partitions:
{code:java}
select "CD_ID", "COMMENT", "COLUMN_NAME", "TYPE_NAME" from "COLUMNS_V2" where
"CD_ID"
in
(675384,675393,675385,675394,675396,675397,675395,675398,675399,675401,675402,675400,675406……){code}
After testing, I found the problem is with the date_add function in where
clause, if remove it ,sql works fine, else metastore would fetch all columns in
all partitions.
{code:java}
insert overwrite table test.tmp_test_metastore_usage
SELECT userid
,SUBSTR(sendtime,1,10) AS creation_date
,cast(json_bh_esdate_deltadays_max as DECIMAL(38,2)) AS
bh_esdate_deltadays_max
,json_bh_qiye_industryphyname AS bh_qiye_industryphyname
,cast(json_bh_esdate_deltadays_min as DECIMAL(38,2)) AS
bh_esdate_deltadays_min
,cast(json_bh_subconam_min as DECIMAL(38,2)) AS bh_subconam_min
,cast(json_bh_qiye_regcap_min as DECIMAL(38,2)) AS bh_qiye_regcap_min
,json_bh_industryphyname AS bh_industryphyname
,cast(json_bh_subconam_mean as DECIMAL(38,2)) AS bh_subconam_mean
,cast(json_bh_industryphyname_nunique as DECIMAL(38,2)) AS
bh_industryphyname_nunique
,cast(current_timestamp() as string) as dw_cre_date
,cast(current_timestamp() as string) as dw_upd_date
FROM (
SELECT userid
,sendtime
,json_bh_esdate_deltadays_max
,json_bh_qiye_industryphyname
,json_bh_esdate_deltadays_min
,json_bh_subconam_min
,json_bh_qiye_regcap_min
,json_bh_industryphyname
,json_bh_subconam_mean
,json_bh_industryphyname_nunique
,row_number() OVER (
PARTITION BY userid,dt ORDER BY sendtime DESC
) rn
FROM edw.tmp_test_metastore_usage_source
WHERE dt >= date_add('2022-09-22',-3 )
AND json_bizid IN ('6101')
AND json_dingid IN ('611')
) t
WHERE rn = 1 {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]