[
https://issues.apache.org/jira/browse/SPARK-43170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713858#comment-17713858
]
Yuming Wang commented on SPARK-43170:
-------------------------------------
I can't reproduce this issue:
!image-2023-04-19-10-59-44-118.png!
> The spark sql like statement is pushed down to parquet for execution, but the
> data cannot be queried
> ----------------------------------------------------------------------------------------------------
>
> Key: SPARK-43170
> URL: https://issues.apache.org/jira/browse/SPARK-43170
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.2
> Reporter: todd
> Priority: Major
> Attachments: image-2023-04-18-10-59-30-199.png,
> image-2023-04-19-10-59-44-118.png
>
>
> --DDL
> CREATE TABLE `ecom_dwm`.`dwm_user_app_action_sum_all` (
> `gaid` STRING COMMENT '',
> `beyla_id` STRING COMMENT '',
> `dt` STRING,
> `hour` STRING,
> `appid` STRING COMMENT '包名')
> USING parquet
> PARTITIONED BY (dt, hour, appid)
> LOCATION 's3://xxxxx/dwm_user_app_action_sum_all'
> – partitions info
> show partitions ecom_dwm.dwm_user_app_action_sum_all PARTITION
> (dt='20230412');
>
> dt=20230412/hour=23/appid=blibli.mobile.commerce
> dt=20230412/hour=23/appid=cn.shopee.app
> dt=20230412/hour=23/appid=cn.shopee.br
> dt=20230412/hour=23/appid=cn.shopee.id
> dt=20230412/hour=23/appid=cn.shopee.my
> dt=20230412/hour=23/appid=cn.shopee.ph
>
> — query
> select DISTINCT(appid) from ecom_dwm.dwm_user_app_action_sum_all
> where dt='20230412' and appid like '%shopee%'
>
> --result
> nodata
>
> — other
> I use spark3.0.1 version and trino query engine to query the data。
>
>
> The physical execution node formed by spark 3.2
> (3) Scan parquet ecom_dwm.dwm_user_app_action_sum_all Output [3]: [dt#63,
> hour#64, appid#65|#63, hour#64, appid#65] Batched: true Location:
> InMemoryFileIndex []
> PartitionFilters: [isnotnull(dt#63), isnotnull(appid#65), (dt#63 = 20230412),
> Contains(appid#65, shopee)|#63), isnotnull(appid#65), (dt#63 = 20230412),
> Contains(appid#65, shopee)] ReadSchema: struct<>
>
>
> !image-2023-04-18-10-59-30-199.png!
>
> – sql plan detail
> {code:java}
> == Physical Plan ==
> CollectLimit (9)
> +- InMemoryTableScan (1)
> +- InMemoryRelation (2)
> +- * HashAggregate (8)
> +- Exchange (7)
> +- * HashAggregate (6)
> +- * Project (5)
> +- * ColumnarToRow (4)
> +- Scan parquet
> ecom_dwm.dwm_user_app_action_sum_all (3)
> (1) InMemoryTableScan
> Output [1]: [appid#65]
> Arguments: [appid#65]
> (2) InMemoryRelation
> Arguments: [appid#65],
> CachedRDDBuilder(org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer@ab5af13,StorageLevel(disk,
> memory, deserialized, 1 replicas),*(2) HashAggregate(keys=[appid#65],
> functions=[], output=[appid#65])
> +- Exchange hashpartitioning(appid#65, 200), ENSURE_REQUIREMENTS, [plan_id=24]
> +- *(1) HashAggregate(keys=[appid#65], functions=[], output=[appid#65])
> +- *(1) Project [appid#65]
> +- *(1) ColumnarToRow
> +- FileScan parquet
> ecom_dwm.dwm_user_app_action_sum_all[dt#63,hour#64,appid#65] Batched: true,
> DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(0 paths)[],
> PartitionFilters: [isnotnull(dt#63), isnotnull(appid#65), (dt#63 = 20230412),
> StartsWith(appid#65, com)], PushedFilters: [], ReadSchema: struct<>
> ,None)
> (3) Scan parquet ecom_dwm.dwm_user_app_action_sum_all
> Output [3]: [dt#63, hour#64, appid#65]
> Batched: true
> Location: InMemoryFileIndex []
> PartitionFilters: [isnotnull(dt#63), isnotnull(appid#65), (dt#63 = 20230412),
> StartsWith(appid#65, com)]
> ReadSchema: struct<>
> (4) ColumnarToRow [codegen id : 1]
> Input [3]: [dt#63, hour#64, appid#65]
> (5) Project [codegen id : 1]
> Output [1]: [appid#65]
> Input [3]: [dt#63, hour#64, appid#65]
> (6) HashAggregate [codegen id : 1]
> Input [1]: [appid#65]
> Keys [1]: [appid#65]
> Functions: []
> Aggregate Attributes: []
> Results [1]: [appid#65]
> (7) Exchange
> Input [1]: [appid#65]
> Arguments: hashpartitioning(appid#65, 200), ENSURE_REQUIREMENTS, [plan_id=24]
> (8) HashAggregate [codegen id : 2]
> Input [1]: [appid#65]
> Keys [1]: [appid#65]
> Functions: []
> Aggregate Attributes: []
> Results [1]: [appid#65]
> (9) CollectLimit
> Input [1]: [appid#65]
> Arguments: 1 {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]