[ https://issues.apache.org/jira/browse/SPARK-43357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800846#comment-17800846 ]
Nandini edited comment on SPARK-43357 at 12/27/23 6:31 PM:
-----------------------------------------------------------
Hello,
After this fix, we are facing an issue during partition pruning.
Repro: table_t1 is partitioned by created_date.
{code:python}
df = spark.sql("select session_id from table_t1 where created_date between '2023-12-01' and '2023-12-02'")
display(df)
{code}
{code:java}
Caused by: InvalidObjectException(message:Unknown type : 'DATE' (Service: AWSGlue; Status Code: 400; Error Code: InvalidInputException; Request ID: 081a04e1-c4bb-4a4d-b4aa-84db6db14a64; Proxy: null))
  at com.amazonaws.glue.catalog.converters.CatalogToHiveConverter$5.get(CatalogToHiveConverter.java:58)
  at com.amazonaws.glue.catalog.converters.CatalogToHiveConverter.getHiveException(CatalogToHiveConverter.java:97)
  at com.amazonaws.glue.catalog.converters.CatalogToHiveConverter.wrapInHiveException(CatalogToHiveConverter.java:88)
  at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.getCatalogPartitions(GlueMetastoreClientDelegate.java:838)
  at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.getPartitions(GlueMetastoreClientDelegate.java:823)
  at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.listPartitionsByFilter(AWSCatalogMetastoreClient.java:1171)
  at org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByFilter(Hive.java:2276)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:1161)
  at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getPartitionsByFilter$1(HiveClientImpl.scala:950)
  at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:348)
  at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$retryLocked$1(HiveClientImpl.scala:247)
  at org.apache.spark.sql.hive.client.HiveClientImpl.synchronizeOnObject(HiveClientImpl.scala:285)
  at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:239)
  at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:328)
  at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:946)
  at org.apache.spark.sql.hive.client.PoolingHiveClient.$anonfun$getPartitionsByFilter$1(PoolingHiveClient.scala:474)
  at org.apache.spark.sql.hive.client.PoolingHiveClient.withHiveClient(PoolingHiveClient.scala:149)
  at org.apache.spark.sql.hive.client.PoolingHiveClient.getPartitionsByFilter(PoolingHiveClient.scala:473)
  at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$listPartitionsByFilter$1(HiveExternalCatalog.scala:1620)
  at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
  at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$2(HiveExternalCatalog.scala:155)
  ... 87 more
{code}
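For context, the trace shows the error surfacing in the metastore-side partition pruning path (Shim_v0_13.getPartitionsByFilter handing the date filter to the Glue catalog client). One possible workaround, a sketch we have not verified, is to stop pushing the partition filter to the metastore and let Spark prune partitions on the driver instead, at the cost of listing every partition of the table:
{code:python}
# Possible workaround (untested sketch): disable metastore-side partition filter
# push down so the date predicate is never sent to Glue. Spark will then list
# all partitions of table_t1 and prune them client-side, which can be slow for
# tables with many partitions.
spark.conf.set("spark.sql.hive.metastorePartitionPruning", "false")

df = spark.sql(
    "select session_id from table_t1 "
    "where created_date between '2023-12-01' and '2023-12-02'"
)
display(df)
{code}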
> Spark AWS Glue date partition push down broken
> ----------------------------------------------
>
> Key: SPARK-43357
> URL: https://issues.apache.org/jira/browse/SPARK-43357
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2,
> 3.3.1, 3.2.3, 3.2.4, 3.3.2
> Reporter: Stijn De Haes
> Assignee: Stijn De Haes
> Priority: Major
> Fix For: 3.5.0
>
>
> When using the following project:
> [https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore]
> to have Glue supported as a Hive metastore for Spark, there is an issue when
> reading a date-partitioned data set. Writing is fine.
> You get the following error:
> {quote}org.apache.hadoop.hive.metastore.api.InvalidObjectException: Unsupported expression '2023 - 05 - 03' (Service: AWSGlue; Status Code: 400; Error Code: InvalidInputException; Request ID: beed68c6-b228-442e-8783-52c25b9d2243; Proxy: null)
> {quote}
>
> A fix for this is making sure the date passed to Glue is quoted.
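> For illustration only (these are not the exact filter strings Spark generates, just the shape of the change), the difference is whether the date literal in the pushed-down partition filter is quoted:
> {code:python}
> # Hypothetical partition filter strings sent to the Glue metastore.
> # Unquoted date literal: Glue's filter parser treats it as arithmetic and
> # rejects it ("Unsupported expression '2023 - 05 - 03'").
> unquoted_filter = "created_date >= 2023-05-03"
> # Quoted date literal: accepted by the parser.
> quoted_filter = "created_date >= '2023-05-03'"
> {code}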