[ https://issues.apache.org/jira/browse/SPARK-43357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800846#comment-17800846 ]
Nandini edited comment on SPARK-43357 at 12/27/23 6:31 PM:
-----------------------------------------------------------
Hello,
After this fix, we are facing an issue during partition pruning.
Repro: table_t1 is partitioned by created_date.
{code:python}
df = spark.sql("select session_id from table_t1 where created_date between '2023-12-01' and '2023-12-02'")
display(df)
{code}
{code:java}
Caused by: InvalidObjectException(message:Unknown type : 'DATE' (Service: AWSGlue; Status Code: 400; Error Code: InvalidInputException; Request ID: 081a04e1-c4bb-4a4d-b4aa-84db6db14a64; Proxy: null))
  at com.amazonaws.glue.catalog.converters.CatalogToHiveConverter$5.get(CatalogToHiveConverter.java:58)
  at com.amazonaws.glue.catalog.converters.CatalogToHiveConverter.getHiveException(CatalogToHiveConverter.java:97)
  at com.amazonaws.glue.catalog.converters.CatalogToHiveConverter.wrapInHiveException(CatalogToHiveConverter.java:88)
  at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.getCatalogPartitions(GlueMetastoreClientDelegate.java:838)
  at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.getPartitions(GlueMetastoreClientDelegate.java:823)
  at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.listPartitionsByFilter(AWSCatalogMetastoreClient.java:1171)
  at org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByFilter(Hive.java:2276)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:1161)
  at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getPartitionsByFilter$1(HiveClientImpl.scala:950)
  at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:348)
  at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$retryLocked$1(HiveClientImpl.scala:247)
  at org.apache.spark.sql.hive.client.HiveClientImpl.synchronizeOnObject(HiveClientImpl.scala:285)
  at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:239)
  at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:328)
  at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:946)
  at org.apache.spark.sql.hive.client.PoolingHiveClient.$anonfun$getPartitionsByFilter$1(PoolingHiveClient.scala:474)
  at org.apache.spark.sql.hive.client.PoolingHiveClient.withHiveClient(PoolingHiveClient.scala:149)
  at org.apache.spark.sql.hive.client.PoolingHiveClient.getPartitionsByFilter(PoolingHiveClient.scala:473)
  at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$listPartitionsByFilter$1(HiveExternalCatalog.scala:1620)
  at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
  at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$2(HiveExternalCatalog.scala:155)
  ... 87 more
{code}
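For context, the trace shows the error surfacing in the metastore-side partition pruning path (Shim_v0_13.getPartitionsByFilter handing the date filter to the Glue catalog client). One possible workaround, a sketch we have not verified, is to stop pushing the partition filter to the metastore and let Spark prune partitions on the driver instead, at the cost of listing every partition of the table:
{code:python}
# Possible workaround (untested sketch): disable metastore-side partition filter
# push down so the date predicate is never sent to Glue. Spark will then list
# all partitions of table_t1 and prune them client-side, which can be slow for
# tables with many partitions.
spark.conf.set("spark.sql.hive.metastorePartitionPruning", "false")

df = spark.sql(
    "select session_id from table_t1 "
    "where created_date between '2023-12-01' and '2023-12-02'"
)
display(df)
{code}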
> Spark AWS Glue date partition push down broken
> ----------------------------------------------
>
> Key: SPARK-43357
> URL: https://issues.apache.org/jira/browse/SPARK-43357
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2,
> 3.3.1, 3.2.3, 3.2.4, 3.3.2
> Reporter: Stijn De Haes
> Assignee: Stijn De Haes
> Priority: Major
> Fix For: 3.5.0
>
>
> When using the following project:
> [https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore]
> to have Glue supported as a Hive metastore for Spark, there is an issue when
> reading a date-partitioned data set. Writing is fine.
> You get the following error:
> {quote}org.apache.hadoop.hive.metastore.api.InvalidObjectException: Unsupported expression '2023 - 05 - 03' (Service: AWSGlue; Status Code: 400; Error Code: InvalidInputException; Request ID: beed68c6-b228-442e-8783-52c25b9d2243; Proxy: null)
> {quote}
>
> A fix for this is making sure the date passed to Glue is quoted.
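> For illustration only (these are not the exact filter strings Spark generates, just the shape of the change), the difference is whether the date literal in the pushed-down partition filter is quoted:
> {code:python}
> # Hypothetical partition filter strings sent to the Glue metastore.
> # Unquoted date literal: Glue's filter parser treats it as arithmetic and
> # rejects it ("Unsupported expression '2023 - 05 - 03'").
> unquoted_filter = "created_date >= 2023-05-03"
> # Quoted date literal: accepted by the parser.
> quoted_filter = "created_date >= '2023-05-03'"
> {code}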