Carlos Mario created SPARK-30709:
------------------------------------

             Summary: Spark 2.3 to Spark 2.4 Upgrade. Problems reading HIVE 
partitioned tables.
                 Key: SPARK-30709
                 URL: https://issues.apache.org/jira/browse/SPARK-30709
             Project: Spark
          Issue Type: Question
          Components: SQL
    Affects Versions: 2.4.0
         Environment: Pre-production
            Reporter: Carlos Mario


Hello

We recently upgraded our pre-production environment from Spark 2.3 to Spark 2.4.0.

Over time we have created a large number of tables in the Hive Metastore, partitioned by two fields: one of type String and the other of type BigInt.

We were reading these tables with Spark 2.3 without problems, but after upgrading to Spark 2.4 we get the following log every time we run our software:

<log>

log_filterBIGINT.out:

Caused by: MetaException(message:Filtering is supported only on partition keys of type string)
Caused by: MetaException(message:Filtering is supported only on partition keys of type string)
Caused by: MetaException(message:Filtering is supported only on partition keys of type string)

 

hadoop-cmf-hive-HIVEMETASTORE-isblcsmsttc0001.scisb.isban.corp.log.out.1:

 

2020-01-10 09:36:05,781 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-138]: MetaException(message:Filtering is supported only on partition keys of type string)

2020-01-10 11:19:19,208 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-187]: MetaException(message:Filtering is supported only on partition keys of type string)

2020-01-10 11:19:54,780 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-167]: MetaException(message:Filtering is supported only on partition keys of type string)

 </log>

 

We know the best practice from Spark's point of view is to use the STRING type for partition columns, but we need a solution we can deploy easily, given the large number of existing tables partitioned by a BIGINT column.

 

As a first attempt we set the spark.sql.hive.manageFilesourcePartitions parameter to false in the spark-submit command, but after rerunning the software the error persisted.

 

Has anyone in the community experienced the same problem? If so, what was the solution?

 

Kind regards, and thanks in advance.


