[jira] [Updated] (HUDI-1392) lose partition info when using spark parameter "basePath"

steven zhang (Jira) Tue, 10 Nov 2020 23:54:57 -0800


     [ 
https://issues.apache.org/jira/browse/HUDI-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


steven zhang updated HUDI-1392:
-------------------------------
    Description: 
Steps to reproduce:

        set hoodie.datasource.write.hive_style_partitioning -> true

        spark.read().format("org.apache.hudi")
            .option("mergeSchema", true)
            .option("basePath", tablePath)
            .load(tablePath + (nonPartitionedTable ? "/*" : "/*"))
            .createOrReplaceTempView(hudiTable);

        spark.sql("select * from hudiTable where date>'20200807'").explain();

        The printed plan shows PartitionFilters: []

The cause is that org.apache.hudi.DefaultSource#createRelation is called via
dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions)
(https://github.com/apache/spark/blob/954cd9feaa1a3d4ad9a235811ae58e02a63e8386/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
L355).

The input optParams is therefore a CaseInsensitiveMap. Hudi then attaches
additional parameters:

val parameters = Map(QUERY_TYPE_OPT_KEY -> DEFAULT_QUERY_TYPE_OPT_VAL) ++
  translateViewTypesToQueryTypes(optParams)

After this merge, parameters is a plain Map, not a CaseInsensitiveMap.

When the parquet data source infers partition info, it fetches the basePath
value through parameters.get(BASE_PATH_PARAM)
(https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala
L196). Because parameters is no longer a CaseInsensitiveMap, this call does not
go through CaseInsensitiveMap#get. It is a plain Map#get("basePath") and
returns None.

As a result, no partition info is inferred.
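
To make the lookup failure concrete, here is a minimal Java sketch. It assumes (my reading, not stated above) that a case-insensitive options map exposes lower-cased keys when it is iterated, so copying it into an ordinary map leaves only "basepath" as a key. CaseInsensitiveOptions, mergeIntoPlainMap, the "query.type" key and the path are hypothetical stand-ins for illustration; none of them are Spark or Hudi classes.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class BasePathLookupDemo {
    // Hypothetical stand-in for a case-insensitive options map; not Spark's class.
    static final class CaseInsensitiveOptions {
        // Keys are stored lower-cased, so iteration only exposes lower-cased keys.
        private final Map<String, String> lowerCased = new HashMap<>();

        CaseInsensitiveOptions(Map<String, String> original) {
            original.forEach((k, v) -> lowerCased.put(k.toLowerCase(Locale.ROOT), v));
        }

        // Lookup lower-cases the requested key, so any spelling of the key matches.
        String get(String key) {
            return lowerCased.get(key.toLowerCase(Locale.ROOT));
        }

        // The entries a generic merge sees when it iterates this map.
        Map<String, String> entries() {
            return lowerCased;
        }
    }

    // Models "plain defaults ++ case-insensitive options": the result is an
    // ordinary, case-sensitive map.
    static Map<String, String> mergeIntoPlainMap(Map<String, String> defaults,
                                                 CaseInsensitiveOptions options) {
        Map<String, String> merged = new HashMap<>(defaults);
        merged.putAll(options.entries());
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> userOptions = new HashMap<>();
        userOptions.put("basePath", "/tmp/hudi_table");   // hypothetical path
        CaseInsensitiveOptions options = new CaseInsensitiveOptions(userOptions);

        Map<String, String> defaults = new HashMap<>();
        defaults.put("query.type", "snapshot");           // hypothetical default option

        Map<String, String> merged = mergeIntoPlainMap(defaults, options);

        System.out.println(options.get("basePath"));  // found: lookup ignores case
        System.out.println(merged.get("basePath"));   // null: only "basepath" exists
        System.out.println(merged.get("basepath"));   // found under the lower-cased key
    }
}
```

In this model the value is still present after the merge, but only under the lower-cased key, which would explain why an exact-case lookup of "basePath" comes back empty.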

I also found that Spark 2.4.7 and above
(https://issues.apache.org/jira/browse/SPARK-32364) use a CaseInsensitiveMap to
fetch basePath, although that change was made for a different purpose than this
Hudi issue. Lower Spark versions still have this problem.

So we should instead use:

val parameters = translateViewTypesToQueryTypes(optParams) ++
  Map(QUERY_TYPE_OPT_KEY -> DEFAULT_QUERY_TYPE_OPT_VAL)
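
As a rough illustration of why the merge order matters, the following Java sketch keeps the case-insensitive map as the receiver of the merge, so the result stays case-insensitive. CaseInsensitiveOptions, withDefault, the "query.type" key and the path are hypothetical names for this sketch only, not Spark or Hudi APIs. One assumption of mine that goes beyond the proposal above: as far as I understand Scala's `++`, the right operand wins on duplicate keys, so the sketch adds the default only when the key is absent, which keeps a user-supplied query type intact.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class OptionMergeOrderDemo {
    // Hypothetical stand-in for a case-insensitive options map; not Spark's class.
    static final class CaseInsensitiveOptions {
        private final Map<String, String> lowerCased = new HashMap<>();

        CaseInsensitiveOptions(Map<String, String> original) {
            original.forEach((k, v) -> lowerCased.put(k.toLowerCase(Locale.ROOT), v));
        }

        // Lookup lower-cases the requested key, so any spelling of the key matches.
        String get(String key) {
            return lowerCased.get(key.toLowerCase(Locale.ROOT));
        }

        // Returns a new case-insensitive map. The default is added only if the
        // key is absent, so a user-supplied value is never overwritten.
        CaseInsensitiveOptions withDefault(String key, String value) {
            Map<String, String> copy = new HashMap<>(lowerCased);
            copy.putIfAbsent(key.toLowerCase(Locale.ROOT), value);
            return new CaseInsensitiveOptions(copy);
        }
    }

    public static void main(String[] args) {
        Map<String, String> userOptions = new HashMap<>();
        userOptions.put("basePath", "/tmp/hudi_table");   // hypothetical path

        // The case-insensitive map is the receiver, so the merged result
        // still answers lookups regardless of key case.
        CaseInsensitiveOptions merged =
            new CaseInsensitiveOptions(userOptions).withDefault("query.type", "snapshot");

        System.out.println(merged.get("basePath"));    // user option is still found
        System.out.println(merged.get("Query.Type"));  // default found in any spelling
    }
}
```

Whether the real translateViewTypesToQueryTypes(optParams) returns a case-insensitive map is not shown above, so that would need to be checked before relying on the reordered merge.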

> lose partition info when using spark parameter "basePath" 
> ----------------------------------------------------------
>
>                 Key: HUDI-1392
>                 URL: https://issues.apache.org/jira/browse/HUDI-1392
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: steven zhang
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
