[jira] [Updated] (HUDI-1392) lose partition info when using spark parameter "basePath"

Gary Li (Jira) Mon, 23 Nov 2020 18:43:08 -0800


     [ https://issues.apache.org/jira/browse/HUDI-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Gary Li updated HUDI-1392:
--------------------------
    Status: Open  (was: New)

> lose partition info when using spark parameter "basePath" 
> ----------------------------------------------------------
>
>                 Key: HUDI-1392
>                 URL: https://issues.apache.org/jira/browse/HUDI-1392
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Spark Integration
>            Reporter: steven zhang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.6.1
>
>
> Steps to reproduce:
>         set hoodie.datasource.write.hive_style_partitioning -> true
>         spark.read().format("org.apache.hudi")
>             .option("mergeSchema", true)
>             .option("basePath", tablePath)
>             .load(tablePath + (nonPartitionedTable ? "/*" : "/*"))
>             .createOrReplaceTempView(hudiTable);
>         spark.sql("select * from hudiTable where date>'20200807'").explain();
>         The plan prints PartitionFilters: [], i.e. the date filter is not
>         used for partition pruning.
> The cause: org.apache.hudi.DefaultSource#createRelation is called by
> dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions)
> ([https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala]
>  L318).
> The incoming optParams is therefore a CaseInsensitiveMap. Hudi then adds
> extra parameters, for example:
> val parameters = Map(QUERY_TYPE_OPT_KEY -> DEFAULT_QUERY_TYPE_OPT_VAL) ++ translateViewTypesToQueryTypes(optParams)
> Because a plain Map is on the left of ++, parameters becomes a plain Map
> rather than a CaseInsensitiveMap.
> When the Parquet data source infers partition info, it fetches basePath
> through parameters.get(BASE_PATH_PARAM) (
> [https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala]
>  L196). Since parameters is now a plain Map, this calls Map#get("basePath")
> instead of CaseInsensitiveMap#get. The keys copied out of the
> CaseInsensitiveMap are lower-cased ("basepath"), so the lookup returns None
> and no partition info is inferred.
> I also found that Spark 2.4.7 and later
> (https://issues.apache.org/jira/browse/SPARK-32364) use a CaseInsensitiveMap
> to fetch basePath, although that change was made for a different reason than
> this Hudi issue. Lower Spark versions still have this issue.
> So we need to use
> val parameters = translateViewTypesToQueryTypes(optParams) ++ Map(QUERY_TYPE_OPT_KEY -> DEFAULT_QUERY_TYPE_OPT_VAL)
> for two reasons: 1. lower Spark versions also have this issue; 2. this
> ordering keeps the original CaseInsensitiveMap type instead of converting it
> to a plain Map.
>   
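The type loss described in the issue can be reproduced without Spark in a small Scala sketch. It assumes a simplified, hypothetical stand-in for Spark 2.4's CaseInsensitiveMap (called CaseInsensitiveOptions here) that models only lower-cased lookup, the lower-cased key copy, and a type-preserving + / ++. The key "hoodie.datasource.query.type" and value "snapshot" are assumed values for QUERY_TYPE_OPT_KEY and DEFAULT_QUERY_TYPE_OPT_VAL, and withReadDefaults is a hypothetical helper, not Hudi code.

```scala
import java.util.Locale

// Hypothetical, simplified stand-in for Spark 2.4's CaseInsensitiveMap. The real class
// extends Map[String, T]; this sketch models only the behaviour relevant to HUDI-1392.
final class CaseInsensitiveOptions(val originalMap: Map[String, String]) {
  // Lower-cased copy of the keys. Entries copied out of the case-insensitive map
  // carry "basepath", not the caller's "basePath".
  val keyLowerCasedMap: Map[String, String] =
    originalMap.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  // Case-insensitive lookup: get("basePath") and get("basepath") agree.
  def get(key: String): Option[String] =
    keyLowerCasedMap.get(key.toLowerCase(Locale.ROOT))

  // Adding one entry returns a new case-insensitive map, not a plain Map.
  def +(kv: (String, String)): CaseInsensitiveOptions =
    new CaseInsensitiveOptions(originalMap + kv)

  // Folds the right-hand entries in with +, so the result stays case-insensitive.
  def ++(other: Map[String, String]): CaseInsensitiveOptions =
    other.foldLeft[CaseInsensitiveOptions](this)((acc, kv) => acc + kv)
}

object BasePathDemo {
  // Assumed values of QUERY_TYPE_OPT_KEY / DEFAULT_QUERY_TYPE_OPT_VAL.
  val QueryTypeKey = "hoodie.datasource.query.type"
  val DefaultQueryType = "snapshot"

  val optParams = new CaseInsensitiveOptions(
    Map("path" -> "/tmp/hudi_table/*", "basePath" -> "/tmp/hudi_table"))

  // Original ordering: the plain Map on the left fixes the result type, and the
  // entries it receives from the case-insensitive side have lower-cased keys.
  val before: Map[String, String] =
    Map(QueryTypeKey -> DefaultQueryType) ++ optParams.keyLowerCasedMap

  // Reordered expression from the issue: the case-insensitive map stays on the left.
  val after: CaseInsensitiveOptions = optParams ++ Map(QueryTypeKey -> DefaultQueryType)

  // Hypothetical helper: add the default only if no query type was supplied.
  def withReadDefaults(opts: CaseInsensitiveOptions): CaseInsensitiveOptions =
    if (opts.get(QueryTypeKey).isDefined) opts
    else opts + (QueryTypeKey -> DefaultQueryType)

  def main(args: Array[String]): Unit = {
    println(before.get("basePath")) // None: only "basepath" is present
    println(after.get("basePath"))  // Some(/tmp/hudi_table)
  }
}
```

One caveat of the reordered expression: the default on the right of ++ overwrites a query type that the caller passed under the same key spelling. withReadDefaults shows one way to add the default only when the key is absent while keeping the map case-insensitive.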



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
