[
https://issues.apache.org/jira/browse/HUDI-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gary Li updated HUDI-1392:
--------------------------
Status: Open (was: New)
> lose partition info when using spark parameter "basePath"
> ----------------------------------------------------------
>
> Key: HUDI-1392
> URL: https://issues.apache.org/jira/browse/HUDI-1392
> Project: Apache Hudi
> Issue Type: Bug
> Components: Spark Integration
> Reporter: steven zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.6.1
>
>
> Steps to reproduce:
> Set hoodie.datasource.write.hive_style_partitioning to true, then run:
> spark.read().format("org.apache.hudi").option("mergeSchema", true)
>   .option("basePath", tablePath)
>   .load(tablePath + (nonPartitionedTable ? "/*" : "/*"))
>   .createOrReplaceTempView(hudiTable);
> spark.sql("select * from hudiTable where date>'20200807'").explain();
> The plan prints PartitionFilters: [], so the filter on date is not applied
> as a partition filter.
> Cause: org.apache.hudi.DefaultSource#createRelation is called from
> dataSource.createRelation(sparkSession.sqlContext,
> caseInsensitiveOptions) ([https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala]
> L318).
> The incoming optParams is a CaseInsensitiveMap. Hudi then attaches
> additional parameters:
> val parameters = Map(QUERY_TYPE_OPT_KEY -> DEFAULT_QUERY_TYPE_OPT_VAL) ++
> translateViewTypesToQueryTypes(optParams)
> Because the left operand is a plain Map, parameters is a plain Map and no
> longer a CaseInsensitiveMap.
> When the parquet data source infers partition info, it fetches the basePath
> value through parameters.get(BASE_PATH_PARAM) (
> [https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala]
> L196). That call no longer goes through CaseInsensitiveMap#get. It is a
> plain Map#get("basePath"), and since the keys copied out of the
> CaseInsensitiveMap are lower-cased, it returns None. As a result no
> partition info is inferred.
> I also found that Spark 2.4.7 and above (
> https://issues.apache.org/jira/browse/SPARK-32364 ) uses a
> CaseInsensitiveMap to fetch basePath, although that change was made for a
> different purpose than this Hudi issue. Lower Spark versions still have
> this problem.
> So we should build the parameters as
> val parameters = translateViewTypesToQueryTypes(optParams) ++
> Map(QUERY_TYPE_OPT_KEY -> DEFAULT_QUERY_TYPE_OPT_VAL)
> for two reasons: 1. lower Spark versions are affected by this issue; 2. the
> current order converts the CaseInsensitiveMap into a plain Map.
>
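The difference between the two operand orders described above can be checked in isolation with a minimal sketch. It assumes Spark 2.4.x (Scala 2.11/2.12) with spark-catalyst on the classpath, for example in spark-shell. The object name, the table path, and the two constant values standing in for QUERY_TYPE_OPT_KEY and DEFAULT_QUERY_TYPE_OPT_VAL are illustrative assumptions, and translateViewTypesToQueryTypes is left out entirely.

```scala
import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap

object BasePathLookupSketch {
  // Stand-ins for the Hudi constants named in the report; the string
  // values here are assumptions made for illustration only.
  val QueryTypeKey = "hoodie.datasource.query.type"
  val DefaultQueryType = "snapshot"

  // Roughly what DataSource passes to createRelation: the user options
  // wrapped in a CaseInsensitiveMap. The path is a made-up example.
  val optParams: Map[String, String] =
    CaseInsensitiveMap(Map("basePath" -> "/tmp/hudi_table", "mergeSchema" -> "true"))

  // Current order: the left operand is a plain Map, so the result is a plain
  // Map filled from the CaseInsensitiveMap iterator (lower-cased keys).
  val defaultsFirst: Map[String, String] =
    Map(QueryTypeKey -> DefaultQueryType) ++ optParams

  // Proposed order: the left operand is the CaseInsensitiveMap, and its `+`
  // returns another CaseInsensitiveMap, so lookups stay case-insensitive.
  val optionsFirst: Map[String, String] =
    optParams ++ Map(QueryTypeKey -> DefaultQueryType)

  def main(args: Array[String]): Unit = {
    println(s"defaultsFirst.get(basePath) = ${defaultsFirst.get("basePath")}")
    println(s"optionsFirst.get(basePath)  = ${optionsFirst.get("basePath")}")
  }
}
```

One thing to keep in mind with the proposed order: the right-hand operand of `++` wins on duplicate keys, so a default placed last would replace a query type supplied by the user under the same key. Whether that matters here depends on what translateViewTypesToQueryTypes returns, which the report does not show.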
--
This message was sent by Atlassian Jira
(v8.3.4#803005)