[
https://issues.apache.org/jira/browse/HUDI-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
steven zhang updated HUDI-1392:
-------------------------------
Description:
Steps to reproduce the issue:
Set hoodie.datasource.write.hive_style_partitioning to true, then run:
spark.read().format("org.apache.hudi").option("mergeSchema",
true).option("basePath",tablePath).load(tablePath + (nonPartitionedTable ? "/*"
: "/*")).createOrReplaceTempView(hudiTable);
spark.sql("select * from hudiTable where date>'20200807'").explain();
The printed plan shows PartitionFilters: []
The cause of this issue: org.apache.hudi.DefaultSource#createRelation is called
by dataSource.createRelation(sparkSession.sqlContext,
caseInsensitiveOptions)([https://github.com/apache/spark/blob/954cd9feaa1a3d4ad9a235811ae58e02a63e8386/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala]
L355),
so the input optParams is a CaseInsensitiveMap. Hudi then attaches additional
parameters, such as:
val parameters = Map(QUERY_TYPE_OPT_KEY -> DEFAULT_QUERY_TYPE_OPT_VAL) ++
translateViewTypesToQueryTypes(optParams)
As a result, parameters is a plain Map, not a CaseInsensitiveMap.
When the parquet datasource infers partition info, it fetches the basePath value through
parameters.get(BASE_PATH_PARAM) (
[https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala]
L196). That get call no longer goes through CaseInsensitiveMap#get; it is just
Map#get("basePath"), which returns None,
so no partition info is inferred.
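The lookup failure described above can be reproduced without Spark. The following is only a minimal Java sketch: CaseInsensitiveOptions is a hypothetical stand-in, not Spark's CaseInsensitiveMap, and it assumes (as Spark 2.4's class appears to do) that keys are stored and iterated in lower case. The path and the query-type key/value are illustrative placeholders.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

final class BasePathLookupSketch {

    // Hypothetical stand-in for a case-insensitive options map (assumption, not Spark code).
    static final class CaseInsensitiveOptions {
        private final Map<String, String> lowerCased = new HashMap<>();

        CaseInsensitiveOptions(Map<String, String> original) {
            // Keys are normalised to lower case when the map is built.
            original.forEach((k, v) -> lowerCased.put(k.toLowerCase(Locale.ROOT), v));
        }

        // Lookup lower-cases the requested key, so "basePath" and "basepath" both match.
        Optional<String> get(String key) {
            return Optional.ofNullable(lowerCased.get(key.toLowerCase(Locale.ROOT)));
        }

        // Iterating the entries exposes only the lower-cased keys.
        Map<String, String> entries() {
            return new HashMap<>(lowerCased);
        }
    }

    // Mimics `Map(default) ++ optParams`: everything is copied into a plain map.
    static Map<String, String> mergeIntoPlainMap(Map<String, String> defaults,
                                                 CaseInsensitiveOptions optParams) {
        Map<String, String> merged = new HashMap<>(defaults);
        merged.putAll(optParams.entries());
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> userOptions = new HashMap<>();
        userOptions.put("basePath", "/tmp/hudi_table"); // illustrative path
        CaseInsensitiveOptions optParams = new CaseInsensitiveOptions(userOptions);

        Map<String, String> defaults = new HashMap<>();
        defaults.put("hoodie.datasource.query.type", "snapshot"); // illustrative default

        Map<String, String> parameters = mergeIntoPlainMap(defaults, optParams);

        System.out.println(optParams.get("basePath").isPresent()); // true
        System.out.println(parameters.containsKey("basePath"));    // false
        System.out.println(parameters.containsKey("basepath"));    // true
    }
}
```

In this sketch the value is still present in the merged map, but only under the lower-cased key, so an exact-match lookup for "basePath" finds nothing.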
I also found that Spark 2.4.7 and above (
https://issues.apache.org/jira/browse/SPARK-32364 ) uses a CaseInsensitiveMap
to fetch basePath, although that change was made for a different reason than this Hudi issue.
Lower Spark versions still have this issue,
so we need to use:
val parameters = translateViewTypesToQueryTypes(optParams) ++
Map(QUERY_TYPE_OPT_KEY -> DEFAULT_QUERY_TYPE_OPT_VAL)
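Why the operand order matters can be sketched in Java as well. This is a hedged illustration only: CaseInsensitiveOptions and its plus method are hypothetical stand-ins, written on the assumption that adding entries to Spark's CaseInsensitiveMap returns another CaseInsensitiveMap, so starting from optParams keeps case-insensitive lookups working. The path and the query-type key/value are illustrative placeholders.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

final class MergeOrderSketch {

    // Hypothetical immutable case-insensitive map (assumption, not Spark code).
    static final class CaseInsensitiveOptions {
        private final Map<String, String> lowerCased;

        CaseInsensitiveOptions(Map<String, String> original) {
            Map<String, String> copy = new HashMap<>();
            original.forEach((k, v) -> copy.put(k.toLowerCase(Locale.ROOT), v));
            this.lowerCased = copy;
        }

        // Lookup ignores the case of the requested key.
        Optional<String> get(String key) {
            return Optional.ofNullable(lowerCased.get(key.toLowerCase(Locale.ROOT)));
        }

        // Adding an entry returns another case-insensitive map, so the type survives the merge.
        // An existing entry with the same key (ignoring case) is overwritten.
        CaseInsensitiveOptions plus(String key, String value) {
            Map<String, String> next = new HashMap<>(lowerCased);
            next.put(key.toLowerCase(Locale.ROOT), value);
            return new CaseInsensitiveOptions(next);
        }
    }

    public static void main(String[] args) {
        Map<String, String> userOptions = new HashMap<>();
        userOptions.put("basePath", "/tmp/hudi_table"); // illustrative path

        // Mimics `optParams ++ Map(default)`: start from the case-insensitive map.
        CaseInsensitiveOptions parameters = new CaseInsensitiveOptions(userOptions)
                .plus("hoodie.datasource.query.type", "snapshot"); // illustrative default

        System.out.println(parameters.get("basePath").isPresent()); // true
    }
}
```

One point to keep in mind with this ordering: in Scala's `++`, entries from the right-hand operand win on key collisions, so the default query type on the right would replace a query type supplied in optParams.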
> lose partition info when using spark parameter "basePath"
> ----------------------------------------------------------
>
> Key: HUDI-1392
> URL: https://issues.apache.org/jira/browse/HUDI-1392
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: steven zhang
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)