[
https://issues.apache.org/jira/browse/HUDI-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-4932:
---------------------------------
Labels: pull-request-available (was: )
> Add a config to allow partition column type inference in bootstrap
> ------------------------------------------------------------------
>
> Key: HUDI-4932
> URL: https://issues.apache.org/jira/browse/HUDI-4932
> Project: Apache Hudi
> Issue Type: Improvement
> Components: bootstrap
> Reporter: Ethan Guo
> Assignee: Jonathan Vexler
> Priority: Major
> Labels: pull-request-available
>
> Currently, the bootstrap operation assumes that the partition column is always
> of String type.
> TestDataSourceForBootstrap.testMetadataBootstrapCOWHiveStylePartitioned fails
> for a date partition column if type inference of the partition column is
> turned on.
>
> We need to add a config to allow partition column type inference in bootstrap
> so that other types of partition columns are supported.
>
> HoodieSparkBootstrapSchemaProvider
> {code:java}
> private static Schema getBootstrapSourceSchemaParquet(HoodieWriteConfig writeConfig, HoodieEngineContext context, Path filePath) {
>   // NOTE: The type inference of partition column in the parquet table is turned off explicitly,
>   // to be consistent with the existing bootstrap behavior, where the partition column is String
>   // typed in Hudi table.
>   ((HoodieSparkEngineContext) context).getSqlContext()
>       .setConf(SQLConf.PARTITION_COLUMN_TYPE_INFERENCE(), false);
>   StructType parquetSchema = ((HoodieSparkEngineContext) context).getSqlContext().read()
>       .option("basePath", writeConfig.getBootstrapSourceBasePath())
>       .parquet(filePath.toString())
>       .schema();
> {code}
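To illustrate the difference the requested config would make, here is a minimal, self-contained sketch of how a Hive-style partition value such as the one in "datestr=2022-09-27" could be treated with and without type inference. It is not Hudi or Spark code: the class name, the `parse` helper, and the config key are hypothetical, and the inference rules are a simplified stand-in for what Spark actually does.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Simplified, hypothetical sketch; not the Hudi or Spark implementation.
final class PartitionValueInference {

    // Hypothetical config key, used only for illustration.
    static final String INFER_TYPES_KEY = "example.bootstrap.partition.type.inference.enabled";

    /**
     * Interprets the raw value of a Hive-style partition path segment.
     * With inference disabled the value is always kept as a String,
     * which mirrors the existing bootstrap behavior described above.
     */
    static Object parse(String rawValue, boolean inferTypes) {
        if (!inferTypes) {
            return rawValue;
        }
        // Try progressively wider types, falling back to String.
        try {
            return Integer.valueOf(rawValue);
        } catch (NumberFormatException ignored) {
            // not an int
        }
        try {
            return Long.valueOf(rawValue);
        } catch (NumberFormatException ignored) {
            // not a long
        }
        try {
            return LocalDate.parse(rawValue); // ISO-8601 date, e.g. 2022-09-27
        } catch (DateTimeParseException ignored) {
            // not a date
        }
        return rawValue;
    }

    public static void main(String[] args) {
        // The flag defaults to false so that the String-typed behavior is preserved.
        boolean inferTypes = Boolean.parseBoolean(System.getProperty(INFER_TYPES_KEY, "false"));
        Object value = parse("2022-09-27", inferTypes);
        System.out.println(value.getClass().getSimpleName() + ": " + value);
    }
}
```

Defaulting the flag to false keeps existing bootstrapped tables String-typed, so only users who opt in would see date or numeric partition columns.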