[ 
https://issues.apache.org/jira/browse/HUDI-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-4932:
----------------------------
    Description: 
Currently, the bootstrap operation assumes that the partition column is always 
of String type.  
TestDataSourceForBootstrap.testMetadataBootstrapCOWHiveStylePartitioned fails 
for a date partition column if type inference of the partition column is turned 
on.
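
To illustrate the mismatch, here is a minimal, self-contained sketch in plain 
Java with no Spark or Hudi dependency. The class name, the inferredType method, 
and the sample value 2022-09-28 are assumptions made for illustration only. 
This is a simplified model, not Spark's actual inference code, which covers 
more types.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Simplified, hypothetical model of partition value type inference.
final class PartitionTypeInferenceSketch {

  // Returns the name of the type a raw partition value would be read as.
  static String inferredType(String rawValue, boolean inferenceEnabled) {
    if (!inferenceEnabled) {
      // With inference off, every partition value stays a string.
      return "string";
    }
    try {
      Integer.parseInt(rawValue);
      return "int";
    } catch (NumberFormatException ignored) {
      // Not an integer, try the next candidate type.
    }
    try {
      LocalDate.parse(rawValue); // ISO-8601 date, e.g. 2022-09-28
      return "date";
    } catch (DateTimeParseException ignored) {
      // Not a date, fall back to string.
    }
    return "string";
  }

  public static void main(String[] args) {
    String partitionValue = "2022-09-28"; // illustrative value
    System.out.println(inferredType(partitionValue, false)); // prints "string"
    System.out.println(inferredType(partitionValue, true));  // prints "date"
  }
}
```

The same directory name yields a String column with inference off and a date 
column with inference on, so code that expects a String-typed partition column 
sees a different type once inference is enabled.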

 

We need to add a config that allows partition column type inference in 
bootstrap, so that partition columns of other types are supported.

 

HoodieSparkBootstrapSchemaProvider
{code:java}
private static Schema getBootstrapSourceSchemaParquet(
    HoodieWriteConfig writeConfig, HoodieEngineContext context, Path filePath) {
  // NOTE: The type inference of partition column in the parquet table is
  // turned off explicitly, to be consistent with the existing bootstrap
  // behavior, where the partition column is String typed in Hudi table.
  ((HoodieSparkEngineContext) context).getSqlContext()
      .setConf(SQLConf.PARTITION_COLUMN_TYPE_INFERENCE(), false);
  StructType parquetSchema = ((HoodieSparkEngineContext) context)
      .getSqlContext().read()
      .option("basePath", writeConfig.getBootstrapSourceBasePath())
      .parquet(filePath.toString())
      .schema();
{code}
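
One possible shape for the proposed config, as a rough sketch only: instead of 
hard-coding false as in the snippet above, the flag could be resolved from the 
write properties, defaulting to the existing behavior. The Hudi-side key below 
is hypothetical and invented for this example; the class and method names are 
also assumptions. Only the Spark key, which is the one behind 
SQLConf.PARTITION_COLUMN_TYPE_INFERENCE(), is real.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of gating partition type inference behind a config.
final class BootstrapPartitionInferenceConfigSketch {

  // Hypothetical key, invented for this sketch; not an actual Hudi config.
  static final String HYPOTHETICAL_HUDI_KEY =
      "example.bootstrap.partition.type.inference.enabled";

  // Spark SQL key behind SQLConf.PARTITION_COLUMN_TYPE_INFERENCE().
  static final String SPARK_INFERENCE_KEY =
      "spark.sql.sources.partitionColumnTypeInference.enabled";

  // Defaults to false so that partition columns stay String typed
  // unless the user opts in.
  static boolean resolveInferenceFlag(Map<String, String> writeProps) {
    return Boolean.parseBoolean(
        writeProps.getOrDefault(HYPOTHETICAL_HUDI_KEY, "false"));
  }

  // Builds the Spark SQL setting to apply before reading the source schema.
  static Map<String, String> sparkConfFor(Map<String, String> writeProps) {
    Map<String, String> sparkConf = new HashMap<>();
    sparkConf.put(SPARK_INFERENCE_KEY,
        String.valueOf(resolveInferenceFlag(writeProps)));
    return sparkConf;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    System.out.println(sparkConfFor(props)); // default: inference stays off

    props.put(HYPOTHETICAL_HUDI_KEY, "true");
    System.out.println(sparkConfFor(props)); // opt-in: inference turned on
  }
}
```

Defaulting to false keeps existing bootstrapped tables unaffected, while users 
with date or numeric partition columns can opt in explicitly.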

  was:
Currently, the bootstrap operation assumes that the partition column is always 
of String type.  
TestDataSourceForBootstrap.testMetadataBootstrapCOWHiveStylePartitioned fails 
for a date partition column if type inference of the partition column is turned 
on.

 

We need to add a config 

 

HoodieSparkBootstrapSchemaProvider
{code:java}
private static Schema getBootstrapSourceSchemaParquet(
    HoodieWriteConfig writeConfig, HoodieEngineContext context, Path filePath) {
  // NOTE: The type inference of partition column in the parquet table is
  // turned off explicitly, to be consistent with the existing bootstrap
  // behavior, where the partition column is String typed in Hudi table.
  ((HoodieSparkEngineContext) context).getSqlContext()
      .setConf(SQLConf.PARTITION_COLUMN_TYPE_INFERENCE(), false);
  StructType parquetSchema = ((HoodieSparkEngineContext) context)
      .getSqlContext().read()
      .option("basePath", writeConfig.getBootstrapSourceBasePath())
      .parquet(filePath.toString())
      .schema();
{code}


> Add a config to allow partition column inference in bootstrap
> -------------------------------------------------------------
>
>                 Key: HUDI-4932
>                 URL: https://issues.apache.org/jira/browse/HUDI-4932
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: bootstrap
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Major
>             Fix For: 0.13.0
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
