yihua commented on code in PR #6673:
URL: https://github.com/apache/hudi/pull/6673#discussion_r972517320
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBootstrapRelation.scala:
##########
@@ -147,7 +146,7 @@ class HoodieBootstrapRelation(@transient val _sqlContext: SQLContext,
if (fullSchema == null) {
logInfo("Inferring schema..")
val schemaResolver = new TableSchemaResolver(metaClient)
- val tableSchema = schemaResolver.getTableAvroSchemaWithoutMetadataFields
+ val tableSchema = TableSchemaResolver.appendPartitionColumns(schemaResolver.getTableAvroSchemaWithoutMetadataFields, metaClient.getTableConfig.getPartitionFields)
Review Comment:
We should also fix the table schema stored in the commit metadata so that it
includes the partition column with the correctly inferred type. That is fixed in #6676.
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieBootstrapConfig.java:
##########
@@ -50,9 +53,25 @@ public class HoodieBootstrapConfig extends HoodieConfig {
.sinceVersion("0.6.0")
.withDocumentation("Base path of the dataset that needs to be bootstrapped as a Hudi table");
+ public static final ConfigProperty<String> PARTITION_SELECTOR_REGEX_MODE = ConfigProperty
+ .key("hoodie.bootstrap.mode.selector.regex.mode")
+ .defaultValue(METADATA_ONLY.name())
+ .sinceVersion("0.6.0")
+ .withValidValues(METADATA_ONLY.name(), FULL_RECORD.name())
+ .withDocumentation("Bootstrap mode to apply for partition paths, that match regex above. "
+ + "METADATA_ONLY will generate just skeleton base files with keys/footers, avoiding full cost of rewriting the dataset. "
+ + "FULL_RECORD will perform a full copy/rewrite of the data as a Hudi table.");
+
public static final ConfigProperty<String> MODE_SELECTOR_CLASS_NAME = ConfigProperty
.key("hoodie.bootstrap.mode.selector")
.defaultValue(MetadataOnlyBootstrapModeSelector.class.getCanonicalName())
+ /*.withInferFunction(cfg -> {
Review Comment:
nit: remove unused code
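
For reference, here is a minimal sketch of how the `hoodie.bootstrap.mode.selector.regex.mode` property in the hunk above could be resolved from a plain `java.util.Properties` object. The key, default, and valid values are copied from the diff. The class `BootstrapModeConfigSketch` and its `resolveMode` method are hypothetical names used only for illustration, they are not part of Hudi, and rejecting unknown values with an exception is an assumption about how the valid-values list is meant to be used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of Hudi.
class BootstrapModeConfigSketch {
    // Key, default, and valid values are copied from the diff in this review.
    static final String REGEX_MODE_KEY = "hoodie.bootstrap.mode.selector.regex.mode";
    static final String DEFAULT_MODE = "METADATA_ONLY";
    static final List<String> VALID_MODES = Arrays.asList("METADATA_ONLY", "FULL_RECORD");

    // Returns the configured mode, falling back to the default when the key is unset.
    // Assumption: a value outside the valid list is treated as an error.
    static String resolveMode(Properties props) {
        String mode = props.getProperty(REGEX_MODE_KEY, DEFAULT_MODE);
        if (!VALID_MODES.contains(mode)) {
            throw new IllegalArgumentException(
                "Invalid value for " + REGEX_MODE_KEY + ": " + mode);
        }
        return mode;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(REGEX_MODE_KEY, "FULL_RECORD");
        System.out.println(resolveMode(props));
    }
}
```

With no value set, the sketch falls back to `METADATA_ONLY`, matching the default declared in the diff.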