nsivabalan commented on code in PR #17850:
URL: https://github.com/apache/hudi/pull/17850#discussion_r2786549190
##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -152,7 +148,7 @@ public class HoodieSyncConfig extends HoodieConfig {
})
.markAdvanced()
.withDocumentation("Class which implements PartitionValueExtractor to
extract the partition values, "
- + "default 'org.apache.hudi.hive.MultiPartKeysValueExtractor'.");
+ + "default is inferred based on partition configuration.");
Review Comment:
lets chat about how does the new config interplays w/ existing hive sync
config
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieWriterUtils.scala:
##########
@@ -323,6 +323,16 @@ object HoodieWriterUtils {
&& currentPartitionFields != tableConfigPartitionFields) {
diffConfigs.append(s"PartitionPath:\t$currentPartitionFields\t$tableConfigPartitionFields\n")
}
+
+ // Validate partition value extractor
+ val currentPartitionValueExtractor =
params.getOrElse(HoodieSyncConfig.META_SYNC_PARTITION_EXTRACTOR_CLASS.key(),
null)
Review Comment:
we should make a new writer property.
lets align before we go ahead w/ more changes
##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -127,19 +125,17 @@ public class HoodieSyncConfig extends HoodieConfig {
partitionFieldsOpt = HoodieTableConfig.getPartitionFieldProp(cfg)
.or(() ->
Option.ofNullable(cfg.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME)));
}
+
if (!partitionFieldsOpt.isPresent()) {
- return Option.empty();
+ return Option.of("org.apache.hudi.hive.NonPartitionedExtractor");
}
String partitionFields = partitionFieldsOpt.get();
if (StringUtils.nonEmpty(partitionFields)) {
int numOfPartFields = partitionFields.split(",").length;
if (numOfPartFields == 1) {
- if (cfg.contains(HIVE_STYLE_PARTITIONING_ENABLE)
- &&
cfg.getString(HIVE_STYLE_PARTITIONING_ENABLE).equals("true")) {
+ if
(cfg.contains(KeyGeneratorOptions.HIVE_STYLE_PARTITIONING_ENABLE.key())
+ &&
cfg.getString(KeyGeneratorOptions.HIVE_STYLE_PARTITIONING_ENABLE.key()).equals("true"))
{
return
Option.of("org.apache.hudi.hive.HiveStylePartitionValueExtractor");
- } else if (cfg.contains(SLASH_SEPARATED_DATE_PARTITIONING)
- &&
cfg.getString(SLASH_SEPARATED_DATE_PARTITIONING).equals("true")) {
Review Comment:
why changing these?
##########
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java:
##########
@@ -335,6 +335,37 @@ public static final String getDefaultPayloadClassName() {
.sinceVersion("1.0.0")
.withDocumentation("Key Generator type to determine key generator
class");
+ public static final ConfigProperty<String> PARTITION_VALUE_EXTRACTOR_CLASS =
ConfigProperty
+ .key("hoodie.datasource.hive_sync.partition_extractor_class")
+ .defaultValue("org.apache.hudi.hive.MultiPartKeysValueExtractor")
+ .withInferFunction(cfg -> {
Review Comment:
we can't use a config key w/ "hive_sync" in the name.
We plan to use this for reading.
May be `hoodie.table.partition_value_extractor_class`.
but we might have to introduce new partition value extractor classes for the
read instead of using the same one we use for hive sync.
let me think about it more or chat w/ others to see how we can go about this
##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java:
##########
@@ -127,19 +125,17 @@ public class HoodieSyncConfig extends HoodieConfig {
partitionFieldsOpt = HoodieTableConfig.getPartitionFieldProp(cfg)
.or(() ->
Option.ofNullable(cfg.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME)));
}
+
if (!partitionFieldsOpt.isPresent()) {
- return Option.empty();
+ return Option.of("org.apache.hudi.hive.NonPartitionedExtractor");
Review Comment:
can we move this into a separate PR?
looks like a bug fix right.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]