[
https://issues.apache.org/jira/browse/HUDI-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391265#comment-17391265
]
ASF GitHub Bot commented on HUDI-2177:
--------------------------------------
vinothchandar commented on a change in pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#discussion_r680614863
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -157,6 +157,11 @@
      .withDocumentation("When enabled, populates all meta fields. When disabled, no meta fields are populated "
          + "and incremental queries will not be functional. This is only meant to be used for append only/immutable data for batch processing");
+  public static final ConfigProperty<String> HOODIE_TABLE_KEY_GENERATOR_CLASS = ConfigProperty
+      .key("hoodie.datasource.write.keygenerator.class")
Review comment:
do we really want to write this with the `hoodie.datasource` prefix? This is a table-level property, not just for the datasource write path. Let's fix this?
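The table-scoped rename suggested above can keep the old datasource-scoped key readable for backward compatibility. A minimal self-contained sketch of that resolution order, assuming a table-scoped key name and a helper that are both illustrative (not Hudi's actual `ConfigProperty` API):

```java
import java.util.Properties;

public class KeyGeneratorConfigSketch {
    // Hypothetical table-scoped key name, per the review suggestion.
    static final String TABLE_KEY = "hoodie.table.keygenerator.class";
    // Existing datasource-scoped key, kept as a compatibility fallback.
    static final String DEPRECATED_KEY = "hoodie.datasource.write.keygenerator.class";

    // Resolve the new key first, then fall back to the deprecated one,
    // then to the supplied default.
    static String resolveKeyGenerator(Properties props, String defaultValue) {
        String value = props.getProperty(TABLE_KEY);
        if (value == null) {
            value = props.getProperty(DEPRECATED_KEY, defaultValue);
        }
        return value;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(DEPRECATED_KEY, "org.apache.hudi.keygen.SimpleKeyGenerator");
        // Old key still resolves even though only the new key is documented.
        System.out.println(resolveKeyGenerator(props, "default"));
    }
}
```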
##########
File path:
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/compact/CompactionTestBase.java
##########
@@ -198,7 +198,7 @@ protected void executeCompaction(String compactionInstantTime, SparkRDDWriteClie
    assertEquals(latestCompactionCommitTime, compactionInstantTime,
        "Expect compaction instant time to be the latest commit time");
    assertEquals(expectedNumRecs,
-        HoodieClientTestUtils.countRecordsSince(jsc, basePath, sqlContext, timeline, "000"),
+        HoodieClientTestUtils.countRecordsWithOptionalSince(jsc, basePath, sqlContext, timeline, Option.of("000")),
Review comment:
rename to `countRecordOptionallySince`
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimeFileSplit.java
##########
@@ -36,16 +38,20 @@
  private String basePath;
+  private Option<HoodieVirtualKeyInfo> hoodieVirtualKeyInfoOpt = Option.empty();
+
  public HoodieRealtimeFileSplit() {
    super();
  }
-  public HoodieRealtimeFileSplit(FileSplit baseSplit, String basePath, List<String> deltaLogPaths, String maxCommitTime)
+  public HoodieRealtimeFileSplit(FileSplit baseSplit, String basePath, List<String> deltaLogPaths, String maxCommitTime,
+                                 Option<HoodieVirtualKeyInfo> hoodieVirtualKeyInfoOpt)
Review comment:
I think it's okay to drop the `Opt` suffix everywhere. `virtualKeyInfo` is concise and already conveys the meaning; `virtualKeyInfo.isPresent()` conveys what `Opt` would have.
Also, let's drop the `hoodie` prefix everywhere in the variables as well.
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeSplit.java
##########
@@ -52,8 +53,15 @@
   */
  String getBasePath();
+  /**
+   * Returns virtual key info if meta fields are disabled.
+   * @return
+   */
+  Option<HoodieVirtualKeyInfo> getHoodieVirtualKeyInfoOpt();
Review comment:
fix name.
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -204,23 +226,34 @@ private static Configuration addProjectionField(Configuration conf, String field
    return conf;
  }
-  public static void addRequiredProjectionFields(Configuration configuration) {
+  public static void addRequiredProjectionFields(Configuration configuration, Option<HoodieVirtualKeyInfo> hoodieVirtualKeyInfoOpt) {
Review comment:
naming
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/cluster/SparkExecuteClusteringCommitActionExecutor.java
##########
@@ -211,8 +213,11 @@ protected String getCommitActionType() {
        .withBitCaskDiskMapCompressionEnabled(config.getCommonConfig().isBitCaskDiskMapCompressionEnabled())
        .build();
+    HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
    recordIterators.add(HoodieFileSliceReader.getFileSliceReader(baseFileReader, scanner, readerSchema,
-        table.getMetaClient().getTableConfig().getPayloadClass()));
+        tableConfig.getPayloadClass(),
+        tableConfig.populateMetaFields() ? Option.empty() : Option.of(tableConfig.getRecordKeyFieldProp()),
Review comment:
we can use `Option<Pair<String, String>>` anywhere both a record key field and a partition path field need to be passed.
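The shape suggested above can be sketched in a self-contained way; here `java.util.Optional` and `AbstractMap.SimpleEntry` stand in for Hudi's own `Option` and `Pair` types, and the method and field names are illustrative:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map.Entry;
import java.util.Optional;

public class VirtualKeyFieldsSketch {
    // When meta fields are disabled, record key and partition path field
    // names travel together as one pair; an empty Optional signals that
    // meta fields are populated and nothing needs to be passed.
    static Optional<Entry<String, String>> keyFields(boolean populateMetaFields,
                                                     String recordKeyField,
                                                     String partitionPathField) {
        return populateMetaFields
            ? Optional.empty()
            : Optional.of(new SimpleEntry<>(recordKeyField, partitionPathField));
    }

    public static void main(String[] args) {
        // Caller unpacks the pair only when it is present.
        keyFields(false, "uuid", "partition").ifPresent(p ->
            System.out.println("recordKey=" + p.getKey() + ", partitionPath=" + p.getValue()));
    }
}
```

Grouping the two fields into one `Option<Pair<...>>` also removes the invalid state where only one of the two is supplied.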
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordScanner.java
##########
@@ -80,6 +82,10 @@
private final HoodieTableMetaClient hoodieTableMetaClient;
// Merge strategy to use when combining records from log
private final String payloadClassFQN;
+ // simple recordKey field
+ private Option<String> simpleRecordKeyField = Option.empty();
Review comment:
let's use a Pair
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimeFileSplit.java
##########
@@ -60,6 +66,16 @@ public String getBasePath() {
    return basePath;
  }
+  @Override
+  public void setHoodieVirtualKeyInfoOpt(Option<HoodieVirtualKeyInfo> hoodieVirtualKeyInfoOpt) {
Review comment:
fix the getter and setter names
##########
File path:
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java
##########
@@ -126,8 +131,10 @@ private void initIfNeeded() {
    HoodieTimer readTimer = new HoodieTimer().startTimer();
    Option<GenericRecord> baseRecord = baseFileReader.getRecordByKey(key);
    if (baseRecord.isPresent()) {
-      hoodieRecord = SpillableMapUtils.convertToHoodieRecordPayload(baseRecord.get(),
-          metaClient.getTableConfig().getPayloadClass());
+      hoodieRecord = tableConfig.populateMetaFields()
+          ? SpillableMapUtils.convertToHoodieRecordPayload(baseRecord.get(), tableConfig.getPayloadClass())
+          : SpillableMapUtils.convertToHoodieRecordPayload(baseRecord.get(), tableConfig.getPayloadClass(),
+              tableConfig.getRecordKeyFieldProp(),
Review comment:
let's use a Pair
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeSplit.java
##########
@@ -70,13 +78,24 @@
   */
  void setBasePath(String basePath);
+  void setHoodieVirtualKeyInfoOpt(Option<HoodieVirtualKeyInfo> hoodieVirtualKeyInfoOpt);
+
default void writeToOutput(DataOutput out) throws IOException {
InputSplitUtils.writeString(getBasePath(), out);
InputSplitUtils.writeString(getMaxCommitTime(), out);
out.writeInt(getDeltaLogPaths().size());
for (String logFilePath : getDeltaLogPaths()) {
InputSplitUtils.writeString(logFilePath, out);
}
+ if (!getHoodieVirtualKeyInfoOpt().isPresent()) {
Review comment:
`Option.ifPresent()` / `orElse()`? Let's make use of the Option APIs!
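The pattern being suggested can be sketched as follows; this uses `java.util.Optional.ifPresentOrElse` (Java 9+) as a stand-in, since Hudi's own `Option` API differs, and the `describe` helper and its strings are illustrative:

```java
import java.util.Optional;

public class OptionApiSketch {
    // Instead of branching on !opt.isPresent(), let the Optional API
    // express both the present and absent branches in one call.
    static String describe(Optional<String> virtualKeyInfo) {
        StringBuilder out = new StringBuilder();
        virtualKeyInfo.ifPresentOrElse(
            info -> out.append("write key info: ").append(info),
            () -> out.append("write 'absent' marker"));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(Optional.of("recordKey=uuid")));
        System.out.println(describe(Optional.empty()));
    }
}
```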
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimeFileSplit.java
##########
@@ -36,16 +38,20 @@
  private String basePath;
+  private Option<HoodieVirtualKeyInfo> hoodieVirtualKeyInfoOpt = Option.empty();
+
  public HoodieRealtimeFileSplit() {
    super();
  }
-  public HoodieRealtimeFileSplit(FileSplit baseSplit, String basePath, List<String> deltaLogPaths, String maxCommitTime)
+  public HoodieRealtimeFileSplit(FileSplit baseSplit, String basePath, List<String> deltaLogPaths, String maxCommitTime,
+                                 Option<HoodieVirtualKeyInfo> hoodieVirtualKeyInfoOpt)
Review comment:
encoding data types in the name is no longer needed in the age of modern IDEs, says Robert C. Martin in Clean Code :)
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
##########
@@ -324,6 +324,14 @@ public void validateTableProperties(Properties properties, WriteOperationType op
        && Boolean.parseBoolean((String) properties.getOrDefault(HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.key(),
            HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.defaultValue()))) {
      throw new HoodieException(HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.key() + " already disabled for the table. Can't be re-enabled back");
    }
+
+    // meta fields can be disabled only with SimpleKeyGenerator
+    if (!getTableConfig().populateMetaFields()
+        && !properties.getProperty(HoodieTableConfig.HOODIE_TABLE_KEY_GENERATOR_CLASS.key(), "org.apache.hudi.keygen.SimpleKeyGenerator")
Review comment:
use the class literal and do a `.class.getName()`. I think I had a similar comment before.
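The point above can be shown with a small self-contained sketch; the nested `SimpleKeyGenerator` here is a stand-in for `org.apache.hudi.keygen.SimpleKeyGenerator`:

```java
public class KeyGeneratorDefaultSketch {
    // Stand-in for org.apache.hudi.keygen.SimpleKeyGenerator.
    static class SimpleKeyGenerator {
    }

    public static void main(String[] args) {
        // A class literal survives package moves and renames under
        // refactoring tools; a hard-coded string silently goes stale.
        String defaultKeyGen = SimpleKeyGenerator.class.getName();
        System.out.println(defaultKeyGen);
    }
}
```

In the validation above, `SimpleKeyGenerator.class.getName()` would replace the `"org.apache.hudi.keygen.SimpleKeyGenerator"` string literal.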
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
> Virtual keys support for Compaction
> -----------------------------------
>
> Key: HUDI-2177
> URL: https://issues.apache.org/jira/browse/HUDI-2177
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Writer Core
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Virtual keys support for Compaction
--
This message was sent by Atlassian Jira
(v8.3.4#803005)