vinothchandar commented on a change in pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#discussion_r677872797
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java
##########
@@ -60,7 +60,7 @@ public static String getRecordKeyFromGenericRecord(GenericRecord genericRecord,
    * @return the partition path for the passed in generic record.
    */
   public static String getPartitionPathFromGenericRecord(GenericRecord genericRecord, Option<BaseKeyGenerator> keyGeneratorOpt) {
-    return keyGeneratorOpt.isPresent() ? keyGeneratorOpt.get().getRecordKey(genericRecord) : genericRecord.get(HoodieRecord.PARTITION_PATH_METADATA_FIELD).toString();
Review comment:
oops
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/cluster/SparkExecuteClusteringCommitActionExecutor.java
##########
@@ -210,8 +211,11 @@ protected String getCommitActionType() {
           .build();
       recordIterators.add(HoodieFileSliceReader.getFileSliceReader(baseFileReader, scanner, readerSchema,
-          table.getMetaClient().getTableConfig().getPayloadClass()));
+          table.getMetaClient().getTableConfig().getPayloadClass(),
+          table.getMetaClient().getTableConfig().populateMetaFields() ? Option.empty() : Option.of(table.getMetaClient().getTableConfig().getRecordKeyFieldProp()),
+          table.getMetaClient().getTableConfig().populateMetaFields() ? Option.empty() : Option.of(table.getMetaClient().getTableConfig().getPartitionFieldProp())));
Review comment:
I feel that if we had an `Option.ofBoolean()` that returns either `Option.empty` or a value to map further, this would read much nicer:
`Option.ofBoolean(tableConfig.populateMetaFields).map(tableConfig::getPartitionFieldProp)`
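A minimal sketch of such a helper, using `java.util.Optional` in place of Hudi's `org.apache.hudi.common.util.Option`; the `ofBoolean` name is the reviewer's hypothetical, not an existing API:

```java
import java.util.Optional;

// Hypothetical helper sketched from the review suggestion; not an existing Hudi API.
class OptionSketch {
  // Returns Optional.of(true) when the flag holds, Optional.empty() otherwise,
  // so callers can chain .map(...) to derive a value only when the flag is set.
  static Optional<Boolean> ofBoolean(boolean flag) {
    if (flag) {
      return Optional.of(Boolean.TRUE);
    }
    return Optional.empty();
  }
}
```

Note that the diff above returns the field prop only when `populateMetaFields()` is false, so the call site would presumably need a negation, e.g. `ofBoolean(!tableConfig.populateMetaFields()).map(b -> tableConfig.getPartitionFieldProp())`.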
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/util/SpillableMapUtils.java
##########
@@ -110,8 +110,15 @@ public static long generateChecksum(byte[] data) {
    * Utility method to convert bytes to HoodieRecord using schema and payload class.
    */
   public static <R> R convertToHoodieRecordPayload(GenericRecord rec, String payloadClazz) {
-    String recKey = rec.get(HoodieRecord.RECORD_KEY_METADATA_FIELD).toString();
-    String partitionPath = rec.get(HoodieRecord.PARTITION_PATH_METADATA_FIELD).toString();
+    return convertToHoodieRecordPayload(rec, payloadClazz, HoodieRecord.RECORD_KEY_METADATA_FIELD, HoodieRecord.PARTITION_PATH_METADATA_FIELD);
Review comment:
Can't you pass these as defaults at a higher layer? Then there would be only one method here, which takes 4 args.
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
##########
@@ -324,6 +324,14 @@ public void validateTableProperties(Properties properties, WriteOperationType op
         && Boolean.parseBoolean((String) properties.getOrDefault(HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.key(), HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.defaultValue()))) {
       throw new HoodieException(HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.key() + " already disabled for the table. Can't be re-enabled back");
     }
+
+    // meta fields can be disabled only with SimpleKeyGenerator
+    if (!getTableConfig().populateMetaFields()
+        && !properties.getProperty(HoodieTableConfig.HOODIE_TABLE_KEY_GENERATOR_CLASS.key(), HoodieTableConfig.DEFAULT_HOODIE_TABLE_KEY_GENERATOR_CLASS)
Review comment:
Just use `org.apache.hudi.keygen.SimpleKeyGenerator.class.getName()` or something, as opposed to using strings to hold the class name. The advantage is that the IDE can find these usages easily.
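As a standalone illustration of the class-literal pattern (with `java.lang.String` standing in for `SimpleKeyGenerator`, which isn't on the classpath in this sketch):

```java
// Class literals keep the IDE able to track usages and survive rename refactors,
// unlike string constants holding a fully-qualified class name.
class ClassNameSketch {
  // String.class stands in for org.apache.hudi.keygen.SimpleKeyGenerator.class.
  static final String KEY_GENERATOR_CLASS = String.class.getName();
}
```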
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieInputFormatUtils.java
##########
@@ -82,6 +82,13 @@
public static final int HOODIE_PARTITION_PATH_COL_POS = 3;
public static final String HOODIE_READ_COLUMNS_PROP = "hoodie.read.columns.set";
+ public static final String HOODIE_USE_META_FIELDS = "hoodie.use.meta.fields";
Review comment:
But it's a table-level property, and the query side understands it as describing how the data is stored, not just how it's written. I would advise against having a different name for the same thing; it leads to more harm than good over the long run.
##########
File path:
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestHoodieMergeOnReadTable.java
##########
@@ -232,21 +253,30 @@ public void testSimpleInsertAndUpdateHFile() throws Exception {
     }
   }
-  @Test
-  public void testSimpleClusteringNoUpdates() throws Exception {
-    testClustering(false);
+  @ParameterizedTest
+  @MethodSource("populateMetaFieldsParams")
+  public void testSimpleClusteringNoUpdates(boolean populateMetaFields) throws Exception {
+    clean();
+    init(HoodieTableConfig.HOODIE_BASE_FILE_FORMAT_PROP.defaultValue(), populateMetaFields);
+    testClustering(false, populateMetaFields);
   }
-  @Test
-  public void testSimpleClusteringWithUpdates() throws Exception {
-    testClustering(true);
+  @ParameterizedTest
+  @MethodSource("populateMetaFieldsParams")
+  public void testSimpleClusteringWithUpdates(boolean populateMetaFields) throws Exception {
+    clean();
+    init(HoodieTableConfig.HOODIE_BASE_FILE_FORMAT_PROP.defaultValue(), populateMetaFields);
+    testClustering(true, populateMetaFields);
   }
-  private void testClustering(boolean doUpdates) throws Exception {
+  private void testClustering(boolean doUpdates, boolean populateMetaFields) throws Exception {
     // set low compaction small File Size to generate more file groups.
     HoodieClusteringConfig clusteringConfig = HoodieClusteringConfig.newBuilder().withClusteringMaxNumGroups(10)
         .withClusteringTargetPartitions(0).withInlineClusteringNumCommits(1).build();
-    HoodieWriteConfig cfg = getConfigBuilder(true, 10L, clusteringConfig).build();
+    HoodieWriteConfig.Builder cfgBuilder = getConfigBuilder(true, 10L, clusteringConfig);
+    addAppropriatePropsForPopulateMetaFields(cfgBuilder, populateMetaFields);
Review comment:
Rename to `addConfigsForPopulateMetaFields`.
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordScanner.java
##########
@@ -302,7 +310,12 @@ private void processDataBlock(HoodieDataBlock dataBlock) throws Exception {
   }
   protected HoodieRecord<?> createHoodieRecord(IndexedRecord rec) {
-    return SpillableMapUtils.convertToHoodieRecordPayload((GenericRecord) rec, this.payloadClassFQN);
+    if (!simpleRecordKeyFieldOpt.isPresent()) {
Review comment:
Please try to use the `Option.map(...).orElse(...)` style of writing these.
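Roughly, the `isPresent()` branching could collapse into something like the following sketch. The `Map`-based record and field names are illustrative stand-ins, with `java.util.Optional` in place of Hudi's `Option`:

```java
import java.util.Map;
import java.util.Optional;

class RecordKeySketch {
  // Reads the configured simple record key field when one is present,
  // otherwise falls back to the meta field on the record.
  static String resolveKey(Optional<String> simpleRecordKeyField,
                           Map<String, String> record,
                           String metaKeyField) {
    return simpleRecordKeyField
        .map(record::get)
        .orElseGet(() -> record.get(metaKeyField));
  }
}
```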
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -68,6 +68,7 @@
   private static final Logger LOG = LogManager.getLogger(HoodieTableConfig.class);
   public static final String HOODIE_PROPERTIES_FILE = "hoodie.properties";
+  public static final String DEFAULT_HOODIE_TABLE_KEY_GENERATOR_CLASS = "org.apache.hudi.keygen.SimpleKeyGenerator";
Review comment:
Calling this the default is very confusing.
##########
File path:
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieClientTestUtils.java
##########
@@ -171,6 +171,33 @@ public static long countRecordsSince(JavaSparkContext jsc, String basePath, SQLC
     }
   }
+  /**
+   * Obtain a count of all records written to the Hoodie table.
+   */
+  public static long countAllRecords(JavaSparkContext jsc, String basePath, SQLContext sqlContext,
Review comment:
Let's reuse some helper, or place this in the right helper test-utils class.
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieInputFormatUtils.java
##########
@@ -82,6 +82,13 @@
public static final int HOODIE_PARTITION_PATH_COL_POS = 3;
public static final String HOODIE_READ_COLUMNS_PROP = "hoodie.read.columns.set";
+ public static final String HOODIE_USE_META_FIELDS = "hoodie.use.meta.fields";
+ public static final String DEFAULT_HOODIE_USE_META_FIELDS = "true";
+ public static final String RECORD_KEY_FIELD = "hoodie.record.key.field";
Review comment:
I think we should just reuse these from `HoodieTableConfig` and not redefine all of them again.
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/cluster/SparkExecuteClusteringCommitActionExecutor.java
##########
@@ -98,7 +99,7 @@ public SparkExecuteClusteringCommitActionExecutor(HoodieEngineContext context,
     JavaRDD<WriteStatus>[] writeStatuses = convertStreamToArray(writeStatusRDDStream);
     JavaRDD<WriteStatus> writeStatusRDD = engineContext.union(writeStatuses);
-
+
Review comment:
can we avoid these whitespace changes?
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeCompactedRecordReader.java
##########
@@ -52,29 +55,37 @@
   private final Map<String, HoodieRecord<? extends HoodieRecordPayload>> deltaRecordMap;
   private final Set<String> deltaRecordKeys;
+  private int recordKeyIndex = HoodieInputFormatUtils.HOODIE_RECORD_KEY_COL_POS;
   private Iterator<String> deltaItr;
   public RealtimeCompactedRecordReader(RealtimeSplit split, JobConf job,
       RecordReader<NullWritable, ArrayWritable> realReader) throws IOException {
     super(split, job);
     this.parquetReader = realReader;
-    this.deltaRecordMap = getMergedLogRecordScanner().getRecords();
+    this.deltaRecordMap = initAndGetMergedLogRecordScanner().getRecords();
     this.deltaRecordKeys = new HashSet<>(this.deltaRecordMap.keySet());
   }
   /**
    * Goes through the log files and populates a map with latest version of each key logged, since the base split was
    * written.
    */
-  private HoodieMergedLogRecordScanner getMergedLogRecordScanner() throws IOException {
+  private HoodieMergedLogRecordScanner initAndGetMergedLogRecordScanner() throws IOException {
     // NOTE: HoodieCompactedLogRecordScanner will not return records for an in-flight commit
     // but can return records for completed commits > the commit we are trying to read (if using
     // readCommit() API)
+    FileSystem fs = FSUtils.getFs(split.getPath().toString(), jobConf);
+    HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder().setConf(fs.getConf()).setBasePath(split.getBasePath()).build();
+    Option<String> simpleRecordKeyFieldOpt = metaClient.getTableConfig().populateMetaFields() ? Option.empty() : Option.of(metaClient.getTableConfig().getRecordKeyFieldProp());
Review comment:
Same thing. This ternary-operator use with an `Option` is kind of bugging me; it makes lines longer than they need to be.
##########
File path:
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestHoodieMergeOnReadTable.java
##########
@@ -132,17 +140,27 @@ public void init(HoodieFileFormat baseFileFormat) throws IOException {
   @BeforeEach
   public void init() throws IOException {
-    init(HoodieTableConfig.HOODIE_BASE_FILE_FORMAT_PROP.defaultValue());
+    init(HoodieTableConfig.HOODIE_BASE_FILE_FORMAT_PROP.defaultValue(), true);
   }
   @AfterEach
   public void clean() throws IOException {
     cleanupResources();
   }
-  @Test
-  public void testSimpleInsertAndUpdate() throws Exception {
-    HoodieWriteConfig cfg = getConfig(true);
+  private static Stream<Arguments> populateMetaFieldsParams() {
Review comment:
What's the increase in runtime to support this? I wonder if we need to test every method with this combination.
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -206,16 +206,28 @@ private static Configuration addProjectionField(Configuration conf, String field
   public static void addRequiredProjectionFields(Configuration configuration) {
     // Need this to do merge records in HoodieRealtimeRecordReader
-    addProjectionField(configuration, HoodieRecord.RECORD_KEY_METADATA_FIELD, HoodieInputFormatUtils.HOODIE_RECORD_KEY_COL_POS);
-    addProjectionField(configuration, HoodieRecord.COMMIT_TIME_METADATA_FIELD, HoodieInputFormatUtils.HOODIE_COMMIT_TIME_COL_POS);
-    addProjectionField(configuration, HoodieRecord.PARTITION_PATH_METADATA_FIELD, HoodieInputFormatUtils.HOODIE_PARTITION_PATH_COL_POS);
+    if (configuration.get(HoodieInputFormatUtils.HOODIE_USE_META_FIELDS) == null || configuration.get(HoodieInputFormatUtils.HOODIE_USE_META_FIELDS)
Review comment:
So why do we need this if we are setting this already at the record-reader level?
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
##########
@@ -80,6 +85,27 @@ protected HoodieDefaultTimeline filterInstantsTimeline(HoodieDefaultTimeline tim
   }
   void addProjectionToJobConf(final RealtimeSplit realtimeSplit, final JobConf jobConf) {
+
+    // set appropriate configs in jobConf to handle disabling of meta fields within canAddProjectionToJobConf and addRequiredProjectionFields
+    if (!realtimeSplit.getDeltaLogPaths().isEmpty()) {
+      // inject actual field info if meta fields are disabled
+      HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder().setConf(jobConf).setBasePath(realtimeSplit.getBasePath()).build();
+      HoodieTableConfig tableConfig = metaClient.getTableConfig();
+      if (!tableConfig.populateMetaFields()) {
+        TableSchemaResolver tableSchemaResolver = new TableSchemaResolver(metaClient);
Review comment:
Same here. This is the problem of doing this at the record-reader level, as opposed to `getSplits()`.
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
##########
@@ -324,6 +324,14 @@ public void validateTableProperties(Properties properties, WriteOperationType op
         && Boolean.parseBoolean((String) properties.getOrDefault(HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.key(), HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.defaultValue()))) {
       throw new HoodieException(HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.key() + " already disabled for the table. Can't be re-enabled back");
     }
+
+    // meta fields can be disabled only with SimpleKeyGenerator
+    if (!getTableConfig().populateMetaFields()
+        && !properties.getProperty(HoodieTableConfig.HOODIE_TABLE_KEY_GENERATOR_CLASS.key(), HoodieTableConfig.DEFAULT_HOODIE_TABLE_KEY_GENERATOR_CLASS)
+        .equals(HoodieTableConfig.DEFAULT_HOODIE_TABLE_KEY_GENERATOR_CLASS)) {
+      throw new HoodieException("Only SimpleKeyGenerators are supported when meta fields are disabled. KeyGenerator used : "
Review comment:
`SimpleKeyGenerators` is not a class, so it's confusing when we camel-case it. Can we say `simple key generator`?
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -206,16 +206,28 @@ private static Configuration addProjectionField(Configuration conf, String field
   public static void addRequiredProjectionFields(Configuration configuration) {
     // Need this to do merge records in HoodieRealtimeRecordReader
-    addProjectionField(configuration, HoodieRecord.RECORD_KEY_METADATA_FIELD, HoodieInputFormatUtils.HOODIE_RECORD_KEY_COL_POS);
-    addProjectionField(configuration, HoodieRecord.COMMIT_TIME_METADATA_FIELD, HoodieInputFormatUtils.HOODIE_COMMIT_TIME_COL_POS);
-    addProjectionField(configuration, HoodieRecord.PARTITION_PATH_METADATA_FIELD, HoodieInputFormatUtils.HOODIE_PARTITION_PATH_COL_POS);
+    if (configuration.get(HoodieInputFormatUtils.HOODIE_USE_META_FIELDS) == null || configuration.get(HoodieInputFormatUtils.HOODIE_USE_META_FIELDS)
+        .equals(HoodieInputFormatUtils.DEFAULT_HOODIE_USE_META_FIELDS)) {
+      addProjectionField(configuration, HoodieRecord.RECORD_KEY_METADATA_FIELD, HoodieInputFormatUtils.HOODIE_RECORD_KEY_COL_POS);
+      addProjectionField(configuration, HoodieRecord.COMMIT_TIME_METADATA_FIELD, HoodieInputFormatUtils.HOODIE_COMMIT_TIME_COL_POS);
+      addProjectionField(configuration, HoodieRecord.PARTITION_PATH_METADATA_FIELD, HoodieInputFormatUtils.HOODIE_PARTITION_PATH_COL_POS);
+    } else {
+      addProjectionField(configuration, configuration.get(HoodieInputFormatUtils.RECORD_KEY_FIELD), Integer.parseInt(configuration.get(HoodieInputFormatUtils.RECORD_KEY_FIELD_INDEX)));
+      addProjectionField(configuration, configuration.get(HoodieInputFormatUtils.PARTITION_PATH_FIELD), Integer.parseInt(configuration.get(HoodieInputFormatUtils.PARTITION_PATH_FIELD_INDEX)));
+    }
   }
   public static boolean requiredProjectionFieldsExistInConf(Configuration configuration) {
     String readColNames = configuration.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "");
-    return readColNames.contains(HoodieRecord.RECORD_KEY_METADATA_FIELD)
-        && readColNames.contains(HoodieRecord.COMMIT_TIME_METADATA_FIELD)
-        && readColNames.contains(HoodieRecord.PARTITION_PATH_METADATA_FIELD);
+    if (configuration.get(HoodieInputFormatUtils.HOODIE_USE_META_FIELDS) == null || configuration.get(HoodieInputFormatUtils.HOODIE_USE_META_FIELDS)
+        .equals(HoodieInputFormatUtils.DEFAULT_HOODIE_USE_META_FIELDS)) {
+      return readColNames.contains(HoodieRecord.RECORD_KEY_METADATA_FIELD)
+          && readColNames.contains(HoodieRecord.COMMIT_TIME_METADATA_FIELD)
+          && readColNames.contains(HoodieRecord.PARTITION_PATH_METADATA_FIELD);
+    } else {
+      return readColNames.contains(HoodieInputFormatUtils.RECORD_KEY_FIELD)
Review comment:
we don't validate the commit time fields?
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeCompactedRecordReader.java
##########
@@ -52,29 +55,37 @@
   private final Map<String, HoodieRecord<? extends HoodieRecordPayload>> deltaRecordMap;
   private final Set<String> deltaRecordKeys;
+  private int recordKeyIndex = HoodieInputFormatUtils.HOODIE_RECORD_KEY_COL_POS;
   private Iterator<String> deltaItr;
   public RealtimeCompactedRecordReader(RealtimeSplit split, JobConf job,
       RecordReader<NullWritable, ArrayWritable> realReader) throws IOException {
     super(split, job);
     this.parquetReader = realReader;
-    this.deltaRecordMap = getMergedLogRecordScanner().getRecords();
+    this.deltaRecordMap = initAndGetMergedLogRecordScanner().getRecords();
     this.deltaRecordKeys = new HashSet<>(this.deltaRecordMap.keySet());
   }
   /**
    * Goes through the log files and populates a map with latest version of each key logged, since the base split was
    * written.
    */
-  private HoodieMergedLogRecordScanner getMergedLogRecordScanner() throws IOException {
+  private HoodieMergedLogRecordScanner initAndGetMergedLogRecordScanner() throws IOException {
     // NOTE: HoodieCompactedLogRecordScanner will not return records for an in-flight commit
     // but can return records for completed commits > the commit we are trying to read (if using
     // readCommit() API)
+    FileSystem fs = FSUtils.getFs(split.getPath().toString(), jobConf);
+    HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder().setConf(fs.getConf()).setBasePath(split.getBasePath()).build();
Review comment:
Same; it would be good to avoid these extra reads. They are incurred regardless of whether virtual keys are used, which is problematic.
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -206,16 +206,28 @@ private static Configuration addProjectionField(Configuration conf, String field
   public static void addRequiredProjectionFields(Configuration configuration) {
     // Need this to do merge records in HoodieRealtimeRecordReader
-    addProjectionField(configuration, HoodieRecord.RECORD_KEY_METADATA_FIELD, HoodieInputFormatUtils.HOODIE_RECORD_KEY_COL_POS);
-    addProjectionField(configuration, HoodieRecord.COMMIT_TIME_METADATA_FIELD, HoodieInputFormatUtils.HOODIE_COMMIT_TIME_COL_POS);
-    addProjectionField(configuration, HoodieRecord.PARTITION_PATH_METADATA_FIELD, HoodieInputFormatUtils.HOODIE_PARTITION_PATH_COL_POS);
+    if (configuration.get(HoodieInputFormatUtils.HOODIE_USE_META_FIELDS) == null || configuration.get(HoodieInputFormatUtils.HOODIE_USE_META_FIELDS)
+        .equals(HoodieInputFormatUtils.DEFAULT_HOODIE_USE_META_FIELDS)) {
+      addProjectionField(configuration, HoodieRecord.RECORD_KEY_METADATA_FIELD, HoodieInputFormatUtils.HOODIE_RECORD_KEY_COL_POS);
+      addProjectionField(configuration, HoodieRecord.COMMIT_TIME_METADATA_FIELD, HoodieInputFormatUtils.HOODIE_COMMIT_TIME_COL_POS);
+      addProjectionField(configuration, HoodieRecord.PARTITION_PATH_METADATA_FIELD, HoodieInputFormatUtils.HOODIE_PARTITION_PATH_COL_POS);
+    } else {
+      addProjectionField(configuration, configuration.get(HoodieInputFormatUtils.RECORD_KEY_FIELD), Integer.parseInt(configuration.get(HoodieInputFormatUtils.RECORD_KEY_FIELD_INDEX)));
+      addProjectionField(configuration, configuration.get(HoodieInputFormatUtils.PARTITION_PATH_FIELD), Integer.parseInt(configuration.get(HoodieInputFormatUtils.PARTITION_PATH_FIELD_INDEX)));
+    }
   }
   public static boolean requiredProjectionFieldsExistInConf(Configuration configuration) {
     String readColNames = configuration.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "");
-    return readColNames.contains(HoodieRecord.RECORD_KEY_METADATA_FIELD)
-        && readColNames.contains(HoodieRecord.COMMIT_TIME_METADATA_FIELD)
-        && readColNames.contains(HoodieRecord.PARTITION_PATH_METADATA_FIELD);
+    if (configuration.get(HoodieInputFormatUtils.HOODIE_USE_META_FIELDS) == null || configuration.get(HoodieInputFormatUtils.HOODIE_USE_META_FIELDS)
Review comment:
is there a get(k, defaultValue) that we can use?
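Hadoop's `Configuration` does expose `get(String name, String defaultValue)`, which would collapse the explicit null check. A sketch of the pattern, with `java.util.Properties` standing in for `Configuration` so it runs standalone; the property key mirrors the one in the diff:

```java
import java.util.Properties;

class DefaultedGetSketch {
  // A defaulted lookup replaces the "== null ||" dance above;
  // Hadoop's Configuration offers an analogous get(name, defaultValue).
  static boolean useMetaFields(Properties conf) {
    return Boolean.parseBoolean(conf.getProperty("hoodie.use.meta.fields", "true"));
  }
}
```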
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieFileSliceReader.java
##########
@@ -36,11 +38,14 @@
   private Iterator<HoodieRecord<? extends HoodieRecordPayload>> recordsIterator;
   public static <R extends IndexedRecord, T extends HoodieRecordPayload> HoodieFileSliceReader getFileSliceReader(
-      HoodieFileReader<R> baseFileReader, HoodieMergedLogRecordScanner scanner, Schema schema, String payloadClass) throws IOException {
+      HoodieFileReader<R> baseFileReader, HoodieMergedLogRecordScanner scanner, Schema schema, String payloadClass,
+      Option<String> simpleRecordKeyFieldOpt, Option<String> simplePartitionPathFieldOpt) throws IOException {
Review comment:
If you can pick a key field name at the higher level, i.e. a custom field or `_hoodie_record_key`, then that's the best option; we don't even need the if in L48. But if you cannot, then yeah, maybe it's better to add all three.
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/cluster/SparkExecuteClusteringCommitActionExecutor.java
##########
@@ -210,8 +211,11 @@ protected String getCommitActionType() {
           .build();
       recordIterators.add(HoodieFileSliceReader.getFileSliceReader(baseFileReader, scanner, readerSchema,
-          table.getMetaClient().getTableConfig().getPayloadClass()));
+          table.getMetaClient().getTableConfig().getPayloadClass(),
Review comment:
Fetch `table.getMetaClient().getTableConfig()` into a variable and then reuse it? The code will read much nicer IMHO.
##########
File path:
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
##########
@@ -80,6 +85,27 @@ protected HoodieDefaultTimeline filterInstantsTimeline(HoodieDefaultTimeline tim
   }
   void addProjectionToJobConf(final RealtimeSplit realtimeSplit, final JobConf jobConf) {
+
+    // set appropriate configs in jobConf to handle disabling of meta fields within canAddProjectionToJobConf and addRequiredProjectionFields
+    if (!realtimeSplit.getDeltaLogPaths().isEmpty()) {
+      // inject actual field info if meta fields are disabled
+      HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder().setConf(jobConf).setBasePath(realtimeSplit.getBasePath()).build();
Review comment:
This is an extra read of the `.hoodie` directory from each record reader? That is a problem; can we avoid it?
##########
File path:
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##########
@@ -532,8 +539,8 @@ object HoodieSparkSqlWriter {
                          tableConfig: HoodieTableConfig,
                          jsc: JavaSparkContext,
                          tableInstantInfo: TableInstantInfo
-                         ): (Boolean, common.util.Option[java.lang.String], common.util.Option[java.lang.String]) = {
-    if(writeResult.getWriteStatuses.rdd.filter(ws => ws.hasErrors).isEmpty()) {
+                         ): (Boolean, common.util.Option[java.lang.String], common.util.Option[java.lang.String]) = {
Review comment:
Can you please fix your IDE to format only the lines you touch?
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordScanner.java
##########
@@ -80,6 +80,10 @@
private final HoodieTableMetaClient hoodieTableMetaClient;
// Merge strategy to use when combining records from log
private final String payloadClassFQN;
+ // simple recordKey field
+ private Option<String> simpleRecordKeyFieldOpt = Option.empty();
Review comment:
Sounds good. I see that one member has the `Opt` suffix while the other does not; I prefer we keep them as `simplePartitionPathField` and `simpleRecordKeyField`.
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -68,6 +68,7 @@
   private static final Logger LOG = LogManager.getLogger(HoodieTableConfig.class);
   public static final String HOODIE_PROPERTIES_FILE = "hoodie.properties";
+  public static final String DEFAULT_HOODIE_TABLE_KEY_GENERATOR_CLASS = "org.apache.hudi.keygen.SimpleKeyGenerator";
Review comment:
Can we get rid of this variable here?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]