nsivabalan commented on a change in pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#discussion_r675342299
##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordScanner.java
##########
@@ -80,6 +80,10 @@ private final HoodieTableMetaClient hoodieTableMetaClient;
   // Merge strategy to use when combining records from log
   private final String payloadClassFQN;
+  // simple recordKey field
+  private Option<String> simpleRecordKeyFieldOpt = Option.empty();

Review comment:
   On the query side, I am naming all variables with the assumption that if populateMetaFields is set to false, then the key generator will be SimpleKeyGenerator, since we would have failed the operation during the write itself. On the write path, by contrast, we serialize all key generator properties, and hence the recordKey and partition path properties there are not prefixed with "simple".

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieFileSliceReader.java
##########
@@ -36,11 +38,14 @@ private Iterator<HoodieRecord<? extends HoodieRecordPayload>> recordsIterator;
   public static <R extends IndexedRecord, T extends HoodieRecordPayload> HoodieFileSliceReader getFileSliceReader(
-      HoodieFileReader<R> baseFileReader, HoodieMergedLogRecordScanner scanner, Schema schema, String payloadClass) throws IOException {
+      HoodieFileReader<R> baseFileReader, HoodieMergedLogRecordScanner scanner, Schema schema, String payloadClass,
+      Option<String> simpleRecordKeyFieldOpt, Option<String> simplePartitionPathFieldOpt) throws IOException {

Review comment:
   Initially I had three args: populateMetaFields, simpleRecordKey, and simplePartitionPath. Based on feedback on the COW patch, I changed it to the current form. But just for this class, I feel we could go back to three args and then, at line 46, call either method depending on populateMetaFields. Mainly because the caller of this method wraps the values in Option before passing them in, and then at line 48 we unwrap the Option again and pass the raw value. So why not keep it simple?
   I am fine either way; let me know your thoughts.

##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##########
@@ -128,8 +129,11 @@ object HoodieSparkSqlWriter {
       .setPayloadClassName(hoodieConfig.getString(PAYLOAD_CLASS_OPT_KEY))
       .setPreCombineField(hoodieConfig.getStringOrDefault(PRECOMBINE_FIELD_OPT_KEY, null))
       .setPartitionColumns(partitionColumns)
-      .setPopulateMetaFields(parameters.getOrElse(HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.key(), HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.defaultValue()).toBoolean)
+      .setPopulateMetaFields(populateMetaFields)
+      .setRecordKeyFields(hoodieConfig.getString(RECORDKEY_FIELD_OPT_KEY))
+      .setPartitionColumns(hoodieConfig.getString(PARTITIONPATH_FIELD_OPT_KEY))

Review comment:
   So we are adding the record key, partition path, and key generator class properties for all tables at the Spark datasource layer.

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
##########
@@ -324,6 +324,13 @@ public void validateTableProperties(Properties properties, WriteOperationType op
         && Boolean.parseBoolean((String) properties.getOrDefault(HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.key(),
             HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.defaultValue()))) {
       throw new HoodieException(HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.key()
           + " already disabled for the table. Can't be re-enabled back");
     }
+
+    // meta fields can be disabled only with SimpleKeyGenerator
+    if (!getTableConfig().populateMetaFields()
+        && !properties.getProperty(HoodieTableConfig.HOODIE_TABLE_KEY_GENERATOR_CLASS.key()).equals("org.apache.hudi.keygen.SimpleKeyGenerator")) {

Review comment:
   The SimpleKeyGenerator class is not reachable from here, so I have hard-coded the fully qualified name. Let me know if there is another way to avoid hard-coding the class.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
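Editor's note: the last review comment above asks whether there is a way to avoid hard-coding the SimpleKeyGenerator FQN in validateTableProperties, since the class itself is not on hudi-common's classpath. One common approach is to keep the FQN in a single shared string constant. The sketch below is a hypothetical illustration, not code from the PR; the class name KeyGenConstants and the method metaFieldsConfigValid are assumed names, and only the FQN string itself is taken from the diff.

```java
// Hypothetical sketch: centralize the SimpleKeyGenerator FQN in hudi-common so
// the validation check has one source of truth instead of an inline literal.
public class KeyGenConstants {

    // FQN copied from the PR's validation check; keeping it in one constant
    // means the check stays in sync if the class is ever moved or renamed.
    public static final String SIMPLE_KEY_GENERATOR_FQN =
        "org.apache.hudi.keygen.SimpleKeyGenerator";

    // Mirrors the condition added in validateTableProperties: disabling meta
    // fields is only valid when the configured key generator is the simple one.
    public static boolean metaFieldsConfigValid(boolean populateMetaFields, String keyGenClassName) {
        return populateMetaFields || SIMPLE_KEY_GENERATOR_FQN.equals(keyGenClassName);
    }
}
```

With something like this, validateTableProperties could call metaFieldsConfigValid(...) and throw HoodieException when it returns false, keeping the string literal out of HoodieTableMetaClient while still not requiring a compile-time dependency on the key generator class.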