nsivabalan commented on code in PR #12310:
URL: https://github.com/apache/hudi/pull/12310#discussion_r1855701538


##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -1155,28 +1146,15 @@ public static HoodieData<HoodieRecord> convertMetadataToColumnStatsRecords(Hoodi
     }
 
     try {
-      Option<Schema> writerSchema =
-          Option.ofNullable(commitMetadata.getMetadata(HoodieCommitMetadata.SCHEMA_KEY))
-              .flatMap(writerSchemaStr ->
-                  isNullOrEmpty(writerSchemaStr)
-                      ? Option.empty()
-                      : Option.of(new Schema.Parser().parse(writerSchemaStr)));
-
-      HoodieTableConfig tableConfig = dataMetaClient.getTableConfig();
-
-      // NOTE: Writer schema added to commit metadata will not contain Hudi's metadata fields
-      Option<Schema> tableSchema = writerSchema.map(schema ->
-          tableConfig.populateMetaFields() ? addMetadataFields(schema) : schema);
-
-      List<String> columnsToIndex = getColumnsToIndex(isColumnStatsIndexEnabled, targetColumnsForColumnStatsIndex,
-          Lazy.eagerly(tableSchema));
+      List<String> columnsToIndex = getColumnsToIndex(dataMetaClient.getTableConfig(), metadataConfig,

Review Comment:
   Not all operations write a schema to the commit metadata; delete partition,
   for example, may not. Can you fall back to the table schema from the table
   schema resolver when the current commit metadata has no schema in its extra
   metadata?
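
To illustrate the fallback being asked for, here is a minimal sketch. It is not Hudi code: `SchemaFallbackSketch`, `resolveSchema`, and the `Supplier` standing in for the table schema resolver are hypothetical names, and schemas are kept as plain strings so the example is self-contained. The `"schema"` key is assumed to mirror `HoodieCommitMetadata.SCHEMA_KEY`.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical illustration only; none of these names exist in Hudi.
class SchemaFallbackSketch {

  // Assumed to mirror HoodieCommitMetadata.SCHEMA_KEY.
  static final String SCHEMA_KEY = "schema";

  /**
   * Returns the schema from the commit's extra metadata when present and
   * non-empty; otherwise asks the (stand-in) table schema resolver.
   * The resolver is only invoked when the commit carries no schema.
   */
  static Optional<String> resolveSchema(Map<String, String> extraMetadata,
                                        Supplier<Optional<String>> tableSchemaResolver) {
    String fromCommit = extraMetadata.get(SCHEMA_KEY);
    if (fromCommit != null && !fromCommit.isEmpty()) {
      return Optional.of(fromCommit);
    }
    // e.g. a delete-partition commit with no schema in its extra metadata
    return tableSchemaResolver.get();
  }
}
```

Passing the resolver as a `Supplier` keeps the table-schema lookup lazy, so commits that already carry a schema do not pay for it.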



##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -1185,25 +1163,51 @@ public static HoodieData<HoodieRecord> convertMetadataToColumnStatsRecords(Hoodi
     }
   }
 
-  /**
-   * Get the list of columns for the table for column stats indexing
-   */
-  private static List<String> getColumnsToIndex(boolean isColumnStatsIndexEnabled,
-                                                List<String> targetColumnsForColumnStatsIndex,
-                                                Lazy<Option<Schema>> lazyWriterSchemaOpt) {
-    checkState(isColumnStatsIndexEnabled);
+  public static final String[] META_COLS_TO_ALWAYS_INDEX = {COMMIT_TIME_METADATA_FIELD, RECORD_KEY_METADATA_FIELD, PARTITION_PATH_METADATA_FIELD};
+  public static final Set<String> META_COL_SET_TO_INDEX = new HashSet<>(Arrays.asList(META_COLS_TO_ALWAYS_INDEX));
 
-    if (!targetColumnsForColumnStatsIndex.isEmpty()) {
-      return targetColumnsForColumnStatsIndex;
+  public static List<String> getColumnsToIndex(HoodieTableConfig tableConfig,

Review Comment:
   Do we have a unit test for this method?


