yihua commented on code in PR #9517:
URL: https://github.com/apache/hudi/pull/9517#discussion_r1325244242
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataWriter.java:
##########
@@ -59,7 +60,18 @@ public interface HoodieTableMetadataWriter extends Serializable, AutoCloseable {
    * @param commitMetadata commit metadata of the operation of interest.
    * @param instantTime instant time of the commit.
    */
-  void update(HoodieCommitMetadata commitMetadata, HoodieData<WriteStatus> writeStatuses, String instantTime);
+  void updateFromWriteStatuses(HoodieCommitMetadata commitMetadata, HoodieData<WriteStatus> writeStatuses, String instantTime);
+
+  /**
+   * Update the metadata table due to a COMMIT or REPLACECOMMIT operation.
+   * As compared to {@link #updateFromWriteStatuses(HoodieCommitMetadata, HoodieData, String)}, this method
+   * directly updates metadata with the given records, instead of first converting {@link WriteStatus} to {@link HoodieRecord}.
+   *
+   * @param commitMetadata commit metadata of the operation of interest.
+   * @param records records to update metadata with.
+   * @param instantTime instant time of the commit.
+   */
+  void update(HoodieCommitMetadata commitMetadata, HoodieData<HoodieRecord> records, String instantTime);
Review Comment:
   Based on the new logic, the `records` here are the inserts/updates/deletes for the data table. Aside from the async indexer, could this be used to derive bloom filters directly, i.e., feeding the `RDD<HoodieRecord>`/`HoodieData<HoodieRecord>` straight from the write DAG and generating bloom filters in memory, instead of reading the parquet footers again when updating the MDT, to reduce FS IO? Beyond bloom filters, this could generally be very useful for other optimizations as well.
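   To illustrate the idea (this is a standalone sketch, not Hudi's actual bloom filter code, which lives under `org.apache.hudi.common.bloom`): if the record keys are already in memory on the write path, a bloom filter for a file group can be built directly from them, with no second pass over parquet footers. The class and method names below (`InMemoryBloomFilterSketch`, `fromRecordKeys`) are hypothetical.

   ```java
   import java.util.BitSet;
   import java.util.List;

   // Illustrative sketch only: builds a bloom filter from record keys already
   // held in memory (e.g. the keys carried by HoodieData<HoodieRecord>),
   // instead of re-reading parquet footers when updating the metadata table.
   class InMemoryBloomFilterSketch {

     static final class SimpleBloomFilter {
       private final BitSet bits;
       private final int numBits;
       private final int numHashes;

       SimpleBloomFilter(int numBits, int numHashes) {
         this.numBits = numBits;
         this.numHashes = numHashes;
         this.bits = new BitSet(numBits);
       }

       // Double hashing: derive each probe position from two base hashes.
       private int probe(String key, int i) {
         int h1 = key.hashCode();
         int h2 = (h1 >>> 16) | (h1 << 16); // cheap secondary hash (rotate by 16)
         return Math.floorMod(h1 + i * h2, numBits);
       }

       void add(String key) {
         for (int i = 0; i < numHashes; i++) {
           bits.set(probe(key, i));
         }
       }

       boolean mightContain(String key) {
         for (int i = 0; i < numHashes; i++) {
           if (!bits.get(probe(key, i))) {
             return false; // definitely absent
           }
         }
         return true; // possibly present
       }
     }

     // Build one filter per file group from the in-memory record keys,
     // saving the FS IO of reading the written file's footer back.
     static SimpleBloomFilter fromRecordKeys(List<String> recordKeys) {
       SimpleBloomFilter filter = new SimpleBloomFilter(16_384, 5);
       for (String key : recordKeys) {
         filter.add(key);
       }
       return filter;
     }

     public static void main(String[] args) {
       SimpleBloomFilter filter = fromRecordKeys(List.of("uuid-001", "uuid-002"));
       System.out.println(filter.mightContain("uuid-001")); // true: key was added
     }
   }
   ```

   The same shape would apply to other MDT partitions (e.g. column stats): anything derivable from the records themselves can be computed in the write DAG and handed to `update(...)` directly.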
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]