n3nash commented on a change in pull request #1320: [HUDI-571] Add min/max
headers on archived files
URL: https://github.com/apache/incubator-hudi/pull/1320#discussion_r378427516
##########
File path:
hudi-client/src/main/java/org/apache/hudi/io/HoodieCommitArchiveLog.java
##########
@@ -268,6 +270,19 @@ public Path getArchiveFilePath() {
return archiveFilePath;
}
+  private void writeHeaderBlock(Schema wrapperSchema, List<HoodieInstant> instants) throws Exception {
+    if (!instants.isEmpty()) {
+      Collections.sort(instants, HoodieInstant.COMPARATOR);
+      HoodieInstant minInstant = instants.get(0);
+      HoodieInstant maxInstant = instants.get(instants.size() - 1);
+      Map<HeaderMetadataType, String> metadataMap = Maps.newHashMap();
+      metadataMap.put(HeaderMetadataType.SCHEMA, wrapperSchema.toString());
+      metadataMap.put(HeaderMetadataType.MIN_INSTANT_TIME, minInstant.getTimestamp());
+      metadataMap.put(HeaderMetadataType.MAX_INSTANT_TIME, maxInstant.getTimestamp());
+      this.writer.appendBlock(new HoodieAvroDataBlock(Collections.emptyList(), metadataMap));
+    }
+  }
+
  private void writeToFile(Schema wrapperSchema, List<IndexedRecord> records) throws Exception {
Review comment:
Move the header writing into this method. In other words, augment the same DataBlock that holds the archived records with the metadata you want to push here: we already write the schema into the block headers, so just add more entries (like the ones above) alongside it. Then, on read, you can inspect each block's headers and filter out blocks that should not be considered. This is more generic than appending an extra, empty log block to track min/max over the entire file, which is hard to keep accurate since the file keeps growing anyway.
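To make the suggestion concrete, here is a minimal, self-contained sketch of the idea. It is not the Hudi implementation: plain `String` timestamps stand in for `HoodieInstant`, string keys stand in for `HeaderMetadataType`, and both method names (`buildHeaders`, `shouldConsiderBlock`) are hypothetical. The point is that min/max ride in the same header map as the schema on the data block itself, and a reader can prune blocks from the headers alone.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hedged sketch: strings stand in for HoodieInstant / HeaderMetadataType.
class ArchiveHeaderSketch {

  // Build the header map for the data block that carries the archived
  // records themselves: the schema entry plus min/max instant times,
  // rather than a separate, empty header-only block.
  static Map<String, String> buildHeaders(List<String> instantTimes, String wrapperSchema) {
    Map<String, String> headers = new HashMap<>();
    headers.put("SCHEMA", wrapperSchema);
    if (!instantTimes.isEmpty()) {
      // Hudi instant times are fixed-width timestamps, so lexicographic
      // min/max of the strings matches min/max of the instants.
      headers.put("MIN_INSTANT_TIME", Collections.min(instantTimes));
      headers.put("MAX_INSTANT_TIME", Collections.max(instantTimes));
    }
    return headers;
  }

  // On read, decide from a block's headers alone whether it can contain
  // the instant of interest, skipping blocks outside the [min, max] range.
  static boolean shouldConsiderBlock(Map<String, String> headers, String instantTime) {
    String min = headers.get("MIN_INSTANT_TIME");
    String max = headers.get("MAX_INSTANT_TIME");
    if (min == null || max == null) {
      return true; // no range info in the headers, cannot rule the block out
    }
    return min.compareTo(instantTime) <= 0 && instantTime.compareTo(max) <= 0;
  }
}
```

Because each data block carries its own range, the scheme keeps working as the archive file grows: new blocks simply arrive with their own min/max headers, and no file-level summary has to be rewritten.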