yui2010 commented on a change in pull request #2427: URL: https://github.com/apache/hudi/pull/2427#discussion_r555027450
########## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileWriter.java ##########

```diff
@@ -121,17 +121,10 @@
 public void writeAvro(String recordKey, IndexedRecord object) throws IOException {
   if (hfileConfig.useBloomFilter()) {
     hfileConfig.getBloomFilter().add(recordKey);
-    if (minRecordKey != null) {
-      minRecordKey = minRecordKey.compareTo(recordKey) <= 0 ? minRecordKey : recordKey;
-    } else {
+    if (minRecordKey == null) {
```

Review comment:
Hi @vinothchandar, thanks for reviewing. It is not computing the min/max; it only uses the first recordKey and the last recordKey as the min/max (`HoodieSortedMergeHandle`/`BaseSparkCommitActionExecutor` already order the input records by recordKey). This is like how HBase stores the key range (firstKey/lastKey): https://github.com/apache/hbase/blob/rel/1.2.3/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java#L292

We could also get the min/max recordKey natively from the HFile through `HFileReaderV3#getFirstKey()` and `HFileReaderV3#getLastRowKey()` (from the load-on-open section), but I think we can keep the current implementation (putting min/max in the FileInfo map), since we may want to add more properties later. For example, a recordCount would let us choose between seekTo and loadAll in `HoodieHFileReader#filterRowKeys`.
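The point of the diff can be shown with a minimal, hypothetical sketch (not the actual Hudi class; the class and method names here are illustrative): when the input is already sorted by record key, min/max tracking needs no per-record `compareTo` — the first key seen is the min and the last key seen is the max.

```java
// Hypothetical sketch of key-range tracking over pre-sorted input.
// Assumption: write(...) is called in ascending recordKey order, as
// the upstream writers in the PR discussion guarantee.
public class KeyRangeTracker {
  private String minRecordKey; // set once, from the first record
  private String maxRecordKey; // overwritten by every record; ends as the last key

  public void write(String recordKey) {
    if (minRecordKey == null) {
      // First record of a sorted stream carries the smallest key.
      minRecordKey = recordKey;
    }
    // Last record of a sorted stream carries the largest key.
    maxRecordKey = recordKey;
  }

  public String getMinRecordKey() {
    return minRecordKey;
  }

  public String getMaxRecordKey() {
    return maxRecordKey;
  }
}
```

This mirrors the simplification in the diff: the `minRecordKey.compareTo(recordKey)` branch is unnecessary once the sorted-input invariant holds upstream.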