nsivabalan commented on code in PR #9581:
URL: https://github.com/apache/hudi/pull/9581#discussion_r1313612529
##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieDeleteBlock.java:
##########
@@ -65,17 +69,44 @@ public class HoodieDeleteBlock extends HoodieLogBlock {
   private static final Lazy<HoodieDeleteRecord.Builder> HOODIE_DELETE_RECORD_BUILDER_STUB =
       Lazy.lazily(HoodieDeleteRecord::newBuilder);
+  private final boolean writeRecordPositions;
+  // Records to delete, sorted based on the record position if writing record positions to the log block header
   private DeleteRecord[] recordsToDelete;
-  public HoodieDeleteBlock(DeleteRecord[] recordsToDelete, Map<HeaderMetadataType, String> header) {
-    this(Option.empty(), null, false, Option.empty(), header, new HashMap<>());
-    this.recordsToDelete = recordsToDelete;
+  public HoodieDeleteBlock(List<Pair<DeleteRecord, Long>> recordsToDelete,
+                           boolean writeRecordPositions,
+                           Map<HeaderMetadataType, String> header) {
+    this(Option.empty(), null, false, Option.empty(), header, new HashMap<>(), writeRecordPositions);
+    if (writeRecordPositions) {
+      recordsToDelete.sort((o1, o2) -> {
+        long v1 = o1.getRight();
+        long v2 = o2.getRight();
+        return Long.compare(v1, v2);
+      });
+      if (recordsToDelete.get(0).getRight() > -1L) {
+        addRecordPositionsToHeader(
Review Comment:
May I know under what circumstances we will get -1 values here?
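For illustration, a standalone sketch of how a `-1` sentinel would interact with this logic (assuming, per the diff, that a position of `-1` marks a record whose base-file position is unknown; `PositionedDelete` is a hypothetical stand-in for `Pair<DeleteRecord, Long>`, not a Hudi class). Because the list is sorted ascending by position, any `-1` sorts to the front, so checking only the first element is enough to detect a missing position:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for Pair<DeleteRecord, Long>: a record key plus its
// base-file position, where -1 means "position unknown".
class PositionedDelete {
  final String recordKey;
  final long position;

  PositionedDelete(String recordKey, long position) {
    this.recordKey = recordKey;
    this.position = position;
  }
}

public class DeletePositionSketch {
  // Mirrors the ctor logic in the diff: sort by position ascending, then
  // treat the positions as writable only if the smallest one is valid (> -1).
  static boolean shouldWritePositions(List<PositionedDelete> deletes) {
    deletes.sort(Comparator.comparingLong(d -> d.position));
    return !deletes.isEmpty() && deletes.get(0).position > -1L;
  }

  public static void main(String[] args) {
    List<PositionedDelete> allKnown = new ArrayList<>(List.of(
        new PositionedDelete("k2", 7L), new PositionedDelete("k1", 3L)));
    List<PositionedDelete> oneUnknown = new ArrayList<>(List.of(
        new PositionedDelete("k2", 7L), new PositionedDelete("k3", -1L)));
    System.out.println(shouldWritePositions(allKnown));   // true: every position valid
    System.out.println(shouldWritePositions(oneUnknown)); // false: the -1 sorts first
  }
}
```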
##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieDataBlock.java:
##########
@@ -70,14 +75,31 @@ public abstract class HoodieDataBlock extends HoodieLogBlock {
    * NOTE: This ctor is used on the write-path (ie when records ought to be written into the log)
    */
   public HoodieDataBlock(List<HoodieRecord> records,
+                         boolean writeRecordPositions,
                          Map<HeaderMetadataType, String> header,
                          Map<HeaderMetadataType, String> footer,
                          String keyFieldName) {
     super(header, footer, Option.empty(), Option.empty(), null, false);
+    if (writeRecordPositions) {
+      records.sort((o1, o2) -> {
Review Comment:
Where do we guard against the condition below? In the case of async compaction, the latest parquet file may not exist at all, so a new log file being added to the latest file slice should not encode the record positions.
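For what it's worth, a minimal sketch of the kind of guard this comment is asking about (method and parameter names here are hypothetical, not Hudi's actual API): the writer would only request position encoding when the latest file slice actually has a base file to resolve positions against.

```java
// Hypothetical guard: with async compaction pending, the base (parquet) file
// of the latest file slice may not exist yet, so there is nothing to compute
// record positions against and encoding must be skipped.
public class RecordPositionGuard {
  static boolean shouldWriteRecordPositions(boolean positionTrackingEnabled,
                                            boolean baseFileExists) {
    return positionTrackingEnabled && baseFileExists;
  }

  public static void main(String[] args) {
    System.out.println(shouldWriteRecordPositions(true, false)); // false: no base file yet
    System.out.println(shouldWriteRecordPositions(true, true));  // true: base file present
  }
}
```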
##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieLogBlock.java:
##########
@@ -137,6 +140,20 @@ public Roaring64NavigableMap getRecordPositions() throws IOException {
     return LogReaderUtils.decodeRecordPositionsHeader(logBlockHeader.get(HeaderMetadataType.RECORD_POSITIONS));
   }
+  protected void addRecordPositionsToHeader(Set<Long> positionSet, int numRecords) {
+    if (positionSet.size() == numRecords) {
Review Comment:
I see, it's here.
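The guard in the diff relies on `Set` semantics: a duplicate or missing position shrinks `positionSet` below `numRecords`, so the header is written only when every record has a distinct position. A minimal standalone illustration (not Hudi code):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PositionHeaderGuard {
  // Mirrors addRecordPositionsToHeader's check: emit the header only when the
  // number of distinct positions equals the number of records.
  static boolean positionsAreComplete(List<Long> positions, int numRecords) {
    Set<Long> positionSet = new HashSet<>(positions);
    return positionSet.size() == numRecords;
  }

  public static void main(String[] args) {
    System.out.println(positionsAreComplete(List.of(0L, 1L, 2L), 3)); // true
    System.out.println(positionsAreComplete(List.of(0L, 1L, 1L), 3)); // false: duplicate collapses in the set
  }
}
```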
##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieDataBlock.java:
##########
@@ -70,14 +75,31 @@ public abstract class HoodieDataBlock extends HoodieLogBlock {
    * NOTE: This ctor is used on the write-path (ie when records ought to be written into the log)
    */
   public HoodieDataBlock(List<HoodieRecord> records,
+                         boolean writeRecordPositions,
                          Map<HeaderMetadataType, String> header,
                          Map<HeaderMetadataType, String> footer,
                          String keyFieldName) {
     super(header, footer, Option.empty(), Option.empty(), null, false);
+    if (writeRecordPositions) {
+      records.sort((o1, o2) -> {
Review Comment:
Where is the logic that avoids writing record positions when there are duplicate records?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]