codope commented on code in PR #9581:
URL: https://github.com/apache/hudi/pull/9581#discussion_r1344501029
##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieDeleteBlock.java:
##########
@@ -65,17 +69,44 @@ public class HoodieDeleteBlock extends HoodieLogBlock {
   private static final Lazy<HoodieDeleteRecord.Builder> HOODIE_DELETE_RECORD_BUILDER_STUB =
       Lazy.lazily(HoodieDeleteRecord::newBuilder);
+  private final boolean writeRecordPositions;
+  // Records to delete, sorted based on the record position if writing record position to the log block header
   private DeleteRecord[] recordsToDelete;

-  public HoodieDeleteBlock(DeleteRecord[] recordsToDelete, Map<HeaderMetadataType, String> header) {
-    this(Option.empty(), null, false, Option.empty(), header, new HashMap<>());
-    this.recordsToDelete = recordsToDelete;
+  public HoodieDeleteBlock(List<Pair<DeleteRecord, Long>> recordsToDelete,
+                           boolean writeRecordPositions,
+                           Map<HeaderMetadataType, String> header) {
+    this(Option.empty(), null, false, Option.empty(), header, new HashMap<>(), writeRecordPositions);
+    if (writeRecordPositions) {
+      recordsToDelete.sort((o1, o2) -> {
+        long v1 = o1.getRight();
+        long v2 = o2.getRight();
+        return Long.compare(v1, v2);
+      });
+      if (recordsToDelete.get(0).getRight() > -1L) {
+        addRecordPositionsToHeader(
Review Comment:
Record position can be invalid (-1) when:
1. the current location is not known (new inserts going to a log file), or
2. the base format is HFile (record position is not supported for HFile).
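The guard above relies on the preceding sort: the list is ordered ascending by position, so any invalid (-1) entry sinks to the front, and checking only the first element verifies that every position is valid. A minimal sketch of that logic, using `AbstractMap.SimpleEntry` as a hypothetical stand-in for Hudi's `Pair<DeleteRecord, Long>`:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PositionGuardSketch {

  // Stand-in for Pair<DeleteRecord, Long>: key = record key, value = base-file position.
  static boolean shouldWritePositions(List<SimpleEntry<String, Long>> records) {
    // Sort ascending by position; invalid (-1) positions end up at the front.
    records.sort(Comparator.comparingLong(SimpleEntry::getValue));
    // After the sort, all positions are valid iff the smallest one is > -1.
    return !records.isEmpty() && records.get(0).getValue() > -1L;
  }

  public static void main(String[] args) {
    List<SimpleEntry<String, Long>> valid = new ArrayList<>();
    valid.add(new SimpleEntry<>("key2", 5L));
    valid.add(new SimpleEntry<>("key1", 2L));
    System.out.println(shouldWritePositions(valid)); // prints "true"

    List<SimpleEntry<String, Long>> withUnknown = new ArrayList<>();
    withUnknown.add(new SimpleEntry<>("key3", -1L)); // e.g. a new insert going to the log file
    withUnknown.add(new SimpleEntry<>("key1", 2L));
    System.out.println(shouldWritePositions(withUnknown)); // prints "false"
  }
}
```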
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -89,10 +90,11 @@ public class HoodieAppendHandle<T, I, K, O> extends HoodieWriteHandle<T, I, K, O
   private static final int NUMBER_OF_RECORDS_TO_ESTIMATE_RECORD_SIZE = 100;
   protected final String fileId;
+  private final boolean writeRecordPositions;
Review Comment:
Renamed and moved the config a level up, into `HoodieWriteConfig`.
##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieDataBlock.java:
##########
@@ -70,14 +75,31 @@ public abstract class HoodieDataBlock extends HoodieLogBlock {
    * NOTE: This ctor is used on the write-path (ie when records ought to be written into the log)
    */
   public HoodieDataBlock(List<HoodieRecord> records,
+                         boolean writeRecordPositions,
                          Map<HeaderMetadataType, String> header,
                          Map<HeaderMetadataType, String> footer,
                          String keyFieldName) {
     super(header, footer, Option.empty(), Option.empty(), null, false);
+    if (writeRecordPositions) {
+      records.sort((o1, o2) -> {
Review Comment:
I've enabled writing record positions for some of the existing tests, covering
the following scenarios:
1. MOR table with compaction.
2. MOR table with clustering.
3. MOR table with clustering and no base file (before clustering).
4. COW/MOR table with different index types.
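A side note on the sort that now appears in both `HoodieDeleteBlock` and `HoodieDataBlock`: the hand-written `(o1, o2) -> Long.compare(...)` lambda in the diffs can also be expressed with the `Comparator.comparingLong` factory. A sketch with a hypothetical `PositionedRecord` stand-in (not a Hudi class):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RecordSortSketch {

  // Hypothetical stand-in for a record that carries its base-file position.
  static class PositionedRecord {
    final String key;
    final long position;

    PositionedRecord(String key, long position) {
      this.key = key;
      this.position = position;
    }
  }

  // Same ordering as the (o1, o2) -> Long.compare(...) lambda in the diffs above.
  static void sortByPosition(List<PositionedRecord> records) {
    records.sort(Comparator.comparingLong((PositionedRecord r) -> r.position));
  }

  public static void main(String[] args) {
    List<PositionedRecord> records = new ArrayList<>();
    records.add(new PositionedRecord("b", 3L));
    records.add(new PositionedRecord("a", 1L));
    sortByPosition(records);
    System.out.println(records.get(0).key); // prints "a"
  }
}
```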
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##########
@@ -173,18 +173,18 @@ public static <R> HoodieRecord<R> tagRecord(HoodieRecord<R> record, HoodieRecord
    *
    * @param filePath            - File to filter keys from
    * @param candidateRecordKeys - Candidate keys to filter
-   * @return List of candidate keys that are available in the file
+   * @return List of pairs of candidate keys and positions that are available in the file
    */
-  public static List<String> filterKeysFromFile(Path filePath, List<String> candidateRecordKeys,
-                                                Configuration configuration) throws HoodieIndexException {
+  public static List<Pair<String, Long>> filterKeysFromFile(Path filePath, List<String> candidateRecordKeys,
+                                                            Configuration configuration) throws HoodieIndexException {
     ValidationUtils.checkArgument(FSUtils.isBaseFile(filePath));
-    List<String> foundRecordKeys = new ArrayList<>();
+    List<Pair<String, Long>> foundRecordKeys = new ArrayList<>();
     try (HoodieFileReader fileReader = HoodieFileReaderFactory.getReaderFactory(HoodieRecordType.AVRO)
         .getFileReader(configuration, filePath)) {
       // Load all rowKeys from the file, to double-confirm
       if (!candidateRecordKeys.isEmpty()) {
         HoodieTimer timer = HoodieTimer.start();
-        Set<String> fileRowKeys = fileReader.filterRowKeys(new TreeSet<>(candidateRecordKeys));
+        Set<Pair<String, Long>> fileRowKeys = fileReader.filterRowKeys(new TreeSet<>(candidateRecordKeys));
Review Comment:
Keys should be sorted for the HFile reader. For the other formats it doesn't
matter, as long as the keys are in a set (for an efficient contains check).
Incorporated the above suggestion: we now just pass a set and initialize a
`SortedSet` in `HoodieHFileAvroReader` only, instead of building a `TreeSet`
for every other format.
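To illustrate the design choice in that last sentence: only the HFile-style reader needs the candidate keys in sorted order, so it can build the `SortedSet` internally while every other reader accepts a plain `Set`. A rough sketch with hypothetical method names (not the actual `HoodieFileReader` API):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class KeyFilterSketch {

  // Generic reader: any Set works, since only contains() checks are needed.
  static Set<String> filterRowKeysGeneric(Set<String> fileKeys, Set<String> candidateKeys) {
    Set<String> found = new HashSet<>();
    for (String key : fileKeys) {
      if (candidateKeys.contains(key)) {
        found.add(key);
      }
    }
    return found;
  }

  // HFile-style reader: it scans keys in sorted order, so it builds the
  // SortedSet itself instead of forcing every caller to construct a TreeSet.
  static Set<String> filterRowKeysHFileStyle(Set<String> fileKeys, Set<String> candidateKeys) {
    Set<String> found = new HashSet<>();
    for (String key : new TreeSet<>(candidateKeys)) { // ordered lookups, as HFile seeks require
      if (fileKeys.contains(key)) {
        found.add(key);
      }
    }
    return found;
  }

  public static void main(String[] args) {
    Set<String> fileKeys = new HashSet<>(Arrays.asList("a", "b", "c"));
    Set<String> candidates = new HashSet<>(Arrays.asList("b", "z"));
    System.out.println(filterRowKeysGeneric(fileKeys, candidates));    // prints "[b]"
    System.out.println(filterRowKeysHFileStyle(fileKeys, candidates)); // prints "[b]"
  }
}
```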
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]