nsivabalan commented on code in PR #5341:
URL: https://github.com/apache/hudi/pull/5341#discussion_r869734606


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/BaseRollbackHelper.java:
##########
@@ -210,7 +210,7 @@ protected Map<HoodieLogBlock.HeaderMetadataType, String> 
generateHeader(String c
     header.put(HoodieLogBlock.HeaderMetadataType.INSTANT_TIME, 
metaClient.getActiveTimeline().lastInstant().get().getTimestamp());
     header.put(HoodieLogBlock.HeaderMetadataType.TARGET_INSTANT_TIME, commit);
     header.put(HoodieLogBlock.HeaderMetadataType.COMMAND_BLOCK_TYPE,
-        
String.valueOf(HoodieCommandBlock.HoodieCommandBlockTypeEnum.ROLLBACK_PREVIOUS_BLOCK.ordinal()));
+        
String.valueOf(HoodieCommandBlock.HoodieCommandBlockTypeEnum.ROLLBACK_BLOCK.ordinal()));

Review Comment:
   is this backwards compatible? for instance, can this read a log block which 
has ROLLBACK_PREVIOUS_BLOCK when written to disk ? 



##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java:
##########
@@ -218,7 +221,45 @@ protected synchronized void scanInternal(Option<KeySpec> 
keySpecOpt) {
           logFilePaths.stream().map(logFile -> new HoodieLogFile(new 
Path(logFile))).collect(Collectors.toList()),
           readerSchema, readBlocksLazily, reverseReader, bufferSize, 
enableRecordLookups, keyField, internalSchema);
 
+      /**
+       * Traversal of log blocks from log files can be done in two directions.
+       * 1. Forward traversal
+       * 2. Reverse traversal
+       * For example:   BaseFile, LogFile1(LogBlock11,LogBlock12,LogBlock13), 
LofFile2(LogBlock21,LogBlock22,LogBlock23)
+       *    Forward traversal look like,
+       *        LogBlock11, LogBlock12, LogBlock13, LogBlock21, LogBlock22, 
LogBlock23
+       *    If we are considering reverse traversal including log blocks,
+       *        LogBlock23, LogBlock22, LogBlock21, LogBlock13, LogBlock12, 
LogBlock11
+       * Here, reverse traversal also traverses blocks in reverse order of 
creation.
+       *
+       * 1. Forward traversal
+       *  Forward traversal is easy to do in single writer mode. Where the 
rollback block is right after the effected data blocks.
+       *  With multiwriter mode the blocks can be out of sync. An example 
scenario.
+       *  B1, B2, B3, B4, R1(B3), B5
+       *  In this case, rollback block R1 is invalidating the B3 which is not 
the previous block.
+       *  This becomes more complicated if we have compacted blocks, which are 
data blocks created using log compaction.
+       *  TODO: Include support for log compacted blocks. 
https://issues.apache.org/jira/browse/HUDI-3580
+       *
+       *  To solve this do traversal twice.

Review Comment:
   I assume we will employ two traversals only when in need. i.e. when minor 
compactions are enabled. If not, can we avoid it and fallback to old behavior ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to