alexeykudinkin commented on code in PR #6830:
URL: https://github.com/apache/hudi/pull/6830#discussion_r999867005
##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java:
##########
@@ -160,12 +163,17 @@ private HoodieLogBlock readBlock() throws IOException {
return createCorruptBlock(blockStartPos);
}
- // We may have had a crash which could have written this block partially
- // Skip blockSize in the stream and we should either find a sync marker
(start of the next
- // block) or EOF. If we did not find either of it, then this block is a
corrupted block.
- boolean isCorrupted = isBlockCorrupted(blockSize);
- if (isCorrupted) {
- return createCorruptBlock(blockStartPos);
+ // Block is possibly corrupted only when write is not transactional
+ // the corrupted check could take 100's msec for larger file sizes,
skipping the check for fs with transactional write
+ // see https://issues.apache.org/jira/browse/HUDI-2118
+ if (!StorageSchemes.isWriteTransactional(fs.getScheme())) {
Review Comment:
I'd suggest we fold this check into the `isBlockCorrupted` check itself:
- At the end of the day, this part of the same check
- We limit the scope of the change to just `isBlockCorrupted` method
##########
hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java:
##########
@@ -769,6 +772,36 @@ public void testAppendAndReadOnCorruptedLog() throws
IOException, URISyntaxExcep
reader.close();
}
+ @Test
+ public void testSkipCorruptedCheck() throws Exception {
+ // normal case: if the block is corrupted, we should be able to read back
a corrupted block
+ Reader reader1 = createCorruptedFile("test-fileid1");
+ HoodieLogBlock block = reader1.next();
+ assertEquals(HoodieLogBlockType.CORRUPT_BLOCK, block.getBlockType(), "The
read block should be a corrupt block");
+ reader1.close();
+
+ // mock case: we want to verify that isBlockCorrupted() check is skipped
for transactional write fs
+ // so here created a corrupted block for a transactional write fs(which
would never happen) to verify the skip logic
+ // a better way to verify it is to mock the private method
isBlockCorrupted(), but it is not available in current Mockito framework
+ // and the attempt to onboard powermock with private method mocking
capability has failed
Review Comment:
We can omit the part regarding powermock, and just elaborate that we're
doing this b/c we can't mock private method
##########
hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java:
##########
@@ -97,4 +104,12 @@ public static boolean isAppendSupported(String scheme) {
}
return Arrays.stream(StorageSchemes.values()).anyMatch(s ->
s.supportsAppend() && s.scheme.equals(scheme));
}
+
+ public static boolean isWriteTransactional(String scheme) {
+ if (!isSchemeSupported(scheme)) {
+ throw new IllegalArgumentException("Unsupported scheme :" + scheme);
+ }
+
+ return Arrays.stream(StorageSchemes.values()).anyMatch(s ->
s.isWriteTransactional() && s.scheme.equals(scheme));
Review Comment:
If we know that scheme is supported we can just do `Scheme.valueOf`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]