yihua commented on a change in pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#discussion_r785709523
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java
##########
@@ -248,11 +253,32 @@ private HoodieInstant readCommit(GenericRecord record,
boolean loadDetails) {
break;
}
}
+ } catch (Exception originalException) {
+ // merge small archive files may left uncompleted archive file which
will cause exception.
+ // need to ignore this kind of exception here.
+ try {
+ Path planPath = new Path(metaClient.getArchivePath(),
"mergeArchivePlan");
+ HoodieWrapperFileSystem fileSystem = metaClient.getFs();
+ if (fileSystem.exists(planPath)) {
+ HoodieMergeArchiveFilePlan plan =
TimelineMetadataUtils.deserializeAvroMetadata(FileIOUtils.readDataFromPath(fileSystem,
planPath).get(), HoodieMergeArchiveFilePlan.class);
Review comment:
The logic here looks okay to me. Could you add a few unit tests to
guard it, as well as the failure recovery logic in archival merging,
since both are critical?
I'm thinking about the following two cases:
(1) Construct a corrupted `mergeArchivePlan` file with random content so
that it cannot be deserialized.
(1.1) When archival merging is enabled, the plan should be deleted first.
(1.2) When archival merging is disabled, the archived timeline can still
be read successfully.
(1.3) If there are other corrupted archived files not from merging, the
loading of the archived timeline should fail and the original exception
should be thrown.
(2) Construct a working `mergeArchivePlan` file and a corrupted merged
archive file with random content so that it cannot be deserialized.
(2.1) When archival merging is enabled, the corrupted merged archive
file should be deleted first, and archival should then proceed.
(2.2) When archival merging is disabled, the archived timeline can still
be read successfully and the corrupted archive file is skipped.
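
To make cases (1.2), (1.3), and (2.2) concrete, here is a rough standalone
sketch of the expected read-side behavior. It deliberately does not use any
Hudi APIs: the class `ArchiveRecoverySketch`, the `OK:` marker that stands in
for "can be deserialized", and the `*.archive` file naming are all
hypothetical. Only the `mergeArchivePlan` file name comes from the diff. The
real tests would go through `HoodieArchivedTimeline` and the test harness
already in the repo.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ArchiveRecoverySketch {
  static final String PLAN_NAME = "mergeArchivePlan"; // file name taken from the diff
  static final String MAGIC = "OK:";                  // hypothetical marker for a well-formed file

  /** Writes a well-formed file: the marker followed by a payload. */
  static void writeValid(Path file, String payload) throws IOException {
    Files.write(file, (MAGIC + payload).getBytes(StandardCharsets.UTF_8));
  }

  /** Writes random bytes; the first byte is forced to 0 so parsing always fails. */
  static void writeCorrupted(Path file) throws IOException {
    byte[] junk = new byte[64];
    new Random(7).nextBytes(junk);
    junk[0] = 0;
    Files.write(file, junk);
  }

  /** Returns the payload, or throws if the marker is missing (simulated deserialization). */
  static String parse(Path file) throws IOException {
    String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    if (!text.startsWith(MAGIC)) {
      throw new IOException("Cannot deserialize " + file.getFileName());
    }
    return text.substring(MAGIC.length());
  }

  /** Loads every archive payload, skipping only a broken file that a readable plan names. */
  static List<String> loadArchives(Path dir) throws IOException {
    List<String> payloads = new ArrayList<>();
    try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.archive")) {
      for (Path file : files) {
        try {
          payloads.add(parse(file));
        } catch (IOException originalException) {
          if (!isMergeLeftover(dir, file)) {
            throw originalException; // unrelated corruption must surface (case 1.3)
          }
          // leftover of an interrupted merge: skip it (case 2.2)
        }
      }
    }
    return payloads;
  }

  /** True only if a readable plan exists and names this file as its merge target. */
  static boolean isMergeLeftover(Path dir, Path file) {
    Path plan = dir.resolve(PLAN_NAME);
    if (!Files.exists(plan)) {
      return false;
    }
    try {
      return parse(plan).equals(file.getFileName().toString());
    } catch (IOException corruptedPlan) {
      return false; // a corrupted plan cannot excuse any broken archive file
    }
  }
}
```

In this sketch a corrupted plan on its own never breaks loading (1.2), a
corrupted plan cannot hide an unrelated broken archive file (1.3), and a
valid plan lets the reader skip exactly the one file it names (2.2). Cases
(1.1) and (2.1) concern the write path with merging enabled, so they are not
modeled here.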
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]