yihua commented on a change in pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#discussion_r785684621
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/util/FileIOUtils.java
##########
@@ -160,4 +160,33 @@ public static void closeQuietly(Closeable closeable) {
LOG.warn("IOException during close", e);
}
}
+
+ public static void createFileInPath(FileSystem fileSystem, org.apache.hadoop.fs.Path fullPath, Option<byte[]> content) {
+ try {
+ // If the path does not exist, create it first
+ if (!fileSystem.exists(fullPath)) {
+ if (fileSystem.createNewFile(fullPath)) {
+ LOG.info("Created a new file in meta path: " + fullPath);
+ } else {
+ throw new HoodieIOException("Failed to create file " + fullPath);
+ }
+ }
+
+ if (content.isPresent()) {
+ FSDataOutputStream fsout = fileSystem.create(fullPath, true);
+ fsout.write(content.get());
+ fsout.close();
+ }
+ } catch (IOException e) {
+ throw new HoodieIOException("Failed to create file " + fullPath, e);
+ }
+ }
+
+ public static Option<byte[]> readDataFromPath(FileSystem fileSystem, org.apache.hadoop.fs.Path detailPath) {
Review comment:
Similar here.
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java
##########
@@ -82,6 +90,7 @@
private final int minInstantsToKeep;
private final HoodieTable<T, I, K, O> table;
private final HoodieTableMetaClient metaClient;
+ private final String mergeArchivePlanName = "mergeArchivePlan";
Review comment:
Should this be a `public static final` variable?
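
A minimal sketch of what I have in mind. The holder class, the constant name `MERGE_ARCHIVE_PLAN_NAME`, and the `planPath` helper are hypothetical and only illustrate the shape; in the PR the constant would live in `HoodieTimelineArchiveLog` and the path would be built with Hadoop's `Path`.

```java
// Hypothetical holder class used only for illustration; not part of the PR.
final class MergeArchivePlanNames {
  // A single shared constant instead of a per-instance field, so other
  // classes can refer to the same plan file name without repeating the literal.
  public static final String MERGE_ARCHIVE_PLAN_NAME = "mergeArchivePlan";

  private MergeArchivePlanNames() {
    // No instances: this class only holds the constant and a helper.
  }

  // Hypothetical helper: joins an archive directory with the plan file name
  // using plain strings, standing in for `new Path(archivePath, name)`.
  static String planPath(String archivePath) {
    return archivePath.endsWith("/")
        ? archivePath + MERGE_ARCHIVE_PLAN_NAME
        : archivePath + "/" + MERGE_ARCHIVE_PLAN_NAME;
  }
}
```

With a shared constant, any reader of the archive folder can reference the same name rather than hard-coding the string.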
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java
##########
@@ -248,11 +253,32 @@ private HoodieInstant readCommit(GenericRecord record, boolean loadDetails) {
break;
}
}
+ } catch (Exception originalException) {
+ // merge small archive files may left uncompleted archive file which will cause exception.
+ // need to ignore this kind of exception here.
+ try {
+ Path planPath = new Path(metaClient.getArchivePath(), "mergeArchivePlan");
+ HoodieWrapperFileSystem fileSystem = metaClient.getFs();
+ if (fileSystem.exists(planPath)) {
+ HoodieMergeArchiveFilePlan plan = TimelineMetadataUtils.deserializeAvroMetadata(FileIOUtils.readDataFromPath(fileSystem, planPath).get(), HoodieMergeArchiveFilePlan.class);
+ String mergedArchiveFileName = plan.getMergedArchiveFileName();
+ if (!StringUtils.isNullOrEmpty(mergedArchiveFileName) && fs.getPath().getName().equalsIgnoreCase(mergedArchiveFileName)) {
+ LOG.warn("Catch exception because of reading uncompleted merging archive file " + mergedArchiveFileName + ". Ignore it here.");
+ continue;
+ }
+ }
+ throw originalException;
+ } catch (Exception e) {
+ // If anything wrong during parsing merge archive plan, we need to throw the original exception.
+ // For example corrupted archive file and corrupted plan are both existed.
+ throw originalException;
+ }
Review comment:
I agree that we should have a feature flag that turns all of the new logic off.
Even with the flag off, loading the archive timeline should still skip corrupted
merged archive files, in case an archive merge was left incomplete and the
feature is turned off in the next run.
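
Roughly, the split I am suggesting could look like the sketch below. All names here (`ArchiveLoadSketch`, `ArchiveReader`, `loadInstants`, `maybeMerge`, the `mergeEnabled` flag) are hypothetical stand-ins rather than Hudi APIs, and plain strings stand in for archive files and instants. The point is only that the write side is gated by the flag while the read side always tolerates a leftover incomplete merged file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; none of these names exist in Hudi.
final class ArchiveLoadSketch {

  // Stand-in for reading one archive file; may throw on a corrupted file.
  interface ArchiveReader {
    List<String> read(String fileName) throws Exception;
  }

  // Read side: NOT gated by the feature flag. If a merge plan names the file
  // that failed to load, treat it as the leftover of an incomplete merge and
  // skip it; any other failure is rethrown unchanged.
  static List<String> loadInstants(List<String> archiveFiles,
                                   Optional<String> plannedMergedFileName,
                                   ArchiveReader reader) throws Exception {
    List<String> instants = new ArrayList<>();
    for (String file : archiveFiles) {
      try {
        instants.addAll(reader.read(file));
      } catch (Exception original) {
        boolean incompleteMergeOutput = plannedMergedFileName
            .map(name -> name.equalsIgnoreCase(file))
            .orElse(false);
        if (!incompleteMergeOutput) {
          throw original;
        }
        // Otherwise skip the corrupted merged file and keep loading the rest.
      }
    }
    return instants;
  }

  // Write side: the merge itself only runs when the flag is on.
  static void maybeMerge(boolean mergeEnabled, Runnable mergeAction) {
    if (mergeEnabled) {
      mergeAction.run();
    }
  }
}
```

This way a table that crashed mid-merge can still load its archive timeline after the feature has been switched off.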
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java
##########
@@ -248,11 +253,32 @@ private HoodieInstant readCommit(GenericRecord record, boolean loadDetails) {
break;
}
}
+ } catch (Exception originalException) {
+ // merge small archive files may left uncompleted archive file which will cause exception.
+ // need to ignore this kind of exception here.
+ try {
+ Path planPath = new Path(metaClient.getArchivePath(), "mergeArchivePlan");
Review comment:
Reuse `HoodieTimelineArchiveLog::mergeArchivePlanName`?