ssdong edited a comment on issue #2707:
URL: https://github.com/apache/hudi/issues/2707#issuecomment-811619487
@jsbali To give some extra insight and detail, here is the timeline @zherenyu831 posted at the beginning:
```
[20210323080718__replacecommit__COMPLETED]: size : 0
[20210323081449__replacecommit__COMPLETED]: size : 1
[20210323082046__replacecommit__COMPLETED]: size : 1
[20210323082758__replacecommit__COMPLETED]: size : 1
[20210323084004__replacecommit__COMPLETED]: size : 1
[20210323085044__replacecommit__COMPLETED]: size : 1
[20210323085823__replacecommit__COMPLETED]: size : 1
[20210323090550__replacecommit__COMPLETED]: size : 1
[20210323091700__replacecommit__COMPLETED]: size : 1
```
If we keep everything the same and let the archive logic handle everything, it
fails on the empty (size 0) `partitionToReplaceFileIds` of
`20210323080718__replacecommit__COMPLETED` (the first item in the list above),
and this is a known issue.
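To see concretely what the archival path trips on, here is a minimal sketch (the local path and the `main` wrapper are my own assumptions, not from this issue) that deserializes a replacecommit file and checks its `partitionToReplaceFileIds`:
```
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.hudi.common.model.HoodieReplaceCommitMetadata;

public class InspectReplaceCommit {
  public static void main(String[] args) throws Exception {
    // Hypothetical local copy of the instant file; the real one lives at
    // s3://xxx/data/.hoodie/20210323080718.replacecommit
    byte[] bytes = Files.readAllBytes(Paths.get("/tmp/20210323080718.replacecommit"));
    HoodieReplaceCommitMetadata metadata =
        HoodieReplaceCommitMetadata.fromBytes(bytes, HoodieReplaceCommitMetadata.class);
    // The archival fails against the instant where this map is empty
    // (the "size : 0" entry in the listing above).
    System.out.println("partitionToReplaceFileIds size: "
        + metadata.getPartitionToReplaceFileIds().size());
  }
}
```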
To make the archive work, we tried _manually_ deleting that first _empty_
commit file, `20210323080718__replacecommit__COMPLETED`. This let the archive
succeed, but the job then failed with `User class threw exception:
org.apache.hudi.exception.HoodieIOException: Could not read commit details from
s3://xxx/data/.hoodie/20210323081449.replacecommit` (the second item in the
list above).
Now, to reason through the underlying mechanism of this error: since the
archive succeeded, a few commit files must have been placed in the `.archive`
folder. Let's say
```
[20210323081449__replacecommit__COMPLETED]: size : 1
[20210323082046__replacecommit__COMPLETED]: size : 1
[20210323082758__replacecommit__COMPLETED]: size : 1
[20210323084004__replacecommit__COMPLETED]: size : 1
[20210323085044__replacecommit__COMPLETED]: size : 1
```
were successfully moved into `.archive`. At this moment, the timeline on disk
has been updated and three commit files remain:
```
[20210323085823__replacecommit__COMPLETED]: size : 1
[20210323090550__replacecommit__COMPLETED]: size : 1
[20210323091700__replacecommit__COMPLETED]: size : 1
```
Now, pay attention to the stack trace that produced `User class threw
exception: org.apache.hudi.exception.HoodieIOException: Could not read commit
details from s3://xxx/data/.hoodie/20210323081449.replacecommit`; I am pasting
it again:
```
User class threw exception: org.apache.hudi.exception.HoodieIOException: Could not read commit details from s3://xxx/data/.hoodie/20210323081449.replacecommit
    at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:530)
    at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:194)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$resetFileGroupsReplaced$8(AbstractTableFileSystemView.java:217)
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.resetFileGroupsReplaced(AbstractTableFileSystemView.java:228)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
    at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.reset(AbstractTableFileSystemView.java:248)
    at org.apache.hudi.common.table.view.HoodieTableFileSystemView.close(HoodieTableFileSystemView.java:353)
    at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)
    at org.apache.hudi.common.table.view.FileSystemViewManager.close(FileSystemViewManager.java:118)
    at org.apache.hudi.timeline.service.TimelineService.close(TimelineService.java:179)
    at org.apache.hudi.client.embedded.EmbeddedTimelineService.stop(EmbeddedTimelineService.java:112)
```
After a `close` is triggered on `TimelineService`, which is understandable, it
propagates to `HoodieTableFileSystemView.close`, and these frames:
```
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
    at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.reset(AbstractTableFileSystemView.java:248)
```
happen right after it. Now, I am not exactly sure why we need an `init` after
`close` is called on the `HoodieTableFileSystemView` (probably someone with
deeper knowledge can answer that). If you look at the source code, `reset` and
`init` are meant to _initialize with a new Hoodie timeline_:
```
@Override
public final void reset() {
  try {
    writeLock.lock();
    addedPartitions.clear();
    resetViewState();
    bootstrapIndex = null;
    // Initialize with new Hoodie timeline.
    init(metaClient, getTimeline());
  } finally {
    writeLock.unlock();
  }
}
```
The `getTimeline()` above does _not_ actually fetch a _new_ timeline, since
the TimelineService has been closed; and `public void sync()`, which would
replace the old timeline with a fresh one, is obviously not triggered either.
The Hudi table view's in-memory timeline therefore remains the very _old_
timeline, i.e. the one from _before_ the archive. When it then tries to read
those commits from the in-memory timeline and perform the corresponding
actions, it is bound to fail, since we have archived the commit files and they
now live in the `.archive` folder.
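To make the failing read concrete, here is a rough sketch of what the `resetFileGroupsReplaced` frame in the trace effectively does with that stale timeline (the wrapper class is hypothetical; the timeline methods are from the Hudi API as far as I know it):
```
import org.apache.hudi.common.table.timeline.HoodieTimeline;

public class StaleTimelineSketch {
  // `staleTimeline` stands in for the view's cached, pre-archive timeline.
  static void readReplacedFileGroups(HoodieTimeline staleTimeline) {
    staleTimeline.getCompletedReplaceTimeline().getInstants().forEach(instant -> {
      // getInstantDetails reads .hoodie/<ts>.replacecommit from storage.
      // 20210323081449.replacecommit was already moved into .archive, hence
      // "Could not read commit details from s3://xxx/data/.hoodie/...".
      staleTimeline.getInstantDetails(instant);
    });
  }
}
```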
It does sound like a paradox, given that the exception only shows up _after_
we manually delete the commit file to rescue the archive logic. But shouldn't
this problem have existed from the beginning? Even after a successful archiving
action, the in-memory timeline stays old, given how we close and re-initialize
a Hudi table view.
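If this diagnosis is right, one untested idea (not a confirmed fix, just the direction the `sync()` observation points to) would be to force a fresh timeline before the view is reset:
```
import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.common.table.view.HoodieTableFileSystemView;

public class ResyncSketch {
  // Untested idea: re-read .hoodie from storage and let sync() swap the
  // view's old in-memory timeline for the freshly loaded one, so that
  // reset()/init() no longer sees the pre-archive instants.
  static void resync(HoodieTableMetaClient metaClient, HoodieTableFileSystemView view) {
    metaClient.reloadActiveTimeline();
    view.sync();
  }
}
```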
Thoughts? 😅