bvaradar commented on pull request #2899:
URL: https://github.com/apache/hudi/pull/2899#issuecomment-839881044

> > @danny0405: Agree with @vinothchandar on what this change would accomplish. If we have incremental syncing enabled and move timeline server hosting to the Job Manager, we will get the full benefit of storage RPC call reductions here.
> >
> > With this change, the reusability is limited to tasks running within a Task Manager at the scope of a single Hudi commit. For the next Hudi commit, a full resync of the file-system view will happen.
> >
> > Now for the incremental timeline sync, we definitely need to enable this, but we need to see if this [issue](https://issues.apache.org/jira/browse/HUDI-1275) is still present in master using a long-running job and get to the root cause.
>
> I saw that the `timeline` in `RemoteHoodieTableFileSystemView` is initialized in its constructor and never refreshed again during its lifecycle, so if we have a long-running timeline service there, the `RemoteHoodieTableFileSystemView` is expected to always be behind, right? Different write tasks still send duplicate sync requests even if some write task has already triggered the sync.

The scope is managed at the `HoodieTable` level. Every Hudi operation, such as a commit or a compaction, creates a new meta client and a new `HoodieTable`, which creates a new `FileSystemViewManager` object, which in turn lazily creates a `RemoteHoodieTableFileSystemView` the first time it is called. This way, clients have one single consistent view for one Hudi operation.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
