bvaradar commented on pull request #2899:
URL: https://github.com/apache/hudi/pull/2899#issuecomment-839881044


   > > @danny0405: I agree with @vinothchandar on what this change would accomplish. If we have incremental syncing enabled and move timeline server hosting to the Job Manager, we will get the full benefit of the storage RPC call reductions here.
   > > With this change, reusability is limited to tasks running within a single Task Manager, at the scope of a single Hudi commit. For the next Hudi commit, a full resync of the file-system view will happen.
   > > As for incremental timeline sync, we definitely need to enable it, but we first need to check whether this [issue](https://issues.apache.org/jira/browse/HUDI-1275) is still present in master by running a long-running job, and get to the root cause.
   > 
   > I saw that the `timeline` in `RemoteHoodieTableFileSystemView` is initialized in its constructor and never refreshed during its lifecycle. So if we have a long-running timeline service, the `RemoteHoodieTableFileSystemView` would always be behind, right? Different write tasks still send duplicate sync requests even after one write task has already triggered the sync.
   
   The scope is managed at the HoodieTable level. Every Hudi operation, such as a commit or a compaction, creates a new meta client and a new HoodieTable. The HoodieTable creates a new FileSystemViewManager object, which in turn creates a RemoteHoodieTableFileSystemView lazily the first time it is called. This way, clients get one single, consistent view for each Hudi operation.
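   To illustrate the scoping described above, here is a minimal sketch. It assumes simplified, hypothetical stand-ins (`PerOperationViewScope`, `Table`, `ViewManager`, `FileSystemView`, `STORAGE_TIMELINE`) rather than Hudi's actual classes or API. The view snapshots the timeline once in its constructor, matching the behavior described in the quoted question. Each operation builds a new table and view manager, and the view is created lazily on first access. Within one operation the view stays fixed, even if new instants land on storage. The next operation gets a fresh snapshot, which corresponds to the full resync mentioned earlier. The sketch does not model the remote timeline server or incremental sync.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for Hudi's table / view-manager / view classes.
// Names and behavior are illustrative only; they are not Hudi's real API.
public class PerOperationViewScope {

    /** Simulated timeline on storage: completed instants in order. */
    static final List<String> STORAGE_TIMELINE = new ArrayList<>();

    /** Counts how many views were built, i.e. how many full syncs happened. */
    static int viewBuilds = 0;

    /** Stand-in for a file-system view: snapshots the timeline once, at construction. */
    static final class FileSystemView {
        private final List<String> timeline;

        FileSystemView() {
            viewBuilds++;
            // Immutable snapshot; never refreshed for the lifetime of this view.
            this.timeline = List.copyOf(STORAGE_TIMELINE);
        }

        List<String> timeline() {
            return timeline;
        }
    }

    /** Stand-in for a view manager: creates the view lazily on first access, then reuses it. */
    static final class ViewManager {
        private FileSystemView view;

        synchronized FileSystemView getView() {
            if (view == null) {
                view = new FileSystemView();
            }
            return view;
        }
    }

    /** Stand-in for a table object: one instance per operation, owning its own view manager. */
    static final class Table {
        private final ViewManager viewManager = new ViewManager();

        FileSystemView view() {
            return viewManager.getView();
        }
    }

    public static void main(String[] args) {
        STORAGE_TIMELINE.add("001.commit");

        Table commitOp = new Table();          // operation 1: new table, new view manager
        FileSystemView first = commitOp.view(); // view is built lazily here
        STORAGE_TIMELINE.add("002.commit");    // another writer completes an instant

        // Within operation 1 the same view is reused, so it does not see 002.
        System.out.println(commitOp.view() == first);  // prints "true"
        System.out.println(commitOp.view().timeline()); // prints "[001.commit]"

        Table compactOp = new Table();         // operation 2: fresh snapshot of the timeline
        System.out.println(compactOp.view().timeline()); // prints "[001.commit, 002.commit]"
        System.out.println("view builds: " + viewBuilds); // prints "view builds: 2"
    }
}
```

   The trade-off shows up in `viewBuilds`: each operation gets a consistent view, but each new operation pays for a full rebuild. That rebuild cost is what incremental sync on a long-lived timeline server would aim to avoid.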


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


