sivabalan narayanan created HUDI-2860:
-----------------------------------------
Summary: Make timeline server work with multiple concurrent writers
Key: HUDI-2860
URL: https://issues.apache.org/jira/browse/HUDI-2860
Project: Apache Hudi
Issue Type: Improvement
Components: Writer Core
Reporter: sivabalan narayanan
Make timeline server work with multiple concurrent writers.
Currently, if an executor lags behind the timeline server (the timeline server
refreshes its state on every call if the timeline has moved), we throw an
exception and the executor falls back to the secondary view, which lists the
file system. We want to revisit this code and see how we can make the timeline
server work in multi-writer scenarios.
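
The fallback described above can be sketched roughly as follows. This is a
minimal illustration and not Hudi's actual code: FileView, withFallback and the
sample partition and file names are hypothetical, and the real exception type
and view classes are assumptions left out here.

```java
import java.util.List;

public class FallbackViewSketch {
    // Hypothetical stand-in for a file-system view; not a Hudi interface.
    interface FileView {
        List<String> latestBaseFiles(String partition);
    }

    // Tries the primary (timeline-server backed) view first and falls back
    // to the secondary (file-system listing) view on any runtime failure.
    static List<String> withFallback(FileView primary, FileView secondary, String partition) {
        try {
            return primary.latestBaseFiles(partition);
        } catch (RuntimeException e) {
            return secondary.latestBaseFiles(partition);
        }
    }

    public static void main(String[] args) {
        // Simulates the server rejecting a call from a lagging executor.
        FileView laggingRemote = p -> {
            throw new IllegalStateException("executor timeline is behind the server");
        };
        // Simulates the more expensive listing of the file system.
        FileView listing = p -> List.of(p + "/fg1_C100.parquet");
        System.out.println(withFallback(laggingRemote, listing, "2021/11/25"));
    }
}
```

With multiple writers the timeline moves more often, so under this pattern more
calls would presumably end up on the listing path.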
A few points to consider:
1. Executors should try to call getLatestBaseFilesOnOrBefore() instead of
getLatestBaseFiles(). Not all calls have to be fixed; the ones doing conflict
resolution might always have to get the latest snapshot.
2. Fix async services to use a separate write client in the deltastreamer flow.
3. Revisit every call from the executors and set the "REFRESH" param only when
it matters.
4. Sharing embedded timeline server.
5. Check for any holes. When C100 and C101 start concurrently and C101
finishes early, if C100 calls getLatestBaseFilesOnOrBefore(), do we return base
files from C101?
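
To make the question in point 5 concrete, here is a small self-contained model.
It is only a sketch under stated assumptions: BaseFile, latestBaseFiles and
latestOnOrBefore are hypothetical names, instant times are assumed to be
fixed-width strings that compare lexicographically, and "on or before" is
assumed to filter purely by instant time. Whether the real file-system view
behaves this way is exactly what needs checking.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class OnOrBeforeSketch {
    // Hypothetical model of a base file: its file group and the commit that wrote it.
    static final class BaseFile {
        final String fileGroupId;
        final String commitTime;

        BaseFile(String fileGroupId, String commitTime) {
            this.fileGroupId = fileGroupId;
            this.commitTime = commitTime;
        }
    }

    // Latest completed commit time per file group, with no upper bound.
    static Map<String, String> latestBaseFiles(List<BaseFile> files, Set<String> completed) {
        return latestOnOrBefore(files, completed, null);
    }

    // Latest completed commit time per file group, ignoring commits after maxCommitTime.
    static Map<String, String> latestOnOrBefore(List<BaseFile> files, Set<String> completed,
                                                String maxCommitTime) {
        Map<String, String> latest = new TreeMap<>();
        for (BaseFile f : files) {
            if (!completed.contains(f.commitTime)) {
                continue; // skip files written by inflight commits
            }
            if (maxCommitTime != null && f.commitTime.compareTo(maxCommitTime) > 0) {
                continue; // skip files newer than the caller's instant
            }
            // Keep the larger (more recent) commit time for each file group.
            latest.merge(f.fileGroupId, f.commitTime, (a, b) -> a.compareTo(b) >= 0 ? a : b);
        }
        return latest;
    }

    public static void main(String[] args) {
        // C099 and C101 are complete; C100 is still inflight.
        List<BaseFile> files = List.of(new BaseFile("fg1", "099"), new BaseFile("fg1", "101"));
        Set<String> completed = Set.of("099", "101");
        System.out.println(latestOnOrBefore(files, completed, "100"));
        System.out.println(latestBaseFiles(files, completed));
    }
}
```

In this model C100 sees the C099 base file when it bounds the lookup by its own
instant, while the unbounded lookup returns the C101 file. A hole would exist
if the real view returned the C101 file in the bounded case as well.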
--
This message was sent by Atlassian Jira
(v8.20.1#820001)