danny0405 commented on pull request #2899:
URL: https://github.com/apache/hudi/pull/2899#issuecomment-834955549


   > > each writer task takes a AbstractWriteClient, and there may be multiple 
write tasks in one process.
   > 
   > @danny0405 please help me understand this better. By one process, you mean 
one yarn flink job? or jvm process? In Spark, the model is the executors talk 
to the driver running the embedded timeline server in Spark. We don't run the 
timeline server in the executors. Could you explain the Flink model little more
   > 
   > Let me look at the code again to see if/how we avoid the side effects in 
the meantime.
   
   By one process, I mean the JVM process, e.g. the `TaskManager` of Flink. In the Flink write pipeline, the source records are assigned file group IDs and then handed over to the write tasks; each write task writes out its record buffers (grouped by file group ID) using a write client there. Because the tasks are long-running, each task starts an embedded timeline server.
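   The buffering step described above can be sketched roughly like this. This is a minimal, self-contained illustration of per-task buffering keyed by file group ID; the class and method names here are hypothetical and are not actual Hudi or Flink APIs:
   
   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;
   
   // Illustrative sketch only: models how a single long-running write task
   // buffers incoming records grouped by file group id before flushing them
   // through its own write client. Not real Hudi/Flink code.
   public class WriteTaskSketch {
       // Per-task buffers, keyed by file group id.
       private final Map<String, List<String>> buffers = new HashMap<>();
   
       // Upstream has already assigned each record a file group id.
       public void buffer(String fileGroupId, String record) {
           buffers.computeIfAbsent(fileGroupId, k -> new ArrayList<>()).add(record);
       }
   
       // On flush, each file group's buffer would be handed to the task's
       // write client; because the task is long-running, an embedded timeline
       // server started by the task serves timeline metadata locally.
       public Map<String, List<String>> flush() {
           Map<String, List<String>> out = new HashMap<>(buffers);
           buffers.clear();
           return out;
       }
   }
   ```
   
   The key point is that the buffering, the write client, and the embedded timeline server all live inside each task, which is why multiple timeline servers can end up in one JVM process.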


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]