rmahindra123 opened a new pull request #3656:
URL: https://github.com/apache/hudi/pull/3656


   ## Verify this pull request
   
   After stress testing with Kafka (MSK), the Confluent schema registry, and 
scripts to generate the Kafka records, this PR contains all the resulting fixes:
   
   1. When the Hudi writer fails, the participant used to send an empty list. 
It now throws a retryable exception instead.
   2. In the coordinator, shut down the scheduler to ensure the old coordinator 
is properly cleaned up during re-assignment.
   3. Connect calls two APIs: put() and preCommit(). We ran the complete state 
machine for both APIs, which may reset the Kafka offset if it is not in sync 
with the coordinator. Resetting the offset when preCommit() is called causes 
issues in the following case: run the connect sink with data in Kafka and 
wait for a Hudi commit, then kill the worker and restart it. After a 
START_COMMIT, start another worker. After some time, we see the following 
problem: the tasks in the first worker are killed and all tasks are assigned 
to the second worker. The cause was that when preCommit() is called, we reset 
the offset (by calling context.offset), and that causes the task to crash. 
The fix is to not process the state machine when preCommit() is called. The 
connect platform calls put() even when the Kafka consumer is paused. Hence, we 
can rely on put() alone to execute the state machine, and preCommit() now just 
returns the latest Kafka offsets.
   4. Fix logging (log4j was getting imported twice).
   5. Fix the script to generate the kafka records, reusing the docker demo 
json payloads.
   6. Fix the README accordingly.
   7. Fix the toString conversion of ControlEvent.
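
To illustrate fixes 1 and 3, here is a minimal stdlib-only sketch, not the code in this PR. `PreCommitSketch`, `Participant`, and `RetryableWriteException` are hypothetical names, and the offset bookkeeping is simplified to a single counter. It only models the intended split: put() is the sole entry point that advances the state machine (and fails with a retryable error when the writer is unhealthy), while preCommit() reports the latest offsets without touching any state.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stdlib-only model. None of these names are Hudi or Kafka Connect classes.
final class PreCommitSketch {

  /** Stand-in for a retryable error that the Connect runtime would retry. */
  static final class RetryableWriteException extends RuntimeException {
    RetryableWriteException(String message) {
      super(message);
    }
  }

  /** Hypothetical participant handling one Kafka partition. */
  static final class Participant {
    private long latestOffset = 0L;      // offset covered by successful writes
    private int stateMachineRuns = 0;    // how often the state machine was driven
    private boolean writerHealthy = true;

    void setWriterHealthy(boolean healthy) {
      this.writerHealthy = healthy;
    }

    int stateMachineRuns() {
      return stateMachineRuns;
    }

    /** put(): the only call that drives the state machine. */
    void put(List<String> records) {
      stateMachineRuns++;
      if (!writerHealthy) {
        // Fix 1: surface the failure instead of reporting an empty list.
        throw new RetryableWriteException("writer failed, batch should be retried");
      }
      latestOffset += records.size();
    }

    /** preCommit(): fix 3, return the latest offsets and change nothing. */
    Map<Integer, Long> preCommit(int partition) {
      Map<Integer, Long> offsets = new HashMap<>();
      offsets.put(partition, latestOffset);
      return offsets;
    }
  }

  public static void main(String[] args) {
    Participant participant = new Participant();
    participant.put(Arrays.asList("r1", "r2"));
    System.out.println("offsets reported by preCommit: " + participant.preCommit(0));
    System.out.println("state machine runs: " + participant.stateMachineRuns());
  }
}
```

In this model a failed put() leaves the offset untouched, so a retry of the same batch advances it exactly once, and repeated preCommit() calls never reset anything.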
   



