MabelYC opened a new pull request #1437:
URL: https://github.com/apache/samza/pull/1437


   Symptom: When deploying a job, users will see error messages from the kafka 
clients finalizer complaining about "kafka consumer/producer allocated and not 
closed"
   
   These error messages appear multiple times in users' jobs and don't actually 
affect the job's performance or correctness. This leads to users falsely 
believing that any job failures is because of this message.
   
   Cause:
   All Kafka clients have a finalize method in them, when GC determines that 
there's no more references to the clients, it will try to close the clients and 
find that the clients wasn't closed properly yet. And was caused by two 
different reasons:
   1. In ChangelogStreamManager.class and RegexTopicGenerator.class, Samza 
created admin Clients for every system, no matter we need them or not. But we 
only stop the admin clients that we used.
   2. In KafkaCheckpointManager.class and CoordinatorStreamStore.class, we want 
to keep the kafka clients open until the containers are shut down. But during 
runtime, the ERROR log would show when there's no more references to the 
clients.
   
   Changes: 
   1. In ChangelogStreamManager.class and RegexTopicGenerator.class,  we will 
only create adminClients that we needed instead of creating adminClients for 
all systems and then pick up the one we want.
   2. In KafkaCheckpointManager.class and CoordinatorStreamStore.class, we 
override a finalize method to close the clients if there's no more references 
to the clients.
   
   Tests: local deployed some test-jobs including samza and beam-samza jobs, 
the error logs don't show again.
   
   API Changes: N/A
   Upgrade Instructions: N/A
   Usage Instructions:N/A
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to