vbabenkoru opened a new pull request, #112:
URL: https://github.com/apache/flink-connector-pulsar/pull/112

   
   ## Purpose of the change
   
   Fix a race condition that occurs occasionally (roughly 1-2% probability per 
partition). The connector creates a dummy consumer to seek to the right cursor 
position, closes it, and immediately afterwards creates the real consumer. If 
the broker has not yet fully released the previous consumer, it responds with 
`Exclusive consumer is already connected`, which causes the job to restart. In 
our case we were subscribing to thousands of topics, so the job would restart 
continuously for hours until it reached an attempt in which none of the topics 
hit the race condition.
   
   ## Brief change log
   
   - Add a retry with a 2-second delay when creating a consumer for a split, to 
handle a race condition from creating two consumers in quick succession.
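
The retry described above can be illustrated with a minimal, self-contained sketch. It is not the code from this pull request: the class `ConsumerRetrySketch`, the helper `retryWithDelay`, and the attempt limit of 3 are hypothetical names and values chosen for illustration, and a plain `Supplier` stands in for the actual Pulsar consumer creation. Only the 2-second delay and the broker message are taken from the description.

```java
import java.util.function.Supplier;

// Hypothetical illustration of "retry with a fixed delay"; not the connector's actual code.
final class ConsumerRetrySketch {

    // Delay taken from the change log; the attempt limit is an assumption.
    static final long RETRY_DELAY_MILLIS = 2_000L;
    static final int MAX_ATTEMPTS = 3;

    private ConsumerRetrySketch() {}

    /** Runs the action and retries after a fixed delay if it throws. */
    static <T> T retryWithDelay(Supplier<T> action, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                // Remember the failure; it is rethrown if every attempt fails.
                lastFailure = e;
            }
            if (attempt < maxAttempts) {
                try {
                    // Give the broker time to release the previous consumer.
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", ie);
                }
            }
        }
        throw lastFailure;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in for consumer creation: fails once, then succeeds.
        String consumer = retryWithDelay(() -> {
            if (calls[0]++ == 0) {
                throw new IllegalStateException("Exclusive consumer is already connected");
            }
            return "consumer-for-split";
        }, MAX_ATTEMPTS, RETRY_DELAY_MILLIS);
        System.out.println(consumer + " created after " + calls[0] + " attempts");
    }
}
```

A fixed delay keeps the sketch simple. Bounding the number of attempts means that a consumer that is genuinely still connected eventually surfaces as an error instead of blocking the split reader indefinitely.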
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follow the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   - *Manually verified by running the Pulsar connector on a production-scale 
Flink cluster with thousands of topics.*
   
   ## Significant changes
   
   *(Please check any boxes [x] if the answer is "yes". You can first publish 
the PR and check them afterwards, for
   convenience.)*
   
   - [ ] Dependencies have been added or upgraded
   - [ ] Public API has been changed (Public API is any class annotated with 
`@Public(Evolving)`)
   - [ ] Serializers have been changed
   - [ ] New feature has been introduced
       - If yes, how is this documented? (not applicable / docs / JavaDocs / 
not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
