nodece commented on PR #1500:
URL: 
https://github.com/apache/pulsar-client-go/pull/1500#issuecomment-4504296859

   Great fix on the deadlock path from #1494 — the lock-free 
`partitionConsumers()` snapshot + delayed `startDispatcher()` is a solid 
direction for the original blocking issue.
   
   I think there is still a new concurrency window introduced by removing the 
old mutex guarding `consumers`.
   
   In the new code:
   
   - `internalTopicSubscribeToPartitions()` builds `newConsumers`, then 
publishes via `c.consumers.Store(...)`, then starts dispatchers for new 
partitions.
   - `closeWithCause()` closes only one snapshot (`consumers := 
c.partitionConsumers()`) and no longer serializes with partition expansion.
   
   So this interleaving seems possible:
   
   1. Goroutine A enters `internalTopicSubscribeToPartitions()` and starts 
creating new partition consumers.
   2. Goroutine B calls `closeWithCause()`, takes an old snapshot, closes those 
consumers, and continues close flow.
   3. Goroutine A continues and still executes `c.consumers.Store(...)` + 
`startDispatcher()` for newly created consumers.
   
   Result: newly added partition consumers may be published/started after close 
has already closed only the old snapshot.
   
   Before this PR, the parent lock serialized expansion and close-related 
operations over `c.consumers`; removing that lock fixes deadlock, but also 
removes that lifecycle serialization.
   
   Could we gate expansion with context cancellation or a closing flag (checked 
before create and before Store/startDispatcher), so in-flight expansion 
stops/cleans up once close starts?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to