Shawyeok commented on PR #25042:
URL: https://github.com/apache/pulsar/pull/25042#issuecomment-4020463970

   @TakaHiR07 @mattisonchao 
   
   **This pr would reintroduce a silent, unrecoverable failure mode.** Here is 
why:
   
   The root cause of #25041 is that the update-partition-count operation is not 
atomic — it can fail mid-way with a persistent side effect. The key difference 
between the old and new ordering:
   
   **Old order (pre-3.0.14, what this PR reverts to):**
   1. `tryCreatePartitionsAsync(N)` — writes managed-ledger ZK nodes for new 
partitions
   2. `createSubscriptions(N)` — ⚠️ failure here leaves orphaned ZK nodes
   3. Update partition metadata
   
   When step 2 fails (e.g. due to bundle not yet assigned, causing 307 redirect 
exhaustion), ZK nodes for the new partitions already exist but the metadata 
still shows the old count. These orphaned nodes appear as non-partitioned 
topics, causing any retry to permanently fail with:
   > "Already have non partition topic … could cause conflict"
   
   **New order (3.0.14+, what this PR reverts):**
   1. `updatePartitionedTopicAsync(N)` — update metadata first
   2. `tryCreatePartitionsAsync(N)` — idempotent ZK writes
   3. `createSubscriptionAsync(...)` — `ConflictException` is tolerated
   
   With metadata updated first, any mid-operation failure leaves the system in 
a **self-consistent, retryable state** — the metadata already reflects N 
partitions, so no orphaned nodes exist outside the valid range.
   
   Although the operator can still retry if the endpoint returns an exception, 
the old ordering makes retries **permanently broken** rather than transiently 
failing. This was verified on 2.8.1 clusters in my company.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to