deardeng opened a new pull request, #63069:
URL: https://github.com/apache/doris/pull/63069

   master FE PublishVersionDaemon repeatedly throws NPE for a single "poisoned" 
transaction at tens of times per second, indefinitely:
   
     java.lang.NullPointerException: Cannot invoke
       "java.util.Map.containsKey(Object)" because the return value of
       "PublishVersionTask.getSuccTablets()" is null
       at DatabaseTransactionMgr.checkReplicaContinuousVersionSucc(...)
       at DatabaseTransactionMgr.finishCheckQuorumReplicas(...)
       at DatabaseTransactionMgr.finishTransaction(...)
       at GlobalTransactionMgr.finishTransaction(...)
       at PublishVersionDaemon.tryFinishTxnSync(...)
   
   The affected transaction never reaches VISIBLE; user-visible writes are 
stuck "committed but not queryable" until publish_version_timeout_second 
finally aborts them. Other concurrent transactions keep publishing normally; 
only the poisoned one loops on NPE.
   
   The check at DatabaseTransactionMgr.checkReplicaContinuousVersionSucc
   
     success = (task != null && task.isFinished()
                && task.getSuccTablets().containsKey(tabletId)) || (...);
   
   assumes the invariant "task.isFinished() == true => succTablets is 
populated". PublishVersionTask initialises succTablets to null in its 
constructor. One real and one latent code path can flip isFinished to true 
while leaving succTablets at that null:
   
   1. AgentTaskCleanupDaemon.removeInactiveBeAgentTasks (the production 
trigger): when a BE has been unreachable for MAX_FAILURE_TIMES (=3) heartbeat 
periods, every queued agent task on that BE -- including in-flight 
PublishVersionTasks -- is force-finished via setFinished(true) + 
setErrorCode(ABORTED). The daemon never calls setSuccTablets, so succTablets 
stays at the constructor's null. This path bypasses the BE -> MasterImpl 
finish-task RPC entirely, so nothing else ever fills the field.
   
   2. MasterImpl.finishPublishVersion (latent / defensive): structurally 
setSuccTablets(possibly-null) and setFinished(true) run before the status-code 
early return, so a null succTablets here would be reachable in principle. Under 
the current BE protocol it is not: be/src/agent/task_worker_pool.cpp 
unconditionally calls __set_succ_tablets(succ_tablets) before sending the 
finish RPC, even when publish itself failed and the map is empty. That makes 
request.isSetSuccTablets() always true and getSuccTablets() returns at worst an 
empty (non-null) map, so this path does not produce the NPE today. It is called 
out only as a latent risk that the call-site guard also covers should a future 
BE-side change ever stop setting that field. The production NPE was reproduced 
solely via path #1.
   
   Why the bad state is sticky after BE recovery: the same Java object is held 
by both AgentTaskQueue and TransactionState.publishVersionTasks (see 
PublishVersionDaemon.addPublishVersionTask). AgentTaskCleanupDaemon removes the 
AgentTaskQueue reference, but the
   TransactionState.publishVersionTasks reference is only cleared in 
TransactionState.pruneAfterVisible, which is invoked from PublishVersionDaemon 
only when the transaction reaches VISIBLE. NPE prevents VISIBLE -> reference is 
never pruned -> daemon polls the same poisoned object forever.
   
   A previous defensive change at this call site only added a "task != null" 
guard and did not cover succTablets == null, leaving the gap that the 
AgentTaskCleanupDaemon path later started to widen in production.
   
   Two complementary changes that together defend the invariant from both sides:
   
   1. PublishVersionTask: initialise succTablets to Maps.newHashMap() and 
coerce setSuccTablets(null) to an empty map. getSuccTablets() now never returns 
null. This replaces "no entry yet" with "empty map", which is semantically 
equivalent for every existing caller (containsKey, forEach, entrySet all behave 
the same on an empty map).
   
   2. DatabaseTransactionMgr.checkReplicaContinuousVersionSucc: explicit "succ 
!= null" guard at the call site. With change #1 this is a belt-and-braces check 
that also catches any future regression that re-introduces a null-writing path.
   
   Behavioural impact: when a PublishVersionTask is force-finished without a 
real BE response, the affected replica is now routed through the normal 
errorReplicaIds / tabletWriteFailedReplicas branches instead of killing the 
publish-thread worker with an NPE. The transaction either succeeds via 
remaining quorum on healthy BEs, or fails cleanly via the standard 
publish_version_timeout_second timeout.
   
   ### What problem does this PR solve?
   
   Issue Number: close #xxx
   
   Related PR: #xxx
   
   Problem Summary:
   
   ### Release note
   
   None
   
   ### Check List (For Author)
   
   - Test <!-- At least one of them must be included. -->
       - [ ] Regression test
       - [ ] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason <!-- Add your reason?  -->
   
   - Behavior changed:
       - [ ] No.
       - [ ] Yes. <!-- Explain the behavior change -->
   
   - Does this need documentation?
       - [ ] No.
       - [ ] Yes. <!-- Add document PR link here. eg: 
https://github.com/apache/doris-website/pull/1214 -->
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label <!-- Add branch pick label that this PR should 
merge into -->
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to