ByteYue opened a new pull request, #17893:
URL: https://github.com/apache/doris/pull/17893
# Proposed changes
Issue Number: close #xxx
## Problem summary
Recently, in a high-concurrency import scenario, we found that one version
was consistently missing on all three replicas of a tablet. However, running
`grep ${tablet_id} | grep publish | grep missed_version` against the backend
logs returned nothing. By checking the transaction IDs of versions
missed_version-1 and missed_version+1, we identified the transaction ID of the
missing version. We then searched the frontend logs for this transaction ID
and found the following:

The transaction timed out while attempting to acquire a write lock and was
aborted, yet it was later committed successfully. Because of the abort, the
transaction had already been cleared on the backend by the "clear transaction"
RPC. As a result, the publish task for this transaction can never succeed.

After reviewing the transaction code, it appears that many accesses to
transactionState are not thread-safe. In addition, the
unprotectedCommitTransaction2PC method can commit a transaction even when the
transaction is not in a status that permits a commit.

The transaction code does not fully account for duplicate concurrent RPCs, and
it is tightly coupled and difficult to modify. This pull request therefore only
handles the unprotectedCommitTransaction2PC issue on a case-by-case basis, and
adds a read lock for thread-safe access to transactionState (although this may
not be sufficient).
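
To illustrate the idea only, here is a minimal sketch of a status-guarded commit. It is not the code in this pull request: `TxnStateSketch`, its `Status` enum, and its methods are hypothetical names, and the rule that a 2PC commit is only allowed from a pre-committed status is an assumption made for the example. The status is read under a read lock and changed under a write lock, so a commit that races with an abort is rejected instead of silently succeeding.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-in for a transaction state object.
// It is NOT the Doris TransactionState class.
class TxnStateSketch {
    enum Status { PREPARE, PRECOMMITTED, COMMITTED, ABORTED }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private Status status = Status.PREPARE;

    // Readers take the read lock so they never observe a half-applied change.
    Status getStatus() {
        lock.readLock().lock();
        try {
            return status;
        } finally {
            lock.readLock().unlock();
        }
    }

    boolean preCommit() {
        return transition(Status.PREPARE, Status.PRECOMMITTED);
    }

    // Assumed rule: a 2PC commit is valid only from PRECOMMITTED.
    boolean commit2PC() {
        return transition(Status.PRECOMMITTED, Status.COMMITTED);
    }

    // Abort is rejected once the transaction is committed or already aborted,
    // which also makes a duplicate abort RPC a harmless no-op.
    boolean abort() {
        lock.writeLock().lock();
        try {
            if (status == Status.COMMITTED || status == Status.ABORTED) {
                return false;
            }
            status = Status.ABORTED;
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Check the precondition and apply the change under one write lock,
    // so no other thread can change the status in between.
    private boolean transition(Status expected, Status next) {
        lock.writeLock().lock();
        try {
            if (status != expected) {
                return false;
            }
            status = next;
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With this shape, the sequence described above (abort first, commit later) ends with `commit2PC()` returning false and the status staying `ABORTED`, so the caller can report the failure instead of scheduling a publish task that cannot succeed.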
## Checklist(Required)
* [ ] Does it affect the original behavior
* [ ] Has unit tests been added
* [ ] Has documentation been added or modified
* [ ] Does it need to update dependencies
* [ ] Does this PR support rollback (If NO, please explain WHY)
## Further comments
If this is a relatively large or complex change, kick off the discussion at
[[email protected]](mailto:[email protected]) by explaining why you
chose the solution you did and what alternatives you considered, etc...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]