keith-turner commented on issue #3845:
URL: https://github.com/apache/accumulo/issues/3845#issuecomment-1775661442
@cshannon I have been looking at the 2.1 and 3.1 code. For 2.1.x and 3.1
overall, do the following goals seem correct?
1. Only transition from WAITING_FOR_OFFLINE to MERGE if all tablets have no
locations and no wals.
2. Throw an exception if a location or wal is seen in the metadata table
when in the MERGE state.
For goal 1, the 2.1 code seems mostly correct. However, when a tablet has
wals and the merge state is WAITING_FOR_OFFLINE, I am not sure 2.1 would
host that tablet so it could be recovered. It seems like [this
code](https://github.com/apache/accumulo/blob/9dfa9e3b6489a8dc7e00d9377680bb91ef87b242/server/manager/src/main/java/org/apache/accumulo/manager/Manager.java#L660)
may keep the tablet unassigned when it has wals, which would prevent
recovery, so in 2.1 a merge could get stuck. For 3.1, it does seem that
modifications are needed for goal 1, as you mentioned.
For goal 1, I am thinking we could add a check in
`MergeStats.verifyMergeConsistency()` that returns false when a tablet has
wals. This could be done in 2.1 and 3.1. It may be redundant in 2.1, but it
would not hurt. If we have that, then when a tablet has wals and the merge
state is WAITING_FOR_OFFLINE, the tablet goal state should be HOSTED so it
can be recovered.
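To make the goal 1 idea concrete, here is a rough sketch of the two decisions described above. Everything in it is a hypothetical stand-in (`MergeOfflineSketch`, `Tablet`, `readyForMerge`, `goalState`); it does not use the real `MergeStats` or `Manager` code and only illustrates the intended logic, assuming a tablet can be summarized by "has a location" and "number of wals".

```java
import java.util.List;

// Hypothetical stand-ins for illustration only; not Accumulo's real classes.
final class MergeOfflineSketch {

  enum MergeState { WAITING_FOR_OFFLINE, MERGE }

  enum GoalState { HOSTED, UNASSIGNED }

  // Minimal, assumed view of one tablet's metadata entry.
  static final class Tablet {
    final boolean hasLocation;
    final int walCount;

    Tablet(boolean hasLocation, int walCount) {
      this.hasLocation = hasLocation;
      this.walCount = walCount;
    }
  }

  // Goal 1: only allow WAITING_FOR_OFFLINE -> MERGE when no tablet in the
  // merge range has a location or a wal.
  static boolean readyForMerge(List<Tablet> tablets) {
    for (Tablet t : tablets) {
      if (t.hasLocation || t.walCount > 0) {
        return false;
      }
    }
    return true;
  }

  // While waiting for offline, a tablet that still has wals should be hosted
  // so recovery can run; otherwise it should stay unassigned.
  static GoalState goalState(MergeState state, Tablet t) {
    if (state == MergeState.WAITING_FOR_OFFLINE && t.walCount > 0) {
      return GoalState.HOSTED;
    }
    return GoalState.UNASSIGNED;
  }
}
```

In this sketch a tablet with wals blocks the transition and is asked to be hosted, so after recovery and a later unassignment the check could pass on a subsequent scan.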
For goal 2, if we are not doing that validation in the current code, then I
think it would be good to add it in 2.1 and 3.1.
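The goal 2 validation could look roughly like the following. This is only a sketch: the class name, method name, and parameters are made up for illustration, and the choice of `IllegalStateException` is an assumption, not what the existing code throws.

```java
// Hypothetical helper for illustration only; not part of Accumulo.
final class MergeStateValidationSketch {

  // Goal 2: while in the MERGE state, a tablet in the merge range must have
  // no location and no wals. Seeing either one is treated as a fatal
  // inconsistency instead of being silently ignored.
  static void validateTabletInMergeState(String extent, String location, int walCount) {
    if (location != null) {
      throw new IllegalStateException(
          "Tablet " + extent + " has location " + location + " while in MERGE state");
    }
    if (walCount > 0) {
      throw new IllegalStateException(
          "Tablet " + extent + " has " + walCount + " wal(s) while in MERGE state");
    }
  }
}
```

Such a check would run for each tablet read from the metadata table during the MERGE step, so the merge fails fast rather than proceeding with unrecovered data.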
I tried to think of ways to test this scenario in an IT and could not think
of a good way to do it.