keith-turner commented on issue #3845:
URL: https://github.com/apache/accumulo/issues/3845#issuecomment-1775661442

   @cshannon I have been looking at the 2.1 and 3.1 code.  For 2.1.x and 3.1, 
do the following overall goals seem correct?
   
    1. Only transition from WAITING_FOR_OFFLINE to MERGE if all tablets have no 
locations and no wals.
    2. Throw an exception if a location or wal is seen in the metadata table 
when in the MERGE state.
   
   For goal 1, the 2.1 code does seem mostly correct.  However, when a tablet 
has wals and the merge state is WAITING_FOR_OFFLINE, I am not sure 2.1 would 
host that tablet so it could be recovered.  It seems like [this 
code](https://github.com/apache/accumulo/blob/9dfa9e3b6489a8dc7e00d9377680bb91ef87b242/server/manager/src/main/java/org/apache/accumulo/manager/Manager.java#L660)
 may keep the tablet unassigned when it has wals, which would prevent recovery, 
so in 2.1 a merge could possibly get stuck.  For 3.1, it does seem that 
modifications are needed for goal 1, as you mentioned.
   
   For goal 1, I am thinking we could add a check to 
`MergeStats.verifyMergeConsistency()` that returns false if any tablet has 
wals.  This could be done in 2.1 and 3.1.  It may be redundant in 2.1, but 
would not hurt.  If we have that, then when a tablet has wals and the merge 
state is WAITING_FOR_OFFLINE, the tablet goal state should be HOSTED so it can 
be recovered.
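   
   A rough sketch of the goal 1 idea, to make the intended behavior concrete.  
This is not the real Accumulo code: the `Tablet` class, both enums, and the 
`nextState`/`goalState` helpers are hypothetical stand-ins, and the real 
`MergeStats.verifyMergeConsistency()` has a different signature and does more 
than this.

```java
import java.util.List;

// Hypothetical, simplified model; these are NOT the real Accumulo classes.
final class MergeGoal1Sketch {
  enum MergeState { WAITING_FOR_OFFLINE, MERGE }
  enum GoalState { HOSTED, UNASSIGNED }

  // Stand-in for the per-tablet metadata seen while scanning the merge range.
  static final class Tablet {
    final String extent;
    final boolean hasLocation;
    final int walCount;

    Tablet(String extent, boolean hasLocation, int walCount) {
      this.extent = extent;
      this.hasLocation = hasLocation;
      this.walCount = walCount;
    }
  }

  // Analogue of the proposed check: false if any tablet has a location or wals.
  static boolean verifyMergeConsistency(List<Tablet> tablets) {
    for (Tablet t : tablets) {
      if (t.hasLocation || t.walCount > 0) {
        return false;
      }
    }
    return true;
  }

  // Only move WAITING_FOR_OFFLINE -> MERGE when the check passes.
  static MergeState nextState(MergeState current, List<Tablet> tablets) {
    if (current == MergeState.WAITING_FOR_OFFLINE && verifyMergeConsistency(tablets)) {
      return MergeState.MERGE;
    }
    return current;
  }

  // A tablet with wals must be hosted so recovery can run; otherwise keep it offline.
  static GoalState goalState(MergeState state, Tablet t) {
    if (state == MergeState.WAITING_FOR_OFFLINE && t.walCount > 0) {
      return GoalState.HOSTED;
    }
    return GoalState.UNASSIGNED;
  }
}
```

   In this model a tablet with wals is hosted, recovered, and unloaded again, 
and only then does the consistency check pass and the merge proceed.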
   
   For goal 2, if we are not doing that validation in the current code, then I 
think it would be good to add it in 2.1 and 3.1.
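   
   The goal 2 validation could look roughly like the following.  Again this 
is only a sketch under assumptions: `TabletRow` and `validateForMerge` are 
made-up names, and the choice of `IllegalStateException` is illustrative, not 
what the manager necessarily throws.

```java
import java.util.List;

// Hypothetical sketch; not the real Accumulo metadata or merge code.
final class MergeGoal2Sketch {

  // Stand-in for one tablet's row as read from the metadata table.
  static final class TabletRow {
    final String extent;
    final String location; // null when the tablet has no location
    final List<String> wals;

    TabletRow(String extent, String location, List<String> wals) {
      this.extent = extent;
      this.location = location;
      this.wals = wals;
    }
  }

  // Called while in the MERGE state: fail fast rather than merge a tablet
  // that is still hosted or has unrecovered write-ahead logs.
  static void validateForMerge(List<TabletRow> rows) {
    for (TabletRow row : rows) {
      if (row.location != null) {
        throw new IllegalStateException(
            "Tablet " + row.extent + " has location " + row.location + " during MERGE");
      }
      if (!row.wals.isEmpty()) {
        throw new IllegalStateException(
            "Tablet " + row.extent + " has " + row.wals.size() + " wal(s) during MERGE");
      }
    }
  }
}
```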
   
   I tried to think of ways to test this scenario in an IT and could not think 
of a good way to do it.

