cshannon commented on issue #3845: URL: https://github.com/apache/accumulo/issues/3845#issuecomment-1763501081
I looked into this today. Before beginning the steps that start a merge, the merge code first makes sure the tablets are hosted. This is meant to prevent any WALs from existing, since WALs are processed before a tablet is hosted. After the merge goes into the `started` state, it verifies that the total number of tablets equals the number hosted before moving to the waiting-for-offline state.

https://github.com/apache/accumulo/blob/e68fa4ac2243094c64451e06c554653706a37b64/server/manager/src/main/java/org/apache/accumulo/manager/state/MergeStats.java#L101-L110

After talking to @keith-turner, it was pointed out that there is a potential edge case: while the manager is waiting for the tablets to go offline for the merge (that is, after the hosted check), a tserver could die and leave WALs that need recovery.

To handle this edge case, the merge could check that none of the unhosted tablets have WALs before beginning the metadata changes and, if any WALs exist, throw an exception to abort the merge.

I can work on a fix for this. It would need to go to 2.1 and main as well, not just elasticity.
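
A rough sketch of what the proposed check might look like, just to illustrate the idea. This is not the actual fix: `MergeWalCheck`, `TabletInfo`, and `verifyNoWals` are hypothetical names, not Accumulo classes, and a real implementation would read the WAL entries from the tablet metadata inside the manager.

```java
import java.util.List;

/** Hypothetical sketch; none of these types are Accumulo classes. */
final class MergeWalCheck {

  /** Minimal stand-in for a tablet's metadata: its extent and any WAL references. */
  static final class TabletInfo {
    final String extent;
    final List<String> wals;

    TabletInfo(String extent, List<String> wals) {
      this.extent = extent;
      this.wals = List.copyOf(wals);
    }
  }

  /**
   * Intended to run after the tablets in the merge range are offline and
   * before any metadata changes. Throws if any tablet still references a WAL.
   */
  static void verifyNoWals(List<TabletInfo> unhostedTablets) {
    for (TabletInfo tablet : unhostedTablets) {
      if (!tablet.wals.isEmpty()) {
        throw new IllegalStateException("Aborting merge: tablet " + tablet.extent
            + " has " + tablet.wals.size() + " WAL(s) that need recovery");
      }
    }
  }
}
```

Throwing here aborts the merge before any metadata is touched, so the WALs can go through recovery and the merge can be retried afterwards.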
