cshannon commented on PR #3640: URL: https://github.com/apache/accumulo/pull/3640#issuecomment-1683838224
> > I was thinking it could be better to use the prev row column for each tablet instead since we track the previous row
>
> That probably would be cleaner. That is what happens when using Ample: it buffers the entire row and uses that column to construct the extent for the tablet. Ample can also optionally check the linking of tablets when reading them.
>
> > The manager is doing the metadata updates for the merge inside of TabletGroupWatcher and I am wondering: is it possible for the table or tablets being modified to be split concurrently during the merge, or for compactions to happen?
>
> There should not be anything else writing to the tablets' metadata because of the following combination of factors.
>
> 1. When a tablet is hosted, only the tablet server it's assigned to will update its metadata.
> 2. When a tablet is not hosted or assigned, only the manager will update its metadata.
> 3. The merge operation unassigns the tablets before updating them, so tablet servers should no longer be doing any updates.
> 4. The merge operation gets a write lock on the table that prevents any other operations from running in the manager that would update the tablets.
>
> These are some basics of Accumulo's current concurrency model. It's very different in the elasticity branch.

Thanks for explaining the concurrency model in more detail. I assumed it was safe (otherwise merge wouldn't work at all), but I was trying to figure out why, because I kept asking myself, "What happens if a tablet server mutates the metadata for this tablet while the Manager is running the merge?" The key part is that the tablets are unassigned. I knew they were unassigned (I saw it happening in the code and when testing), but for some reason I wasn't connecting that fact when thinking about race conditions and concurrency issues. If the tablets are not hosted and only the Manager is acting on them, the merge is safe: the Manager holds the table write lock and is the only process updating their metadata.
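The four ownership rules in the quoted list can be sketched as a simple permission check. This is a minimal Java sketch under a simplified in-memory model: `TabletWriteRules`, `Tablet`, `Writer`, `mayUpdate`, and `mergeMayUpdate` are hypothetical names for illustration only, not Accumulo or Ample APIs, and the real code paths (TabletGroupWatcher, table locks in the manager) are far more involved.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of who may write a tablet's metadata; NOT an Accumulo API.
public class TabletWriteRules {

    // The kind of process attempting a metadata update.
    enum Writer { MANAGER, TABLET_SERVER }

    // Hypothetical, simplified tablet state: the name of the hosting server, if any.
    record Tablet(String name, Optional<String> hostedBy) {}

    // Rules 1 and 2: a hosted tablet is updated only by the tablet server it is
    // assigned to; an unhosted tablet is updated only by the manager.
    static boolean mayUpdate(Tablet tablet, Writer writer, String writerName) {
        if (tablet.hostedBy().isPresent()) {
            return writer == Writer.TABLET_SERVER
                    && tablet.hostedBy().get().equals(writerName);
        }
        return writer == Writer.MANAGER;
    }

    // Rules 3 and 4: the manager may rewrite metadata for a merge only after every
    // tablet in the range has been unassigned AND it holds the table write lock.
    static boolean mergeMayUpdate(List<Tablet> range, boolean holdsTableWriteLock) {
        return holdsTableWriteLock
                && range.stream().allMatch(t -> mayUpdate(t, Writer.MANAGER, "manager"));
    }

    public static void main(String[] args) {
        Tablet hosted = new Tablet("t1", Optional.of("tserver-a"));
        Tablet unhosted = new Tablet("t2", Optional.empty());

        System.out.println(mayUpdate(hosted, Writer.TABLET_SERVER, "tserver-a")); // true
        System.out.println(mayUpdate(hosted, Writer.MANAGER, "manager"));         // false
        System.out.println(mergeMayUpdate(List.of(hosted, unhosted), true));      // false
        System.out.println(mergeMayUpdate(List.of(unhosted), true));              // true
    }
}
```

The point the model makes is the one from the discussion: once the merge has unassigned every tablet in the range, no tablet server passes the `mayUpdate` check, so the manager holding the table write lock is the only possible writer.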
For elasticity that will be interesting, since we should be able to merge offline tablets that are on demand (currently you can't merge offline tables). There will also be multiple managers, etc., so we will have to see how it works out.
