cshannon commented on PR #3640:
URL: https://github.com/apache/accumulo/pull/3640#issuecomment-1683838224

   > > I was thinking it could be better to use the prev row column for each 
tablet instead since we track the previous row
   > 
   > That probably would be cleaner. That is what happens when using Ample: it 
buffers the entire row and uses that column to construct the extent for the 
tablet. Ample can also optionally check the linking of tablets when reading 
them.
   > 
   > > The manager is doing the metadata updates for the merge inside of 
TabletGroupWatcher, and I am wondering whether it is possible for the table or 
tablets being modified to be split concurrently during the merge, or for 
compactions to happen?
   > 
   > There should not be anything else writing to the tablets' metadata because 
of the following combination of factors.
   > 
   > 1. When a tablet is hosted, only the tablet server it's assigned to will 
update its metadata
   > 2. When a tablet is not hosted or assigned, only the manager will update 
its metadata
   > 3. The merge operation unassigns the tablets before updating them, so 
tablet servers should no longer be doing any updates
   > 4. The merge operation gets a write lock on the table that prevents any 
other operations from running in the manager that would update the tablets.
   > 
   > These are some basics of Accumulo's current concurrency model. It's very 
different in the elasticity branch.
   
   Thanks for explaining the concurrency model. I assumed it was safe 
(otherwise merge wouldn't work at all) but was trying to figure out why. I 
kept thinking "what happens if a tablet server mutates the metadata for this 
tablet while the Manager is running merge?", and the key part is that the 
tablets are unassigned. I knew they were unassigned (I see it happening in the 
code and saw it when testing), but I wasn't connecting that when thinking 
about race conditions. If a tablet is not hosted, only the Manager updates 
its metadata, and the table write lock keeps other Manager operations out, so 
the merge is the only thing writing.
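
   To make the four conditions above concrete, here is a rough sketch of the 
ownership rule as I understand it. None of these names are real Accumulo 
classes or APIs (`Tablet`, `Writer`, `mayUpdateMetadata` and `mergeMayProceed` 
are all made up for illustration), and a plain `ReentrantReadWriteLock` only 
stands in for the table write lock the merge operation holds.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative only: hypothetical types, not Accumulo code.
class MergeConcurrencySketch {
    enum Writer { TABLET_SERVER, MANAGER }

    // Stand-in for a tablet; assignedServer is null when not hosted/assigned.
    static final class Tablet {
        final String endRow;
        String assignedServer;

        Tablet(String endRow, String assignedServer) {
            this.endRow = endRow;
            this.assignedServer = assignedServer;
        }

        boolean isHosted() {
            return assignedServer != null;
        }
    }

    // Rules 1 and 2: a hosted tablet is written only by its assigned tablet
    // server; an unhosted tablet is written only by the manager.
    static boolean mayUpdateMetadata(Tablet tablet, Writer writer, String serverId) {
        if (tablet.isHosted()) {
            return writer == Writer.TABLET_SERVER
                && tablet.assignedServer.equals(serverId);
        }
        return writer == Writer.MANAGER;
    }

    // Rules 3 and 4: merge touches metadata only once every tablet in the
    // range is unassigned and the caller holds the table write lock.
    static boolean mergeMayProceed(List<Tablet> range, ReentrantReadWriteLock tableLock) {
        if (!tableLock.isWriteLockedByCurrentThread()) {
            return false;
        }
        for (Tablet t : range) {
            if (t.isHosted()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock tableLock = new ReentrantReadWriteLock();
        List<Tablet> range = List.of(new Tablet("g", "ts1"), new Tablet("m", "ts2"));

        tableLock.writeLock().lock();
        try {
            // Tablets are still hosted, so the merge has to wait.
            System.out.println(mergeMayProceed(range, tableLock));
            // Unassign the tablets, as the merge operation does first.
            for (Tablet t : range) {
                t.assignedServer = null;
            }
            System.out.println(mergeMayProceed(range, tableLock));
        } finally {
            tableLock.writeLock().unlock();
        }
    }
}
```

   The point of the sketch is that once the tablets are unassigned, rule 1 can 
no longer grant a tablet server write access, and the write lock covers 
everything else in the Manager.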
   
   For elasticity that will be interesting, since we should be able to merge 
offline tablets that are on demand (currently you can't merge offline tables). 
There will also be multiple managers, etc., so we will have to see how it 
works out.

