ctubbsii commented on pull request #1653:
URL: https://github.com/apache/accumulo/pull/1653#issuecomment-691274564


   Okay, I was trying to track down this a bit, because it didn't make sense to 
me that the two methods that previously updated the `LAST` location would be 
doing that. I traced the code and saw that the two methods in 
`MasterMetadataUtil` that update `LocationType.LAST` are only ever called by 
`DatafileManager.bringMinorCompactionOnline` and 
`DatafileManager.bringMajorCompactionOnline`.
   
   So, it seems that the original intent (prior to #1616) was to *only* update 
the `LAST` location if a file was successfully written locally on that server. 
So, `LAST` does *not* represent the last assignment, but the last server where 
data was written from a major or minor compaction. The intent, it seems, is 
that the initial assignments on startup should be based on where the data would 
be local, and not to preserve its previous assignment.
   
   This insight also answers the questions I had in 
https://github.com/apache/accumulo/pull/1616#issuecomment-665412742 about the 
intent of the `LAST` field. Because, if the intent was to simply assign to the 
last location it was previously assigned to, then assigning it to the last 
`CURRENT` location that was written would be sufficient (because that is 
updated on assignment), and we wouldn't even need a `LAST` location at all. 
`LAST` does not mean "last assignment", it means "last location where data was 
likely written locally". This is about data locality, not (re)assignment 
thrashing.
   
   So, we need to go back to the drawing board here, because I'm not sure #1616 
even fixed a problem at all... it simply changed behavior to re-assign to a 
previously assigned location, rather than the last location written to. We need 
to figure out if the code works as intended *without* the changes in #1616. If 
it does, then we need to figure out whether we want to change the behavior or 
not. If we do, then we can probably change the code to eliminate the `LAST` 
field entirely, and just use the `CURRENT` field, if it exists. If we don't 
want to change the behavior, then we can revert #1616 and simply do the changes 
that get rid of some boilerplate by using Ample.
   
   Does all this make sense, @Manno15 
   If not, feel free to give me a call or ping me on Slack.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to