ctubbsii commented on pull request #1653: URL: https://github.com/apache/accumulo/pull/1653#issuecomment-691274564
Okay, I was trying to track down this a bit, because it didn't make sense to me that the two methods that previously updated the `LAST` location would be doing that. I traced the code and saw that the two methods in `MasterMetadataUtil` that update `LocationType.LAST` are only ever called by `DatafileManager.bringMinorCompactionOnline` and `DatafileManager.bringMajorCompactionOnline`. So, it seems that the original intent (prior to #1616) was to *only* update the `LAST` location if a file was successfully written locally on that server. So, `LAST` does *not* represent the last assignment, but the last server where data was written from a major or minor compaction. The intent, it seems, is that the initial assignments on startup should be based on where the data would be local, and not to preserve its previous assignment. This insight also answers the questions I had in https://github.com/apache/accumulo/pull/1616#issuecomment-665412742 about the intent of the `LAST` field. Because, if the intent was to simply assign to the last location it was previously assigned to, then assigning it to the last `CURRENT` location that was written would be sufficient (because that is updated on assignment), and we wouldn't even need a `LAST` location at all. `LAST` does not mean "last assignment", it means "last location where data was likely written locally". This is about data locality, not (re)assignment thrashing. So, we need to go back to the drawing board here, because I'm not sure #1616 even fixed a problem at all... it simply changed behavior to re-assign to a previously assigned location, rather than the last location written to. We need to figure out if the code works as intended *without* the changes in #1616. If it does, then we need to figure out whether we want to change the behavior or not. If we do, then we can probably change the code to eliminate the `LAST` field entirely, and just use the `CURRENT` field, if it exists. If we don't want to change the behavior, then we can revert #1616 and simply do the changes that get rid of some boilerplate by using Ample. Does all this make sense, @Manno15 If not, feel free to give me a call or ping me on Slack. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
