dlmarion commented on issue #4232: URL: https://github.com/apache/accumulo/issues/4232#issuecomment-1930431271
> an alternative to moving log sort to its own process would be to make it resource group aware. Processes currently know their resource group, but group name for tables is not a first-class citizen. There is a custom property in TableLoadBalancer (`table.custom.assignment.group`) that the balancer uses. Nothing else uses this as far as I know.

> So make log sort happen within a specific resource group that makes sense for the particular WAL.

If we made the property a first-class citizen of the table configuration, then I think there are cases where recovery may never happen. Consider the case of a user deciding to change the way they access a table from IMMEDIATE to EVENTUAL. They shut down all the tablet servers and decide to bulk load and use scan servers. Some of the TabletServers don't stop cleanly, so those tablets have walogs that may not get sorted and recovered.

I have been thinking of a different approach because I think there is a danger of recovery not occurring. We made changes in #4229 to host a tablet regardless of availability for the purposes of recovery. We did this because if recovery does not happen, then ScanServers will not return the data in the walogs, the Compactor won't compact the data in the walogs, and the GC process will never delete the walog files.

My proposal would be to embed the LogSorter in the Compactor and ScanServer in addition to its current location in the TabletServer. I think the Compactor processes should look for log sorting work before they start a new compaction, and I think we should set up the ScanServer to use the LogSorter just like the TabletServer does. The idea is that resource groups allow the user to dedicate resources to tables for specific things, but when a failure happens and recovery needs to occur, all members participate for the accuracy of the data and the safety of the system.
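
As a rough illustration of the ordering proposed for the Compactor (look for log sorting work before starting a new compaction), here is a minimal Java sketch. Every type and method name in it (`RecoveryFirstWorker`, `WorkSource`, `QueueSource`, `runOnce`) is hypothetical and is not Accumulo API; the in-memory queues only stand in for however sort work and compaction jobs would really be discovered.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Queue;

// Hypothetical sketch: none of these types are Accumulo classes.
final class RecoveryFirstWorker {

  /** Hypothetical source of work items (WALs to sort, or compaction jobs). */
  interface WorkSource {
    Optional<String> poll();
  }

  /** In-memory work source used only for this illustration. */
  static final class QueueSource implements WorkSource {
    private final Queue<String> items = new ArrayDeque<>();

    QueueSource(String... initial) {
      items.addAll(List.of(initial));
    }

    void add(String item) {
      items.add(item);
    }

    @Override
    public Optional<String> poll() {
      return Optional.ofNullable(items.poll());
    }
  }

  private final WorkSource logSortWork;
  private final WorkSource compactionWork;
  private final List<String> completed = new ArrayList<>();

  RecoveryFirstWorker(WorkSource logSortWork, WorkSource compactionWork) {
    this.logSortWork = logSortWork;
    this.compactionWork = compactionWork;
  }

  /** Runs one unit of work; returns false when neither source has anything. */
  boolean runOnce() {
    // Recovery takes priority: check for log sort work before any new compaction.
    Optional<String> wal = logSortWork.poll();
    if (wal.isPresent()) {
      completed.add("sort:" + wal.get());
      return true;
    }
    Optional<String> job = compactionWork.poll();
    if (job.isPresent()) {
      completed.add("compact:" + job.get());
      return true;
    }
    return false;
  }

  /** Returns the work done so far, in execution order. */
  List<String> completed() {
    return List.copyOf(completed);
  }

  public static void main(String[] args) {
    RecoveryFirstWorker worker =
        new RecoveryFirstWorker(new QueueSource("wal-1", "wal-2"), new QueueSource("job-A"));
    while (worker.runOnce()) {
      // keep going until both sources are drained
    }
    System.out.println(worker.completed()); // prints [sort:wal-1, sort:wal-2, compact:job-A]
  }
}
```

Because the check happens before each new compaction rather than once at startup, a WAL that shows up while compactions are queued would be sorted ahead of the next compaction job in this sketch.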
