dlmarion commented on issue #4232:
URL: https://github.com/apache/accumulo/issues/4232#issuecomment-1930431271

   > an alternative to moving log sort to its own process would be to make it 
resource group aware. 
   
   Processes currently know their resource group, but the resource group name 
for a table is not a first-class citizen. There is a custom property that the 
TableLoadBalancer uses (`table.custom.assignment.group`). As far as I know, 
nothing else uses it.
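
   To make the current state concrete, here is a minimal sketch of resolving a table's group from its configuration via the custom property named above. It assumes the table configuration is available as a plain `Map<String, String>`. The class name `AssignmentGroup` and the `"default"` fallback are hypothetical, not confirmed Accumulo behavior.

```java
import java.util.Map;

// Hypothetical sketch: look up a table's assignment group from its config map.
// The property name comes from the comment; the "default" fallback is assumed.
class AssignmentGroup {
    static final String ASSIGNMENT_GROUP_PROPERTY = "table.custom.assignment.group";
    static final String DEFAULT_GROUP = "default"; // assumption, not confirmed

    static String groupFor(Map<String, String> tableConfig) {
        String group = tableConfig.get(ASSIGNMENT_GROUP_PROPERTY);
        // Fall back when the table never set the custom property.
        return (group == null || group.isBlank()) ? DEFAULT_GROUP : group;
    }

    public static void main(String[] args) {
        System.out.println(groupFor(Map.of(ASSIGNMENT_GROUP_PROPERTY, "analytics"))); // analytics
        System.out.println(groupFor(Map.of()));                                      // default
    }
}
```

   Because the value lives under `table.custom.*`, only code that knows the property name (here, the balancer) treats it as a group; the rest of the system sees an opaque string.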
   
   > So make log sort happen within a specific resource group that makes sense 
for the particular WAL.
   
   If we made the property a first-class citizen of the table configuration, 
then I think there are cases where recovery may never happen. Consider a user 
who decides to change the way they access a table from IMMEDIATE to EVENTUAL. 
They shut down all the tablet servers and decide to bulk load and use scan 
servers. Some of the TabletServers don't stop cleanly, so their tablets have 
walogs that may never get sorted and recovered, because nothing left in the 
table's resource group would run the log sort.
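
   A small model of that failure case, as a sketch under stated assumptions: it assumes a group-restricted policy where only a live TabletServer in the table's group may sort its walogs. The types `ProcessType`, `Process`, and the method `canRecover` are hypothetical and exist only for illustration.

```java
import java.util.List;

// Hypothetical model: under a group-restricted policy, a group that now holds
// only ScanServers has no process eligible to sort the table's walogs.
class GroupRestrictedRecovery {
    enum ProcessType { TABLET_SERVER, SCAN_SERVER, COMPACTOR }

    record Process(ProcessType type, String group) {}

    // Assumed policy: only a live TabletServer in the table's group sorts its walogs.
    static boolean canRecover(String tableGroup, List<Process> live) {
        return live.stream().anyMatch(p ->
            p.type() == ProcessType.TABLET_SERVER && p.group().equals(tableGroup));
    }

    public static void main(String[] args) {
        // The table moved to EVENTUAL; its group now contains only scan servers.
        List<Process> live = List.of(
            new Process(ProcessType.SCAN_SERVER, "analytics"),
            new Process(ProcessType.TABLET_SERVER, "default"));
        System.out.println(canRecover("analytics", live)); // false: walogs stay unsorted
    }
}
```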
   
   I have been thinking of a different approach because I think there is a 
danger of recovery not occurring. In #4229 we made changes to host a tablet for 
the purposes of recovery regardless of its availability. We did this because if 
recovery does not happen, then ScanServers will not return the data in the 
walogs, the Compactor won't compact the data in the walogs, and the GC process 
will never delete the walog files. My proposal is to embed the LogSorter in 
the Compactor and ScanServer in addition to its current location in the 
TabletServer. I think the Compactor processes should look for log sorting work 
before they start a new compaction, and that we should set up the ScanServer to 
use the LogSorter just like the TabletServer does. The idea is that resource 
groups allow the user to dedicate resources to tables for specific purposes, 
but when a failure happens and recovery needs to occur, all processes, 
regardless of resource group, participate for the sake of data accuracy and 
system safety.
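
   One way to picture the Compactor side of the proposal is a work loop that drains pending log sort work before taking a new compaction. This is a minimal sketch: the in-memory queues, the `String` job representation, and the class `CompactorWorkLoop` are hypothetical and are not Accumulo APIs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: a Compactor-like worker checks for log sort work first,
// so recovery is not starved by a steady stream of compactions.
class CompactorWorkLoop {
    private final Deque<String> pendingLogSorts;
    private final Deque<String> pendingCompactions;

    CompactorWorkLoop(Deque<String> logSorts, Deque<String> compactions) {
        this.pendingLogSorts = logSorts;
        this.pendingCompactions = compactions;
    }

    /** Returns the next unit of work, preferring walog sorts over compactions. */
    Optional<String> nextWork() {
        String walog = pendingLogSorts.pollFirst();
        if (walog != null) {
            return Optional.of("sort " + walog);
        }
        String job = pendingCompactions.pollFirst();
        if (job != null) {
            return Optional.of("compact " + job);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        CompactorWorkLoop loop = new CompactorWorkLoop(
            new ArrayDeque<>(List.of("wal-1")), new ArrayDeque<>(List.of("job-1")));
        Optional<String> work;
        while ((work = loop.nextWork()).isPresent()) {
            System.out.println(work.get()); // prints "sort wal-1", then "compact job-1"
        }
    }
}
```

   The ordering is the design choice: recovery work always wins. In a real deployment, claiming a walog would also need coordination so that two processes do not sort the same file; this sketch leaves that out.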
