[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218785#comment-13218785 ]
stack commented on HBASE-5487:
------------------------------

I took a look at FATE over in Accumulo. It's some nice generic primitives for running a suite of idempotent operations (even if an operation only partly completes, running it again should clean up and continue). There is a notion of locking on a table (so you can stop it transitioning, I suppose; there are read/write locks) and a stack for operations (ops are pushed onto and popped off the stack). Operations can respond done, failed, or even with a new set of operations to do first (this basic mechanism can be used to step through a number of tasks one after the other). All is persisted up in zk and run by the master; if the master dies, a new master can pick up the half-done task and finish it. Clients can watch zk to see if a task is done.

There ain't too much to the fate package; there is the fate class itself, an admin, and a 'store' interface of which there is a zk implementation. We should for sure take inspiration at least from the work already done.

Here are the ops they do via fate:

{code}
fate.seedTransaction(opid, new TraceRepo<Master>(new CreateTable(c.user, tableName, timeType, options)), autoCleanup);
fate.seedTransaction(opid, new TraceRepo<Master>(new RenameTable(tableId, oldTableName, newTableName)), autoCleanup);
fate.seedTransaction(opid, new TraceRepo<Master>(new CloneTable(c.user, srcTableId, tableName, propertiesToSet, propertiesToExclude)), autoCleanup);
fate.seedTransaction(opid, new TraceRepo<Master>(new DeleteTable(tableId)), autoCleanup);
fate.seedTransaction(opid, new TraceRepo<Master>(new ChangeTableState(tableId, TableOperation.ONLINE)), autoCleanup);
fate.seedTransaction(opid, new TraceRepo<Master>(new ChangeTableState(tableId, TableOperation.OFFLINE)), autoCleanup);
fate.seedTransaction(opid, new TraceRepo<Master>(new TableRangeOp(MergeInfo.Operation.MERGE, tableId, startRow, endRow)), autoCleanup);
fate.seedTransaction(opid, new TraceRepo<Master>(new TableRangeOp(MergeInfo.Operation.DELETE, tableId, startRow, endRow)), autoCleanup);
fate.seedTransaction(opid, new TraceRepo<Master>(new BulkImport(tableId, dir, failDir, setTime)), autoCleanup);
fate.seedTransaction(opid, new TraceRepo<Master>(new CompactRange(tableId, startRow, endRow)), autoCleanup);
{code}

CompactRange is their term for merge. It takes a key range span, figures out the tablets involved and runs the compact/merge. We want that, and then something to do the remove of regions too?

> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
> Key: HBASE-5487
> URL: https://issues.apache.org/jira/browse/HBASE-5487
> Project: HBase
> Issue Type: New Feature
> Components: master, regionserver, zookeeper
> Affects Versions: 0.94.0
> Reporter: Mubarak Seyed
> Labels: noob
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant manner.
> Master-coordinated tasks such as online-schema change and delete-range (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of the framework are:
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plug in new master-coordinated tasks without adding code to core components
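
To make the op-stack idea from the comment concrete (idempotent ops that answer done, failed, or "run these other ops first"), here is a minimal sketch. It is not Accumulo's FATE API: `OpStackSketch`, `Op`, `Result`, `drain` and `createTable` are hypothetical names, and an in-memory `Deque` and `Set` stand in for the zk-backed store and the cluster state.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a FATE-like operation stack; not the real FATE classes. */
public class OpStackSketch {

    public enum Status { DONE, FAILED, RUN_FIRST }

    /** What an op answers: done, failed, or a list of ops that must run before it. */
    public static final class Result {
        final Status status;
        final List<Op> runFirst;

        private Result(Status status, List<Op> runFirst) {
            this.status = status;
            this.runFirst = runFirst;
        }

        public static Result done() { return new Result(Status.DONE, Collections.<Op>emptyList()); }
        public static Result failed() { return new Result(Status.FAILED, Collections.<Op>emptyList()); }
        public static Result runFirst(Op... ops) { return new Result(Status.RUN_FIRST, Arrays.asList(ops)); }
    }

    /** An idempotent step: running it again after a partial run must be safe. */
    public interface Op {
        Result run(Set<String> state);
    }

    /**
     * Runs the op on top of the stack until the stack is empty.
     * An op is popped only once it reports DONE, so a caller that takes over a
     * half-drained stack (the "new master" case) simply calls drain again.
     */
    public static boolean drain(Deque<Op> stack, Set<String> state) {
        while (!stack.isEmpty()) {
            Result r = stack.peek().run(state);
            if (r.status == Status.FAILED) {
                return false; // leave the stack as-is so the task can be retried
            }
            if (r.status == Status.DONE) {
                stack.pop();
            } else {
                // Push prerequisites in reverse so the first one ends up on top.
                for (int i = r.runFirst.size() - 1; i >= 0; i--) {
                    stack.push(r.runFirst.get(i));
                }
            }
        }
        return true;
    }

    /** Example op: asks for a "make directory" step first, then registers the table. */
    public static Op createTable(final String name) {
        return state -> {
            if (!state.contains("dir:" + name)) {
                return Result.runFirst(s -> {
                    s.add("dir:" + name); // Set.add is naturally idempotent
                    return Result.done();
                });
            }
            state.add("table:" + name);
            return Result.done();
        };
    }

    public static void main(String[] args) {
        Set<String> state = new LinkedHashSet<>();
        Deque<Op> stack = new ArrayDeque<>();
        stack.push(createTable("t1"));
        // prints: true [dir:t1, table:t1]
        System.out.println(drain(stack, state) + " " + state);
    }
}
```

Because each op checks the state before acting, re-running `drain` on a stack left over from a failed or interrupted run picks up where the previous run stopped, which is the property a persisted store would rely on.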