[ https://issues.apache.org/jira/browse/ACCUMULO-436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Keith Turner resolved ACCUMULO-436.
-----------------------------------

    Resolution: Fixed

> tablet merge stuck
> ------------------
>
>                 Key: ACCUMULO-436
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-436
>             Project: Accumulo
>          Issue Type: Bug
>          Components: master
>         Environment: randomwalk with agitation on 10-node test cluster
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>              Labels: 14_qa_bug
>             Fix For: 1.4.0
>
>
> After 14 hours of randomwalk, a merge operation appeared to be stuck.
> Garbage collector was stuck, some tablets were offline:
> ||\# Online Tablet Servers||\# Total Tablet Servers||Loggers||Last GC||\# Tablets||\# Unassigned Tablets||Entries||Ingest||Query||Hold Time||OS Load||
> |10 |10 |10 |*Running 2/29/12 12:14 PM* |299 |*4* |277.50M |311 |5.53K |— |0.50|
> Garbage collector could not get a consistent !METADATA table scan:
> {noformat}
> 29 13:04:10,808 [util.TabletIterator] INFO : Resetting !METADATA scanner to [24q;5f83b8f927c41c9d%00; : [] 9223372036854775807 false,~ : [] 9223372036854775807 false)
> 29 13:04:11,071 [util.TabletIterator] INFO : Metadata inconsistency : 1419e44259517c51 != 5f83b8f927c41c9d metadataKey = 24q< ~tab:~pr [] 724883 false
> {noformat}
> Table (id 24q) had a merge in progress:
> {noformat}
> ./bin/accumulo org.apache.accumulo.server.fate.Admin print
> txid: 7bea12fa46c40a72 status: IN_PROGRESS op: BulkImport locked: [] locking: [R:24q] top: BulkImport
> txid: 08db6105a25c0788 status: IN_PROGRESS op: CloneTable locked: [] locking: [R:24q] top: CloneTable
> txid: 5f798db1cab5fdea status: IN_PROGRESS op: BulkImport locked: [] locking: [R:24q] top: BulkImport
> txid: 6aa9a8a9b36a4f4d status: IN_PROGRESS op: TableRangeOp locked: [] locking: [W:24q] top: TableRangeOp
> txid: 5c6e82e235ec3855 status: IN_PROGRESS op: TableRangeOp locked: [] locking: [W:24q] top: TableRangeOp
> txid: 653a9293ba9f1cdc status: IN_PROGRESS op: RenameTable locked: [] locking: [W:24q] top: RenameTable
> txid: 651c62eb37136b6e status: IN_PROGRESS op: TableRangeOp locked: [W:24q] locking: [] top: TableRangeOpWait
> {noformat}
> Scan of table 24q:
> {noformat}
> scan -b 24q; -e 24q<
> 24q;073b220b74a75059 loc:135396fb191d4b6 [] 192.168.117.6:9997
> 24q;073b220b74a75059 srv:compact [] 3
> 24q;073b220b74a75059 srv:dir [] /t-00031y0
> 24q;073b220b74a75059 srv:lock [] tservers/192.168.117.7:9997/zlock-0000000002$3353986642ea7f3
> 24q;073b220b74a75059 srv:time [] M0
> 24q;073b220b74a75059 ~tab:~pr [] \x00
> 24q;1419e44259517c51 loc:235396fb184b5cd [] 192.168.117.12:9997
> 24q;1419e44259517c51 srv:compact [] 3
> 24q;1419e44259517c51 srv:dir [] /t-00031y1
> 24q;1419e44259517c51 srv:lock [] tservers/192.168.117.7:9997/zlock-0000000002$3353986642ea7f3
> 24q;1419e44259517c51 srv:time [] M0
> 24q;1419e44259517c51 ~tab:~pr [] \x01073b220b74a75059
> 24q;51fc3e7faea2b7e9 chopped:chopped [] chopped
> 24q;51fc3e7faea2b7e9 srv:compact [] 3
> 24q;51fc3e7faea2b7e9 srv:dir [] /t-00031y2
> 24q;51fc3e7faea2b7e9 srv:lock [] tservers/192.168.117.7:9997/zlock-0000000002$3353986642ea7f3
> 24q;51fc3e7faea2b7e9 srv:time [] M0
> 24q;51fc3e7faea2b7e9 ~tab:~pr [] \x011419e44259517c51
> 24q;5e65b844f2c7f868 chopped:chopped [] chopped
> 24q;5e65b844f2c7f868 srv:compact [] 3
> 24q;5e65b844f2c7f868 srv:dir [] /t-00031e1
> 24q;5e65b844f2c7f868 srv:lock [] tservers/192.168.117.7:9997/zlock-0000000002$3353986642ea7f3
> 24q;5e65b844f2c7f868 srv:time [] M0
> 24q;5e65b844f2c7f868 ~tab:~pr [] \x0151fc3e7faea2b7e9
> 24q;5f83b8f927c41c9d chopped:chopped [] chopped
> 24q;5f83b8f927c41c9d srv:compact [] 3
> 24q;5f83b8f927c41c9d srv:dir [] /t-000329w
> 24q;5f83b8f927c41c9d srv:lock [] tservers/192.168.117.6:9997/zlock-0000000002$135396fb191c4f3
> 24q;5f83b8f927c41c9d srv:time [] M0
> 24q;5f83b8f927c41c9d ~tab:~pr [] \x015e65b844f2c7f868
> 24q< chopped:chopped [] chopped
> 24q< srv:compact [] 3
> 24q< srv:dir [] /default_tablet
> 24q< srv:lock [] tservers/192.168.117.6:9997/zlock-0000000002$135396fb191c4f3
> 24q< srv:time [] M0
> 24q< ~tab:~pr [] \x011419e44259517c51
> {noformat}
> Master Logs
> {noformat}
> 29 13:11:49,903 [state.MergeStats] INFO : Computing next merge state for 24q;6badf28df1d8ece7;37f3488aa92ac056 which is presently MERGING isDelete : false
> 29 13:11:49,903 [state.MergeStats] INFO : 4 tablets are unassigned 24q;6badf28df1d8ece7;37f3488aa92ac056
> {noformat}
> The final consistency check is failing because the merge is partially
> complete. The final step is not "adampotent" enough: partial execution
> leaves the Repo in a state in which it cannot continue after restart.
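
The "Metadata inconsistency" line in the garbage collector log can be reproduced by hand from the scan of table 24q. The following is a minimal sketch in Python, not Accumulo's TabletIterator code. It assumes the rule being enforced is that each tablet's ~tab:~pr value must equal the end row of the tablet before it, and it reads \x00 as "no previous row" and \x01 as a marker byte in front of the row. The names TABLETS and find_inconsistencies are hypothetical and exist only for this illustration.

```python
# Hypothetical sketch of a prev-end-row chain check. This is NOT Accumulo's
# TabletIterator; it only models the rule the log message appears to enforce.

# (end_row, prev_end_row) for each tablet of table 24q, in scan order.
# None means "no row": the first tablet has no previous end row (\x00 in the
# scan) and the default tablet (24q<) has no end row of its own.
TABLETS = [
    ("073b220b74a75059", None),
    ("1419e44259517c51", "073b220b74a75059"),
    ("51fc3e7faea2b7e9", "1419e44259517c51"),
    ("5e65b844f2c7f868", "51fc3e7faea2b7e9"),
    ("5f83b8f927c41c9d", "5e65b844f2c7f868"),
    (None, "1419e44259517c51"),  # 24q< still points at a pre-merge neighbor
]


def find_inconsistencies(tablets):
    """Return (index, recorded_prev, expected_prev) for every broken link."""
    problems = []
    expected_prev = None  # the first tablet must have no previous end row
    for index, (end_row, prev_end_row) in enumerate(tablets):
        if prev_end_row != expected_prev:
            problems.append((index, prev_end_row, expected_prev))
        expected_prev = end_row  # the next tablet must point back at this one
    return problems


if __name__ == "__main__":
    for index, recorded, expected in find_inconsistencies(TABLETS):
        print(f"tablet {index}: {recorded} != {expected}")
```

Under these assumptions the only broken link is the default tablet 24q<, whose recorded previous end row 1419e44259517c51 does not match the end row 5f83b8f927c41c9d of the tablet scanned just before it. Those are the same two values as in the logged inconsistency.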