Hole in metadata table occurred during random walk test
-------------------------------------------------------
Key: ACCUMULO-315
URL: https://issues.apache.org/jira/browse/ACCUMULO-315
Project: Accumulo
Issue Type: Bug
Components: master, tserver
Environment: Running 1.4.0 SNAPSHOT on 10 node cluster.
Reporter: Keith Turner
Assignee: Keith Turner
Priority: Critical
Fix For: 1.4.0
While running the random walk test, a hole appeared in the metadata table. A
client tried to delete the table with the hole, and the FATE operation got
stuck. The master logs continually showed the following.
{noformat}
14 00:02:11,273 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4ct locationState: 4ct;4d2d3be2823b0bf4;27b693c626c2d4ef@(null,xxx.xxx.xxx.xxx:9997[134d7425fc503e1],null)
{noformat}
The metadata table contained the following. Tablet 4ct;4d2d3be2823b0bf4 had a
location.
{noformat}
4ct;262249211a62cd6f ~tab:~pr [] \x011819e56edae21302
4ct;27b693c626c2d4ef ~tab:~pr [] \x01262249211a62cd6f
4ct;43422047c78fa52b ~tab:~pr [] \x0141ea825af0f262d9
4ct;4d2d3be2823b0bf4 ~tab:~pr [] \x0127b693c626c2d4ef
4ct;4f89df61392bb311 ~tab:~pr [] \x014d2d3be2823b0bf4
{noformat}
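The hole can be made explicit by walking the prev-row pointers in the listing above. The following is a minimal sketch, not Accumulo code. It assumes each ~tab:~pr value is the literal marker \x01 followed by the previous end row, and that the rows shown are the table's complete, contiguous range for this region. The names OBSERVED, parse_entry and find_breaks are hypothetical.

```python
# (row, ~tab:~pr value) pairs copied from the metadata listing above.
OBSERVED = [
    ("4ct;262249211a62cd6f", r"\x011819e56edae21302"),
    ("4ct;27b693c626c2d4ef", r"\x01262249211a62cd6f"),
    ("4ct;43422047c78fa52b", r"\x0141ea825af0f262d9"),
    ("4ct;4d2d3be2823b0bf4", r"\x0127b693c626c2d4ef"),
    ("4ct;4f89df61392bb311", r"\x014d2d3be2823b0bf4"),
]

# Assumption: a leading \x01 marker means "prev end row follows".
MARKER = r"\x01"


def parse_entry(row, value):
    """Return (end_row, prev_end_row) for one ~tab:~pr entry."""
    _table_id, end_row = row.split(";", 1)
    if not value.startswith(MARKER):
        raise ValueError("unexpected prev-row encoding: " + value)
    return end_row, value[len(MARKER):]


def find_breaks(entries):
    """Compare each tablet's prev end row with the end row of the tablet
    sorted immediately before it, and return every mismatch as
    (end_row, expected_prev, found_prev)."""
    tablets = sorted(parse_entry(row, value) for row, value in entries)
    breaks = []
    for (before_end, _), (end_row, found_prev) in zip(tablets, tablets[1:]):
        if found_prev != before_end:
            breaks.append((end_row, before_end, found_prev))
    return breaks


for end_row, expected, found in find_breaks(OBSERVED):
    print("tablet %s: prev row should be %s, found %s" % (end_row, expected, found))
```

Under these assumptions the check flags two rows: 4ct;43422047c78fa52b points back at 41ea825af0f262d9, for which no row exists, and 4ct;4d2d3be2823b0bf4 points back at 27b693c626c2d4ef instead of 43422047c78fa52b.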
Found the following events on a tablet server.
{noformat}
21:36:04,369 [tabletserver.Tablet] TABLET_HIST: 4ct;4d2d3be2823b0bf4;27b693c626c2d4ef split 4ct;41ea825af0f262d9;27b693c626c2d4ef 4ct;4d2d3be2823b0bf4;41ea825af0f262d9
21:36:06,351 [tabletserver.Tablet] TABLET_HIST: 4ct;4d2d3be2823b0bf4;41ea825af0f262d9 split 4ct;43422047c78fa52b;41ea825af0f262d9 4ct;4d2d3be2823b0bf4;43422047c78fa52b
{noformat}
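Replaying the two split events gives the prev-row pointers the metadata table would be expected to hold, which can then be compared with what was actually found. This is only a sketch for reasoning about the logs, not Accumulo code. It assumes an extent prints as tableId;endRow;prevEndRow and that each TABLET_HIST split line reads "parent split lowChild highChild". The names SPLITS, OBSERVED, replay and diff are hypothetical.

```python
# Split events copied from the TABLET_HIST lines above: (parent, low child, high child).
SPLITS = [
    ("4ct;4d2d3be2823b0bf4;27b693c626c2d4ef",
     "4ct;41ea825af0f262d9;27b693c626c2d4ef",
     "4ct;4d2d3be2823b0bf4;41ea825af0f262d9"),
    ("4ct;4d2d3be2823b0bf4;41ea825af0f262d9",
     "4ct;43422047c78fa52b;41ea825af0f262d9",
     "4ct;4d2d3be2823b0bf4;43422047c78fa52b"),
]

# End row -> prev end row, as found in the metadata table listing.
OBSERVED = {
    "262249211a62cd6f": "1819e56edae21302",
    "27b693c626c2d4ef": "262249211a62cd6f",
    "43422047c78fa52b": "41ea825af0f262d9",
    "4d2d3be2823b0bf4": "27b693c626c2d4ef",
    "4f89df61392bb311": "4d2d3be2823b0bf4",
}


def parse_extent(text):
    """Split 'tableId;endRow;prevEndRow' (assumed format) into (end_row, prev_end_row)."""
    _table_id, end_row, prev_end_row = text.split(";")
    return end_row, prev_end_row


def replay(splits):
    """Apply the splits in order and return the expected end row -> prev end row map."""
    expected = {}
    for parent, low, high in splits:
        parent_end, _ = parse_extent(parent)
        expected.pop(parent_end, None)  # the parent extent is replaced by its children
        for child in (low, high):
            end_row, prev_end_row = parse_extent(child)
            expected[end_row] = prev_end_row
    return expected


def diff(expected, observed):
    """Return {end_row: (problem, expected_prev, observed_prev)} for every mismatch."""
    problems = {}
    for end_row, prev_end_row in expected.items():
        if end_row not in observed:
            problems[end_row] = ("missing", prev_end_row, None)
        elif observed[end_row] != prev_end_row:
            problems[end_row] = ("stale", prev_end_row, observed[end_row])
    return problems


for end_row, problem in sorted(diff(replay(SPLITS), OBSERVED).items()):
    print(end_row, problem)
```

Under these assumptions the comparison shows that the row for 4ct;41ea825af0f262d9 is missing, and that 4ct;4d2d3be2823b0bf4 still carries 27b693c626c2d4ef, its prev row from before the first split. The row for 4ct;43422047c78fa52b, written by the second split, matches. This narrows down which metadata updates to look at but does not by itself identify the cause.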
Saw the following on the tablet server serving the metadata tablet at around
the time of the splits. Not sure if this is related.
{noformat}
13 21:36:10,956 [server.TNonblockingServer] WARN : Got an IOException in internalRead!
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
    at sun.nio.ch.IOUtil.read(IOUtil.java:171)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
    at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
    at org.apache.thrift.server.TNonblockingServer$FrameBuffer.internalRead(TNonblockingServer.java:668)
    at org.apache.thrift.server.TNonblockingServer$FrameBuffer.read(TNonblockingServer.java:457)
    at org.apache.thrift.server.TNonblockingServer$SelectThread.handleRead(TNonblockingServer.java:358)
    at org.apache.thrift.server.TNonblockingServer$SelectThread.select(TNonblockingServer.java:303)
    at org.apache.thrift.server.TNonblockingServer$SelectThread.run(TNonblockingServer.java:242)
{noformat}
Not sure what caused the metadata problem. Further investigation is needed.
Also, while I was debugging, the master started assigning and unassigning
metadata tablets rapidly. Did not get a chance to investigate this; it stopped
when I stopped the random walk test.