Josh Elser created ACCUMULO-3148:
------------------------------------
Summary: TabletServer didn't get Session expired in HalfDeadTServerIT
Key: ACCUMULO-3148
URL: https://issues.apache.org/jira/browse/ACCUMULO-3148
Project: Accumulo
Issue Type: Bug
Components: test
Reporter: Josh Elser
Assignee: Josh Elser
Fix For: 1.6.1, 1.7.0
Been seeing spurious failures with HalfDeadTServerIT where the tserver doesn't
get the ZK session expiration:
{noformat}
2014-09-15 09:39:59,201 [tserver.TabletServer] DEBUG: ScanSess tid 172.31.33.94:35957 !0 0 entries in 0.07 secs, nbTimes = [63 63 63.00 1]
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
sleeping
2014-09-15 09:40:20,088 [tserver.TabletServer] FATAL: Lost tablet server lock (reason = LOCK_DELETED), exiting.
2014-09-15 09:40:20,088 [zookeeper.ZooCache] WARN : Zookeeper error, will retry
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /accumulo/d0b9b8e7-9869-4b00-9ae7-317f5231f2c1/tables/1/conf/table.iterator.minc.vers.opt.maxVersions
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:261)
at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:153)
at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:277)
at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:224)
at org.apache.accumulo.server.conf.ZooCachePropertyAccessor.get(ZooCachePropertyAccessor.java:114)
at org.apache.accumulo.server.conf.ZooCachePropertyAccessor.getProperties(ZooCachePropertyAccessor.java:144)
at org.apache.accumulo.server.conf.TableConfiguration.getProperties(TableConfiguration.java:108)
at org.apache.accumulo.core.conf.AccumuloConfiguration.iterator(AccumuloConfiguration.java:69)
at org.apache.accumulo.core.conf.ConfigSanityCheck.validate(ConfigSanityCheck.java:40)
at org.apache.accumulo.server.conf.ServerConfigurationFactory.getTableConfiguration(ServerConfigurationFactory.java:155)
at org.apache.accumulo.server.conf.ServerConfiguration.getTableConfiguration(ServerConfiguration.java:69)
at org.apache.accumulo.tserver.TabletServer.getTableConfiguration(TabletServer.java:3983)
at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1277)
at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1256)
at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1112)
at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1089)
at org.apache.accumulo.tserver.TabletServer$AssignmentHandler.run(TabletServer.java:2935)
at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
at java.lang.Thread.run(Thread.java:745)
2014-09-15 09:40:20,090 [tserver.TabletServer] WARN : Check for long GC pauses not called in a timely fashion. Expected every 5.0 seconds but was 16.3 seconds since last check
2014-09-15 09:40:20,477 [datanode.DataNode] ERROR: 127.0.0.1:57185:DataXceiver error processing WRITE_BLOCK operation src: /127.0.0.1:42146 dst: /127.0.0.1:57185
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:467)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:771)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:718)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
at java.lang.Thread.run(Thread.java:745)
{noformat}
It looks like the tserver killed itself after the connection loss, but before
it could reconnect to ZooKeeper and receive the session expiration.
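
One way to confirm that ordering from captured tserver output is to check which message shows up first. The following is only a rough sketch, not code from HalfDeadTServerIT: the class and method names are hypothetical, the lock-loss marker is copied from the FATAL line in the log, and the "Session expired" marker is an assumption about how an expiration would be reported.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Accumulo or of HalfDeadTServerIT.
final class ExpirationOrderCheck {
  // Copied from the FATAL line in the log attached to this issue.
  static final String LOCK_LOST = "Lost tablet server lock";
  // Assumption: text that a session-expiration report would contain.
  static final String SESSION_EXPIRED = "Session expired";

  // Returns the index of the first line containing marker, or -1 if absent.
  static int firstIndexOf(List<String> lines, String marker) {
    for (int i = 0; i < lines.size(); i++) {
      if (lines.get(i).contains(marker)) {
        return i;
      }
    }
    return -1;
  }

  // True only if an expiration was logged, and logged before any lock-loss exit.
  static boolean expiredBeforeExit(List<String> lines) {
    int expired = firstIndexOf(lines, SESSION_EXPIRED);
    int exit = firstIndexOf(lines, LOCK_LOST);
    return expired >= 0 && (exit < 0 || expired < exit);
  }

  public static void main(String[] args) {
    // Abbreviated lines in the same order as the failing run's log.
    List<String> failingRun = Arrays.asList(
        "sleeping",
        "[tserver.TabletServer] FATAL: Lost tablet server lock (reason = LOCK_DELETED), exiting.",
        "[zookeeper.ZooCache] WARN : Zookeeper error, will retry");
    System.out.println(expiredBeforeExit(failingRun));
  }
}
```

A check like this only classifies a run after the fact. It does not address the race itself, where the lock-loss exit wins over the reconnect.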