[
https://issues.apache.org/jira/browse/ACCUMULO-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141459#comment-14141459
]
Josh Elser commented on ACCUMULO-3148:
--------------------------------------
Thanks for your help throughout this, Eric. We concluded that the system was
operating as intended and that the check the test case was making was invalid.
As long as the tabletserver dies in testTimeout, the test should pass.
> TabletServer didn't get Session expired in HalfDeadTServerIT
> ------------------------------------------------------------
>
> Key: ACCUMULO-3148
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3148
> Project: Accumulo
> Issue Type: Bug
> Components: test
> Reporter: Josh Elser
> Assignee: Josh Elser
> Fix For: 1.6.1, 1.7.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Been seeing spurious failures with HalfDeadTServerIT where it doesn't get
> the ZK session expiration:
> {noformat}
> 2014-09-15 09:39:59,201 [tserver.TabletServer] DEBUG: ScanSess tid 172.31.33.94:35957 !0 0 entries in 0.07 secs, nbTimes = [63 63 63.00 1]
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> 2014-09-15 09:40:20,088 [tserver.TabletServer] FATAL: Lost tablet server lock (reason = LOCK_DELETED), exiting.
> 2014-09-15 09:40:20,088 [zookeeper.ZooCache] WARN : Zookeeper error, will retry
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /accumulo/d0b9b8e7-9869-4b00-9ae7-317f5231f2c1/tables/1/conf/table.iterator.minc.vers.opt.maxVersions
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
> at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:261)
> at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:153)
> at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:277)
> at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:224)
> at org.apache.accumulo.server.conf.ZooCachePropertyAccessor.get(ZooCachePropertyAccessor.java:114)
> at org.apache.accumulo.server.conf.ZooCachePropertyAccessor.getProperties(ZooCachePropertyAccessor.java:144)
> at org.apache.accumulo.server.conf.TableConfiguration.getProperties(TableConfiguration.java:108)
> at org.apache.accumulo.core.conf.AccumuloConfiguration.iterator(AccumuloConfiguration.java:69)
> at org.apache.accumulo.core.conf.ConfigSanityCheck.validate(ConfigSanityCheck.java:40)
> at org.apache.accumulo.server.conf.ServerConfigurationFactory.getTableConfiguration(ServerConfigurationFactory.java:155)
> at org.apache.accumulo.server.conf.ServerConfiguration.getTableConfiguration(ServerConfiguration.java:69)
> at org.apache.accumulo.tserver.TabletServer.getTableConfiguration(TabletServer.java:3983)
> at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1277)
> at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1256)
> at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1112)
> at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1089)
> at org.apache.accumulo.tserver.TabletServer$AssignmentHandler.run(TabletServer.java:2935)
> at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
> at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
> at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
> at java.lang.Thread.run(Thread.java:745)
> 2014-09-15 09:40:20,090 [tserver.TabletServer] WARN : Check for long GC pauses not called in a timely fashion. Expected every 5.0 seconds but was 16.3 seconds since last check
> 2014-09-15 09:40:20,477 [datanode.DataNode] ERROR: 127.0.0.1:57185:DataXceiver error processing WRITE_BLOCK operation src: /127.0.0.1:42146 dst: /127.0.0.1:57185
> java.io.IOException: Premature EOF from inputStream
> at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
> at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
> at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
> at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
> at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:467)
> at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:771)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:718)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> It looks like the tserver killed itself after the connection loss but before
> the tserver retried to connect and got the session expiration.