[ https://issues.apache.org/jira/browse/ACCUMULO-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140640#comment-14140640 ]

Eric Newton commented on ACCUMULO-3148:
---------------------------------------

I think the master is killing the server because it is unresponsive. I would 
verify that by looking in the master's log.

If the test didn't set general.rpc.timeout, the master wouldn't interfere.
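
For reference, a minimal sketch of overriding general.rpc.timeout in a MiniAccumuloCluster-based test (this is not the actual HalfDeadTServerIT wiring; the directory, password, and timeout value below are placeholders):

{noformat}
// Sketch only: give the tserver more slack before the master considers it
// unresponsive, by overriding general.rpc.timeout in the mini cluster's
// site configuration before start().
import java.io.File;
import java.util.Collections;

import org.apache.accumulo.minicluster.MiniAccumuloCluster;
import org.apache.accumulo.minicluster.MiniAccumuloConfig;

public class RpcTimeoutSketch {
  public static void main(String[] args) throws Exception {
    MiniAccumuloConfig cfg = new MiniAccumuloConfig(new File("/tmp/mac-test"), "secret");
    // Override the site config before the cluster starts.
    cfg.setSiteConfig(Collections.singletonMap("general.rpc.timeout", "240s"));

    MiniAccumuloCluster cluster = new MiniAccumuloCluster(cfg);
    cluster.start();
    try {
      // ... exercise the cluster via cluster.getInstanceName() / cluster.getZooKeepers() ...
    } finally {
      cluster.stop();
    }
  }
}
{noformat}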

> TabletServer didn't get Session expired in HalfDeadTServerIT
> ------------------------------------------------------------
>
>                 Key: ACCUMULO-3148
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3148
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 1.6.1, 1.7.0
>
>
> I've been seeing spurious failures with HalfDeadTServerIT where it doesn't get 
> the ZK session expiration:
> {noformat}
> 2014-09-15 09:39:59,201 [tserver.TabletServer] DEBUG: ScanSess tid 
> 172.31.33.94:35957 !0 0 entries in 0.07 secs, nbTimes = [63 63 63.00 1] 
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> 2014-09-15 09:40:20,088 [tserver.TabletServer] FATAL: Lost tablet server lock 
> (reason = LOCK_DELETED), exiting.
> 2014-09-15 09:40:20,088 [zookeeper.ZooCache] WARN : Zookeeper error, will 
> retry
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for 
> /accumulo/d0b9b8e7-9869-4b00-9ae7-317f5231f2c1/tables/1/conf/table.iterator.minc.vers.opt.maxVersions
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>       at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
>       at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:261)
>       at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:153)
>       at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:277)
>       at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:224)
>       at 
> org.apache.accumulo.server.conf.ZooCachePropertyAccessor.get(ZooCachePropertyAccessor.java:114)
>       at 
> org.apache.accumulo.server.conf.ZooCachePropertyAccessor.getProperties(ZooCachePropertyAccessor.java:144)
>       at 
> org.apache.accumulo.server.conf.TableConfiguration.getProperties(TableConfiguration.java:108)
>       at 
> org.apache.accumulo.core.conf.AccumuloConfiguration.iterator(AccumuloConfiguration.java:69)
>       at 
> org.apache.accumulo.core.conf.ConfigSanityCheck.validate(ConfigSanityCheck.java:40)
>       at 
> org.apache.accumulo.server.conf.ServerConfigurationFactory.getTableConfiguration(ServerConfigurationFactory.java:155)
>       at 
> org.apache.accumulo.server.conf.ServerConfiguration.getTableConfiguration(ServerConfiguration.java:69)
>       at 
> org.apache.accumulo.tserver.TabletServer.getTableConfiguration(TabletServer.java:3983)
>       at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1277)
>       at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1256)
>       at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1112)
>       at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1089)
>       at 
> org.apache.accumulo.tserver.TabletServer$AssignmentHandler.run(TabletServer.java:2935)
>       at 
> org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>       at 
> org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at 
> org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>       at 
> org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>       at java.lang.Thread.run(Thread.java:745)
> 2014-09-15 09:40:20,090 [tserver.TabletServer] WARN : Check for long GC 
> pauses not called in a timely fashion. Expected every 5.0 seconds but was 
> 16.3 seconds since last check
> 2014-09-15 09:40:20,477 [datanode.DataNode] ERROR: 
> 127.0.0.1:57185:DataXceiver error processing WRITE_BLOCK operation  src: 
> /127.0.0.1:42146 dst: /127.0.0.1:57185
> java.io.IOException: Premature EOF from inputStream
>       at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:467)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:771)
>       at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:718)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
>       at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
> It looks like the tserver killed itself after the connection loss but before 
> it retried the connection and received the session expiration.
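
For context, a minimal sketch (plain ZooKeeper client API, not Accumulo's actual lock-watcher code) of the distinction the description hinges on: a Disconnected/ConnectionLoss event is transient and the client keeps retrying, while Expired is only delivered once the client reconnects and learns the session is gone.

{noformat}
// Illustrative only; connect string and session timeout are placeholders.
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class SessionStateSketch implements Watcher {
  @Override
  public void process(WatchedEvent event) {
    switch (event.getState()) {
      case Disconnected:
        // Transient: the session may still be alive on the server side.
        System.out.println("disconnected, waiting for reconnect");
        break;
      case Expired:
        // Definitive: the session is gone; ephemeral nodes (e.g. a tserver
        // lock) have been deleted and the process must re-register or exit.
        System.out.println("session expired");
        break;
      case SyncConnected:
        System.out.println("connected");
        break;
      default:
        break;
    }
  }

  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, new SessionStateSketch());
    Thread.sleep(60000);
    zk.close();
  }
}
{noformat}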



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
