[ 
https://issues.apache.org/jira/browse/ACCUMULO-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140724#comment-14140724
 ] 

Eric Newton commented on ACCUMULO-3148:
---------------------------------------

You should see "unable to get tablet server status" 3x, and then it will ask 
the tserver to halt.
You should see "attempting to stop ".
After that is attempted, the lock is removed.
It should say "Removing zookeeper lock for " before it does so.

This test never fails for me.  What's the environment?


> TabletServer didn't get Session expired in HalfDeadTServerIT
> ------------------------------------------------------------
>
>                 Key: ACCUMULO-3148
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3148
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 1.6.1, 1.7.0
>
>
> Beening seeing spurious failures with HalfDeadTServerIT where it doesn't get 
> the ZK session expiration
> {noformat}
> 2014-09-15 09:39:59,201 [tserver.TabletServer] DEBUG: ScanSess tid 
> 172.31.33.94:35957 !0 0 entries in 0.07 secs, nbTimes = [63 63 63.00 1] 
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> 2014-09-15 09:40:20,088 [tserver.TabletServer] FATAL: Lost tablet server lock 
> (reason = LOCK_DELETED), exiting.
> 2014-09-15 09:40:20,088 [zookeeper.ZooCache] WARN : Zookeeper error, will 
> retry
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for 
> /accumulo/d0b9b8e7-9869-4b00-9ae7-317f5231f2c1/tables/1/conf/table.iterator.minc.vers.opt.maxVersions
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>       at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
>       at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:261)
>       at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:153)
>       at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:277)
>       at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:224)
>       at 
> org.apache.accumulo.server.conf.ZooCachePropertyAccessor.get(ZooCachePropertyAccessor.java:114)
>       at 
> org.apache.accumulo.server.conf.ZooCachePropertyAccessor.getProperties(ZooCachePropertyAccessor.java:144)
>       at 
> org.apache.accumulo.server.conf.TableConfiguration.getProperties(TableConfiguration.java:108)
>       at 
> org.apache.accumulo.core.conf.AccumuloConfiguration.iterator(AccumuloConfiguration.java:69)
>       at 
> org.apache.accumulo.core.conf.ConfigSanityCheck.validate(ConfigSanityCheck.java:40)
>       at 
> org.apache.accumulo.server.conf.ServerConfigurationFactory.getTableConfiguration(ServerConfigurationFactory.java:155)
>       at 
> org.apache.accumulo.server.conf.ServerConfiguration.getTableConfiguration(ServerConfiguration.java:69)
>       at 
> org.apache.accumulo.tserver.TabletServer.getTableConfiguration(TabletServer.java:3983)
>       at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1277)
>       at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1256)
>       at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1112)
>       at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1089)
>       at 
> org.apache.accumulo.tserver.TabletServer$AssignmentHandler.run(TabletServer.java:2935)
>       at 
> org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>       at 
> org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at 
> org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>       at 
> org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>       at java.lang.Thread.run(Thread.java:745)
> 2014-09-15 09:40:20,090 [tserver.TabletServer] WARN : Check for long GC 
> pauses not called in a timely fashion. Expected every 5.0 seconds but was 
> 16.3 seconds since last check
> 2014-09-15 09:40:20,477 [datanode.DataNode] ERROR: 
> 127.0.0.1:57185:DataXceiver error processing WRITE_BLOCK operation  src: 
> /127.0.0.1:42146 dst: /127.0.0.1:57185
> java.io.IOException: Premature EOF from inputStream
>       at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:467)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:771)
>       at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:718)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
>       at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
> It looks like the tserver killed itself after the connection loss but before 
> the tserver retried to connect and got the session expiration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to