Josh Elser created ACCUMULO-2964:
------------------------------------

             Summary: Unexpected ThriftSecurityException from BatchScanner
                 Key: ACCUMULO-2964
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2964
             Project: Accumulo
          Issue Type: Bug
          Components: client, tserver
            Reporter: Josh Elser
            Priority: Minor
             Fix For: 1.7.0


This is something I've only seen a handful of times when writing/running tests 
that stop and restart tservers. When a tserver is restarted, there is often a 
thread (typically running in the master) that is trying to read a table. That 
thread keeps polling until the tserver comes back up.

Very infrequently, the client gets a {{ThriftSecurityException}} with a code of 
{{DEFAULT_SECURITY_ERROR}} and a message of {{Unknown security exception}}. 
There is no additional information in the client log (from the thrift call 
inside the {{BatchScanner}}), and the tserver log contains no error messages at 
all.

The error that the client saw:

{noformat}
2014-07-01 04:18:18,971 [impl.TabletServerBatchReaderIterator] DEBUG: Server : host:58090 msg : null
ThriftSecurityException(user:!SYSTEM, code:null)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$startMultiScan_result$startMultiScan_resultStandardScheme.read(TabletClientService.java:10045)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$startMultiScan_result$startMultiScan_resultStandardScheme.read(TabletClientService.java:10022)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$startMultiScan_result.read(TabletClientService.java:9961)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:313)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:293)
        at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:632)
        at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:592)
        at org.apache.accumulo.core.metadata.MetadataLocationObtainer.lookupTablets(MetadataLocationObtainer.java:181)
        at org.apache.accumulo.core.client.impl.TabletLocatorImpl.processInvalidated(TabletLocatorImpl.java:667)
        at org.apache.accumulo.core.client.impl.TabletLocatorImpl.binRanges(TabletLocatorImpl.java:337)
        at org.apache.accumulo.core.client.impl.TabletLocatorImpl.processInvalidated(TabletLocatorImpl.java:660)
        at org.apache.accumulo.core.client.impl.TabletLocatorImpl._locateTablet(TabletLocatorImpl.java:610)
        at org.apache.accumulo.core.client.impl.TabletLocatorImpl.locateTablet(TabletLocatorImpl.java:440)
        at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:226)
        at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:84)
        at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:177)
        at org.apache.accumulo.master.replication.DistributedWorkQueueWorkAssigner.createWork(DistributedWorkQueueWorkAssigner.java:161)
        at org.apache.accumulo.master.replication.DistributedWorkQueueWorkAssigner.assignWork(DistributedWorkQueueWorkAssigner.java:140)
        at org.apache.accumulo.master.replication.WorkDriver.run(WorkDriver.java:97)
{noformat}

The interesting part is that when the client saw this message, the new 
tserver had already started, and the old tserver appears to have been dead for 
20s. So the client in the master had been polling for 20s and getting a 
{{ConnectException}} (connection refused), which is expected. I don't know why 
we got this exception only after polling for that long.

The infrequency with which I see this makes me wonder whether the new tserver, 
which binds random ports, is somehow re-grabbing the old tserver's thrift client 
service port, and whether something from it is unexpectedly being interpreted 
as this {{ThriftSecurityException}}. That's the only explanation that seems even 
remotely possible to me.



--
This message was sent by Atlassian JIRA
(v6.2#6252)