[ 
https://issues.apache.org/jira/browse/ACCUMULO-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corey J. Nolet updated ACCUMULO-2964:
-------------------------------------
    Fix Version/s:     (was: 1.6.2)
                   1.6.3

> Unexpected ThriftSecurityException from BatchScanner
> ----------------------------------------------------
>
>                 Key: ACCUMULO-2964
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2964
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, tserver
>            Reporter: Josh Elser
>             Fix For: 1.7.0, 1.6.3
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is something I've only seen a handful of times when writing/running 
> tests that stop and restart tservers. After the tserver is restarted, there 
> is a thread (typically running in the master) which is trying to read a 
> table. As such, the thread will continue to poll until the tserver comes up.
> Very infrequently, the client gets a {{ThriftSecurityException}} with a code 
> of {{DEFAULT_SECURITY_ERROR}} and a message of {{Unknown security 
> exception}}. There is no additional information in the client log (from the 
> thrift call inside the batchscanner), and the tserver contains no error 
> messages at all.
> The error that the client saw:
> {noformat}
> 2014-07-01 04:18:18,971 [impl.TabletServerBatchReaderIterator] DEBUG: Server : host:58090 msg : null
> ThriftSecurityException(user:!SYSTEM, code:null)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$startMultiScan_result$startMultiScan_resultStandardScheme.read(TabletClientService.java:10045)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$startMultiScan_result$startMultiScan_resultStandardScheme.read(TabletClientService.java:10022)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$startMultiScan_result.read(TabletClientService.java:9961)
>         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:313)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:293)
>         at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:632)
>         at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:592)
>         at org.apache.accumulo.core.metadata.MetadataLocationObtainer.lookupTablets(MetadataLocationObtainer.java:181)
>         at org.apache.accumulo.core.client.impl.TabletLocatorImpl.processInvalidated(TabletLocatorImpl.java:667)
>         at org.apache.accumulo.core.client.impl.TabletLocatorImpl.binRanges(TabletLocatorImpl.java:337)
>         at org.apache.accumulo.core.client.impl.TabletLocatorImpl.processInvalidated(TabletLocatorImpl.java:660)
>         at org.apache.accumulo.core.client.impl.TabletLocatorImpl._locateTablet(TabletLocatorImpl.java:610)
>         at org.apache.accumulo.core.client.impl.TabletLocatorImpl.locateTablet(TabletLocatorImpl.java:440)
>         at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:226)
>         at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:84)
>         at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:177)
>         at org.apache.accumulo.master.replication.DistributedWorkQueueWorkAssigner.createWork(DistributedWorkQueueWorkAssigner.java:161)
>         at org.apache.accumulo.master.replication.DistributedWorkQueueWorkAssigner.assignWork(DistributedWorkQueueWorkAssigner.java:140)
>         at org.apache.accumulo.master.replication.WorkDriver.run(WorkDriver.java:97)
> {noformat}
> The interesting part is that when the client saw this message, the new 
> TabletServer was already started, and the old tabletserver appears to have 
> been dead for 20s. So, the client in the master had been polling for 20s 
> getting a ConnectException (connection refused), which is expected. I don't 
> know why we got this exception only after that length of time.
> The infrequency with which I see this makes me wonder if the random ports in 
> the new tabletserver are somehow re-grabbing the old tserver's thrift client 
> service port and something is unexpectedly being interpreted as this 
> ThriftSecurityException? That's the only thing that seems remotely possible 
> to me. 
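
The polling behavior described in the report can be sketched roughly as follows. This is a minimal, hypothetical illustration and not Accumulo's client code: the class `PollUntilUp`, the method `poll`, and the `Result` type are invented names. It retries a lookup while the server refuses connections and lets any other exception escape immediately, which is how an unexpected exception such as the one above would reach the caller after a run of expected connection failures.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

// Hypothetical sketch: none of these names come from Accumulo.
public class PollUntilUp {

    /** The value returned by the lookup, plus how many refused attempts preceded it. */
    public static final class Result<T> {
        public final T value;
        public final int refusedAttempts;

        Result(T value, int refusedAttempts) {
            this.value = value;
            this.refusedAttempts = refusedAttempts;
        }
    }

    /**
     * Calls the lookup until it succeeds. A ConnectException is treated as
     * "server not up yet" and retried after a short sleep. Any other exception
     * is not caught here, so it propagates to the caller on the first occurrence.
     */
    public static <T> Result<T> poll(Callable<T> lookup, int maxAttempts, long sleepMillis)
            throws Exception {
        int refused = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return new Result<>(lookup.call(), refused);
            } catch (ConnectException e) {
                // Expected while the server is down: count it and try again.
                refused++;
                Thread.sleep(sleepMillis);
            }
        }
        throw new ConnectException("server still unreachable after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated lookup: refuses the first three attempts, then succeeds.
        Result<String> r = poll(() -> {
            if (calls[0]++ < 3) {
                throw new ConnectException("Connection refused");
            }
            return "tablet location";
        }, 10, 10L);
        System.out.println(r.value + " after " + r.refusedAttempts + " refused attempts");
    }
}
```

In a loop like this, only connection failures are considered retryable, so a single security exception ends the polling even when every earlier attempt failed in the expected way.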



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
