[
https://issues.apache.org/jira/browse/ACCUMULO-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Corey J. Nolet updated ACCUMULO-2964:
-------------------------------------
Fix Version/s: (was: 1.6.2)
1.6.3
> Unexpected ThriftSecurityException from BatchScanner
> ----------------------------------------------------
>
> Key: ACCUMULO-2964
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2964
> Project: Accumulo
> Issue Type: Bug
> Components: client, tserver
> Reporter: Josh Elser
> Fix For: 1.7.0, 1.6.3
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> This is something I've only seen a handful of times when writing/running
> tests that stop and restart tservers. After the tserver is restarted, there
> is a thread (typically running in the master) which is trying to read a
> table. As such, the thread will continue to poll until the tserver comes up.
> Very infrequently, the client gets a {{ThriftSecurityException}} with a code
> of {{DEFAULT_SECURITY_ERROR}} and a message of {{Unknown security
> exception}}. There is no additional information in the client log (from the
> thrift call inside the batchscanner), and the tserver log contains no error
> messages at all.
> The error that the client saw:
> {noformat}
> 2014-07-01 04:18:18,971 [impl.TabletServerBatchReaderIterator] DEBUG: Server : host:58090 msg : null
> ThriftSecurityException(user:!SYSTEM, code:null)
>     at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$startMultiScan_result$startMultiScan_resultStandardScheme.read(TabletClientService.java:10045)
>     at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$startMultiScan_result$startMultiScan_resultStandardScheme.read(TabletClientService.java:10022)
>     at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$startMultiScan_result.read(TabletClientService.java:9961)
>     at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
>     at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:313)
>     at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:293)
>     at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:632)
>     at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:592)
>     at org.apache.accumulo.core.metadata.MetadataLocationObtainer.lookupTablets(MetadataLocationObtainer.java:181)
>     at org.apache.accumulo.core.client.impl.TabletLocatorImpl.processInvalidated(TabletLocatorImpl.java:667)
>     at org.apache.accumulo.core.client.impl.TabletLocatorImpl.binRanges(TabletLocatorImpl.java:337)
>     at org.apache.accumulo.core.client.impl.TabletLocatorImpl.processInvalidated(TabletLocatorImpl.java:660)
>     at org.apache.accumulo.core.client.impl.TabletLocatorImpl._locateTablet(TabletLocatorImpl.java:610)
>     at org.apache.accumulo.core.client.impl.TabletLocatorImpl.locateTablet(TabletLocatorImpl.java:440)
>     at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:226)
>     at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:84)
>     at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:177)
>     at org.apache.accumulo.master.replication.DistributedWorkQueueWorkAssigner.createWork(DistributedWorkQueueWorkAssigner.java:161)
>     at org.apache.accumulo.master.replication.DistributedWorkQueueWorkAssigner.assignWork(DistributedWorkQueueWorkAssigner.java:140)
>     at org.apache.accumulo.master.replication.WorkDriver.run(WorkDriver.java:97)
> {noformat}
> The interesting part is that when the client saw this message, the new
> TabletServer was already started, and the old tabletserver appears to have
> been dead for 20s. So, the client in the master had been polling for 20s
> getting a ConnectException (connection refused), which is expected. I don't
> know why we got this exception after that length of time.
> The infrequency with which I see this makes me wonder if the random ports in
> the new tabletserver are somehow re-grabbing the old tserver's thrift client
> service port and something is unexpectedly being interpreted as this
> ThriftSecurityException? That's the only thing that seems remotely possible
> to me.
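
For illustration only, the polling behaviour described in the report can be sketched roughly as below. This is not Accumulo's client code: `PollSketch`, `pollUntilUp`, and the `maxAttempts` and `sleepMillis` parameters are hypothetical names made up for the sketch. It only shows the shape of the loop, assuming the caller treats a ConnectException (connection refused) as "tserver not up yet, try again" and lets any other exception, such as the ThriftSecurityException above, surface immediately.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper for illustration; not part of Accumulo. */
final class PollSketch {

    /**
     * Invokes the call until it succeeds, retrying only on ConnectException.
     * Any other exception propagates to the caller on its first occurrence.
     * If every attempt is refused, the last ConnectException is rethrown.
     */
    static <T> T pollUntilUp(Callable<T> rpc, int maxAttempts, long sleepMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ConnectException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return rpc.call();
            } catch (ConnectException e) {
                // Expected while the server is down: remember it and retry.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated server: refuses the first two attempts, then answers.
        AtomicInteger calls = new AtomicInteger();
        String result = pollUntilUp(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new ConnectException("Connection refused");
            }
            return "scan started";
        }, 10, 100L);
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```

With a loop of this shape, an exception that is not a ConnectException ends the polling at once, which would match the single unexpected failure seen after 20s of refused connections.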