[ https://issues.apache.org/jira/browse/ACCUMULO-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Busbey updated ACCUMULO-2224:
----------------------------------
Priority: Minor (was: Major)
AFAICT, ZK will throw as soon as any of the hostnames specified in the connect
string fails to resolve (UnknownHostException).
The workaround for existing releases is to fix the underlying DNS problem and
then restart roles.
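To find which hostname is the culprit before restarting, something like the sketch below may help. It is only an illustration using plain JDK calls, not Accumulo or ZooKeeper code: the class name ConnectStringCheck and both helper methods are made up for this example, and the parsing assumes host names or IPv4 addresses with an optional port and chroot suffix (IPv6 literals are not handled).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Accumulo or ZooKeeper.
final class ConnectStringCheck {

    /** Extracts host names from a connect string such as "a:2181,b:2181/chroot". */
    static List<String> hosts(String connectString) {
        // An optional chroot suffix starts at the first '/'; drop it.
        int slash = connectString.indexOf('/');
        String hostPart = slash >= 0 ? connectString.substring(0, slash) : connectString;
        List<String> result = new ArrayList<>();
        for (String entry : hostPart.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Assumes host[:port]; IPv6 literals would need different handling.
            int colon = trimmed.lastIndexOf(':');
            result.add(colon >= 0 ? trimmed.substring(0, colon) : trimmed);
        }
        return result;
    }

    /** Returns the hosts that currently fail to resolve. */
    static List<String> unresolvable(String connectString) {
        List<String> failed = new ArrayList<>();
        for (String host : hosts(connectString)) {
            try {
                // Same JDK lookup that appears at the bottom of the traces.
                InetAddress.getAllByName(host);
            } catch (UnknownHostException e) {
                failed.add(host);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        String connect = args.length > 0 ? args[0] : "localhost:2181";
        System.out.println("Unresolvable hosts: " + unresolvable(connect));
    }
}
```

Run it on the affected node with the same connect string the role uses. Note that a single failing entry is enough to trigger the exception, so every host it reports needs fixing.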
Some stack traces of where this came up during testing (for those wishing to
dedup errors they might see):
tserver compaction
{noformat}
Unexpected exception in Split/MajC initiator
java.lang.RuntimeException: java.net.UnknownHostException: zookeeper1.example.com
at org.apache.accumulo.core.zookeeper.ZooSession.connect(ZooSession.java:94)
at org.apache.accumulo.core.zookeeper.ZooSession.getSession(ZooSession.java:142)
at org.apache.accumulo.core.zookeeper.ZooReader.getSession(ZooReader.java:36)
at org.apache.accumulo.core.zookeeper.ZooReader.getZooKeeper(ZooReader.java:40)
at org.apache.accumulo.core.zookeeper.ZooCache.getZooKeeper(ZooCache.java:56)
at org.apache.accumulo.core.zookeeper.ZooCache.retry(ZooCache.java:127)
at org.apache.accumulo.core.zookeeper.ZooCache.get(ZooCache.java:233)
at org.apache.accumulo.core.zookeeper.ZooCache.get(ZooCache.java:188)
at org.apache.accumulo.server.conf.TableConfiguration.get(TableConfiguration.java:121)
at org.apache.accumulo.server.conf.TableConfiguration.get(TableConfiguration.java:109)
at org.apache.accumulo.core.conf.AccumuloConfiguration.getMemoryInBytes(AccumuloConfiguration.java:47)
at org.apache.accumulo.server.tabletserver.Tablet.findSplitRow(Tablet.java:3028)
at org.apache.accumulo.server.tabletserver.Tablet.needsSplit(Tablet.java:3122)
at org.apache.accumulo.server.tabletserver.TabletServer$MajorCompactor.run(TabletServer.java:2117)
at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.UnknownHostException: zookeeper1.example.com
at java.net.InetAddress.getAllByName0(InetAddress.java:1157)
at java.net.InetAddress.getAllByName(InetAddress.java:1083)
at java.net.InetAddress.getAllByName(InetAddress.java:1019)
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
at org.apache.accumulo.core.zookeeper.ZooSession.connect(ZooSession.java:77)
... 15 more
{noformat}
logger tracing
{noformat}
2014-01-16 00:00:12,772 [zookeeper.ZooSession] WARN : java.net.UnknownHostException : zookeeper2.example.com
2014-01-16 00:00:12,772 [trace.ZooTraceClient] ERROR: unable to get destination hosts in zookeeper
java.lang.RuntimeException: java.net.UnknownHostException: zookeeper2.example.com
at org.apache.accumulo.core.zookeeper.ZooSession.connect(ZooSession.java:94)
at org.apache.accumulo.core.zookeeper.ZooSession.getSession(ZooSession.java:142)
at org.apache.accumulo.core.zookeeper.ZooReader.getSession(ZooReader.java:37)
at org.apache.accumulo.server.zookeeper.ZooReaderWriter.getZooKeeper(ZooReaderWriter.java:57)
at org.apache.accumulo.core.zookeeper.ZooReader.getChildren(ZooReader.java:66)
at org.apache.accumulo.core.trace.ZooTraceClient.process(ZooTraceClient.java:64)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
{noformat}
master startup (I think):
{noformat}
Caused by: java.lang.RuntimeException: java.net.UnknownHostException: zookeeper1.example.com
at org.apache.accumulo.core.zookeeper.ZooSession.connect(ZooSession.java:94)
at org.apache.accumulo.core.zookeeper.ZooSession.getSession(ZooSession.java:142)
at org.apache.accumulo.core.zookeeper.ZooReader.getSession(ZooReader.java:37)
at org.apache.accumulo.server.zookeeper.ZooReaderWriter.getZooKeeper(ZooReaderWriter.java:57)
at org.apache.accumulo.core.zookeeper.ZooReader.getChildren(ZooReader.java:61)
at org.apache.accumulo.server.Accumulo.waitForZookeeperAndHdfs(Accumulo.java:201)
at org.apache.accumulo.server.master.state.SetGoalState.main(SetGoalState.java:40)
{noformat}
Continuous Ingest stats collector
{noformat}
1389860157417 Failed to collect stats : java.net.UnknownHostException: zookeeper1.example.com
java.lang.RuntimeException: java.net.UnknownHostException: zookeeper1.example.com
at org.apache.accumulo.core.zookeeper.ZooSession.connect(ZooSession.java:94)
at org.apache.accumulo.core.zookeeper.ZooSession.getSession(ZooSession.java:142)
at org.apache.accumulo.core.zookeeper.ZooReader.getSession(ZooReader.java:37)
at org.apache.accumulo.core.zookeeper.ZooReader.getZooKeeper(ZooReader.java:41)
at org.apache.accumulo.core.zookeeper.ZooCache.getZooKeeper(ZooCache.java:56)
at org.apache.accumulo.core.zookeeper.ZooCache.retry(ZooCache.java:127)
at org.apache.accumulo.core.zookeeper.ZooCache.getChildren(ZooCache.java:178)
at org.apache.accumulo.server.zookeeper.ZooLock.getLockData(ZooLock.java:414)
at org.apache.accumulo.server.client.HdfsZooInstance.getMasterLocations(HdfsZooInstance.java:102)
at org.apache.accumulo.core.client.impl.MasterClient.getConnection(MasterClient.java:52)
at org.apache.accumulo.core.client.impl.MasterClient.getConnectionWithRetry(MasterClient.java:43)
at org.apache.accumulo.server.test.continuous.ContinuousStatsCollector$StatsCollectionTask.getACUStats(ContinuousStatsCollector.java:128)
at org.apache.accumulo.server.test.continuous.ContinuousStatsCollector$StatsCollectionTask.run(ContinuousStatsCollector.java:77)
at java.util.TimerThread.mainLoop(Timer.java:512)
at java.util.TimerThread.run(Timer.java:462)
{noformat}
Continuous Ingest scanner (probably all BatchScanners)
{noformat}
Caused by: java.lang.RuntimeException: java.net.UnknownHostException: zookeeper1.example.com
at org.apache.accumulo.core.zookeeper.ZooSession.connect(ZooSession.java:94)
at org.apache.accumulo.core.zookeeper.ZooSession.getSession(ZooSession.java:142)
at org.apache.accumulo.core.zookeeper.ZooReader.getSession(ZooReader.java:37)
at org.apache.accumulo.core.zookeeper.ZooReader.getZooKeeper(ZooReader.java:41)
at org.apache.accumulo.core.zookeeper.ZooCache.getZooKeeper(ZooCache.java:56)
at org.apache.accumulo.core.zookeeper.ZooCache.retry(ZooCache.java:127)
at org.apache.accumulo.core.zookeeper.ZooCache.get(ZooCache.java:233)
at org.apache.accumulo.core.zookeeper.ZooCache.get(ZooCache.java:188)
at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:148)
at org.apache.accumulo.core.client.impl.TabletLocator.getInstance(TabletLocator.java:96)
at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:245)
at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:94)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
{noformat}
Continuous Ingest writer (probably all users of BatchWriter):
{noformat}
Caused by: java.lang.RuntimeException: java.net.UnknownHostException: zookeeper1.example.com
at org.apache.accumulo.core.zookeeper.ZooSession.connect(ZooSession.java:94)
at org.apache.accumulo.core.zookeeper.ZooSession.getSession(ZooSession.java:142)
at org.apache.accumulo.core.zookeeper.ZooReader.getSession(ZooReader.java:37)
at org.apache.accumulo.core.zookeeper.ZooReader.getZooKeeper(ZooReader.java:41)
at org.apache.accumulo.core.zookeeper.ZooCache.getZooKeeper(ZooCache.java:56)
at org.apache.accumulo.core.zookeeper.ZooCache.retry(ZooCache.java:127)
at org.apache.accumulo.core.zookeeper.ZooCache.get(ZooCache.java:233)
at org.apache.accumulo.core.zookeeper.ZooCache.get(ZooCache.java:188)
at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:148)
at org.apache.accumulo.core.client.impl.TabletLocator.getInstance(TabletLocator.java:96)
at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter$SendTask.send(TabletServerBatchWriter.java:733)
at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter$SendTask.run(TabletServerBatchWriter.java:671)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
{noformat}
> ZooSession should be more robust to transient DNS issues
> --------------------------------------------------------
>
> Key: ACCUMULO-2224
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2224
> Project: Accumulo
> Issue Type: Bug
> Components: client
> Affects Versions: 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.5.0
> Environment: 1.4.5-SNAP on CDH4 w/gremlins
> Reporter: Sean Busbey
> Assignee: Sean Busbey
> Priority: Minor
> Fix For: 1.4.5, 1.5.1, 1.6.0
>
>
> While injecting network faults, I found that transient DNS problems caused us
> to bail out of ZooSessions rather than retrying as we do for all other IO
> problems. We should retry these failures just as we do for Connection Refused
> or other networking problems.
> Since the addition of ACCUMULO-131, we can be sure that we won't retry actual
> invalid hosts forever. Instead, after the timeout period that holds for all
> other problems, we'll properly exit.
> The warn messages logged for IOExceptions should suffice to indicate
> improperly specified host names.
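For illustration only, the retry behaviour described in the issue could be sketched roughly as below. This is not the actual ZooSession change: the class RetryingConnect, the Connector interface, and the backoff numbers are all hypothetical, and the real code would construct a ZooKeeper client where this sketch calls connector.connect(). The point is just that UnknownHostException is an IOException, so a loop that retries IOExceptions until a deadline covers transient DNS failures while still giving up on hosts that never resolve.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the actual ZooSession implementation.
final class RetryingConnect {

    /** Stand-in for whatever opens the session; may fail with any IOException. */
    interface Connector<T> {
        T connect() throws IOException;
    }

    /**
     * Calls connector.connect() until it succeeds or timeoutMillis has elapsed.
     * UnknownHostException extends IOException, so DNS failures are retried
     * like any other IO problem and the last failure is rethrown at the deadline.
     */
    static <T> T connectWithRetry(Connector<T> connector, long timeoutMillis, long initialSleepMillis)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        long sleepMillis = initialSleepMillis;
        while (true) {
            try {
                return connector.connect();
            } catch (IOException e) {
                long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMillis <= 0) {
                    throw e; // timeout reached: give up, e.g. on a genuinely invalid host
                }
                // The warning is what lets an operator spot a mistyped host name.
                System.err.println("WARN: " + e + ", will retry");
                Thread.sleep(Math.min(sleepMillis, remainingMillis));
                sleepMillis = Math.min(sleepMillis * 2, 1000); // capped backoff (arbitrary values)
            }
        }
    }
}
```

A caller would pass a lambda that opens the session, along with the same timeout already used for other connection problems, so that invalid hosts still lead to a clean exit.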