[ 
https://issues.apache.org/jira/browse/ACCUMULO-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated ACCUMULO-2224:
----------------------------------

    Priority: Minor  (was: Major)

AFAICT, ZK will throw as soon as any of the hostnames specified in the connect 
string fails to resolve (UnknownHostException).

The workaround for existing releases is to fix the underlying DNS problem and 
then restart roles.
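
To find out which entry in the connect string is the one that fails to resolve before restarting, something like the following could be used. This is only a sketch: the class name ConnectStringCheck and the Resolver hook are made up for illustration and are not part of Accumulo or ZooKeeper. It uses the same InetAddress.getAllByName lookup that shows up in the traces, assumes a plain "host:port,host:port" connect string with an optional chroot suffix, and does not handle IPv6 literals.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class ConnectStringCheck {

  /** Hypothetical hook so the DNS lookup can be swapped out (e.g. in tests). */
  public interface Resolver {
    void resolve(String host) throws UnknownHostException;
  }

  /** Returns every host in the connect string that fails to resolve. */
  public static List<String> unresolvableHosts(String connectString, Resolver resolver) {
    // Drop an optional chroot suffix such as "/accumulo".
    int slash = connectString.indexOf('/');
    if (slash >= 0) {
      connectString = connectString.substring(0, slash);
    }
    List<String> bad = new ArrayList<String>();
    for (String entry : connectString.split(",")) {
      String hostPort = entry.trim();
      if (hostPort.isEmpty()) {
        continue;
      }
      // Strip the port, if one was given.
      int colon = hostPort.lastIndexOf(':');
      String host = colon >= 0 ? hostPort.substring(0, colon) : hostPort;
      try {
        resolver.resolve(host);
      } catch (UnknownHostException e) {
        bad.add(host);
      }
    }
    return bad;
  }

  /** Same check, using the JVM's normal name resolution. */
  public static List<String> unresolvableHosts(String connectString) {
    return unresolvableHosts(connectString, new Resolver() {
      public void resolve(String host) throws UnknownHostException {
        InetAddress.getAllByName(host);
      }
    });
  }

  public static void main(String[] args) {
    if (args.length != 1) {
      System.err.println("usage: ConnectStringCheck <connect string>");
      return;
    }
    for (String host : unresolvableHosts(args[0])) {
      System.out.println("cannot resolve: " + host);
    }
  }
}
```

Run it on the affected node with the same connect string the roles use, since resolution results can differ from host to host.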

Some stack traces from where this came up during testing (for those wishing to 
dedup errors they might see):

tserver compaction
{noformat}
Unexpected exception in Split/MajC initiator
        java.lang.RuntimeException: java.net.UnknownHostException: zookeeper1.example.com
                at org.apache.accumulo.core.zookeeper.ZooSession.connect(ZooSession.java:94)
                at org.apache.accumulo.core.zookeeper.ZooSession.getSession(ZooSession.java:142)
                at org.apache.accumulo.core.zookeeper.ZooReader.getSession(ZooReader.java:36)
                at org.apache.accumulo.core.zookeeper.ZooReader.getZooKeeper(ZooReader.java:40)
                at org.apache.accumulo.core.zookeeper.ZooCache.getZooKeeper(ZooCache.java:56)
                at org.apache.accumulo.core.zookeeper.ZooCache.retry(ZooCache.java:127)
                at org.apache.accumulo.core.zookeeper.ZooCache.get(ZooCache.java:233)
                at org.apache.accumulo.core.zookeeper.ZooCache.get(ZooCache.java:188)
                at org.apache.accumulo.server.conf.TableConfiguration.get(TableConfiguration.java:121)
                at org.apache.accumulo.server.conf.TableConfiguration.get(TableConfiguration.java:109)
                at org.apache.accumulo.core.conf.AccumuloConfiguration.getMemoryInBytes(AccumuloConfiguration.java:47)
                at org.apache.accumulo.server.tabletserver.Tablet.findSplitRow(Tablet.java:3028)
                at org.apache.accumulo.server.tabletserver.Tablet.needsSplit(Tablet.java:3122)
                at org.apache.accumulo.server.tabletserver.TabletServer$MajorCompactor.run(TabletServer.java:2117)
                at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
                at java.lang.Thread.run(Thread.java:662)
        Caused by: java.net.UnknownHostException: zookeeper1.example.com
                at java.net.InetAddress.getAllByName0(InetAddress.java:1157)
                at java.net.InetAddress.getAllByName(InetAddress.java:1083)
                at java.net.InetAddress.getAllByName(InetAddress.java:1019)
                at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
                at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
                at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
                at org.apache.accumulo.core.zookeeper.ZooSession.connect(ZooSession.java:77)
                ... 15 more
{noformat}

logger tracing
{noformat}
2014-01-16 00:00:12,772 [zookeeper.ZooSession] WARN : java.net.UnknownHostException : zookeeper2.example.com
2014-01-16 00:00:12,772 [trace.ZooTraceClient] ERROR: unable to get destination hosts in zookeeper
java.lang.RuntimeException: java.net.UnknownHostException: zookeeper2.example.com
        at org.apache.accumulo.core.zookeeper.ZooSession.connect(ZooSession.java:94)
        at org.apache.accumulo.core.zookeeper.ZooSession.getSession(ZooSession.java:142)
        at org.apache.accumulo.core.zookeeper.ZooReader.getSession(ZooReader.java:37)
        at org.apache.accumulo.server.zookeeper.ZooReaderWriter.getZooKeeper(ZooReaderWriter.java:57)
        at org.apache.accumulo.core.zookeeper.ZooReader.getChildren(ZooReader.java:66)
        at org.apache.accumulo.core.trace.ZooTraceClient.process(ZooTraceClient.java:64)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
{noformat}

master startup (I think):
{noformat}
Caused by: java.lang.RuntimeException: java.net.UnknownHostException: zookeeper1.example.com
        at org.apache.accumulo.core.zookeeper.ZooSession.connect(ZooSession.java:94)
        at org.apache.accumulo.core.zookeeper.ZooSession.getSession(ZooSession.java:142)
        at org.apache.accumulo.core.zookeeper.ZooReader.getSession(ZooReader.java:37)
        at org.apache.accumulo.server.zookeeper.ZooReaderWriter.getZooKeeper(ZooReaderWriter.java:57)
        at org.apache.accumulo.core.zookeeper.ZooReader.getChildren(ZooReader.java:61)
        at org.apache.accumulo.server.Accumulo.waitForZookeeperAndHdfs(Accumulo.java:201)
        at org.apache.accumulo.server.master.state.SetGoalState.main(SetGoalState.java:40)
{noformat}

Continuous Ingest stats collector
{noformat}
1389860157417 Failed to collect stats : java.net.UnknownHostException: zookeeper1.example.com
java.lang.RuntimeException: java.net.UnknownHostException: zookeeper1.example.com
        at org.apache.accumulo.core.zookeeper.ZooSession.connect(ZooSession.java:94)
        at org.apache.accumulo.core.zookeeper.ZooSession.getSession(ZooSession.java:142)
        at org.apache.accumulo.core.zookeeper.ZooReader.getSession(ZooReader.java:37)
        at org.apache.accumulo.core.zookeeper.ZooReader.getZooKeeper(ZooReader.java:41)
        at org.apache.accumulo.core.zookeeper.ZooCache.getZooKeeper(ZooCache.java:56)
        at org.apache.accumulo.core.zookeeper.ZooCache.retry(ZooCache.java:127)
        at org.apache.accumulo.core.zookeeper.ZooCache.getChildren(ZooCache.java:178)
        at org.apache.accumulo.server.zookeeper.ZooLock.getLockData(ZooLock.java:414)
        at org.apache.accumulo.server.client.HdfsZooInstance.getMasterLocations(HdfsZooInstance.java:102)
        at org.apache.accumulo.core.client.impl.MasterClient.getConnection(MasterClient.java:52)
        at org.apache.accumulo.core.client.impl.MasterClient.getConnectionWithRetry(MasterClient.java:43)
        at org.apache.accumulo.server.test.continuous.ContinuousStatsCollector$StatsCollectionTask.getACUStats(ContinuousStatsCollector.java:128)
        at org.apache.accumulo.server.test.continuous.ContinuousStatsCollector$StatsCollectionTask.run(ContinuousStatsCollector.java:77)
        at java.util.TimerThread.mainLoop(Timer.java:512)
        at java.util.TimerThread.run(Timer.java:462)
{noformat}

Continuous Ingest scanner (probably all BatchScanners)
{noformat}
Caused by: java.lang.RuntimeException: java.net.UnknownHostException: zookeeper1.example.com
        at org.apache.accumulo.core.zookeeper.ZooSession.connect(ZooSession.java:94)
        at org.apache.accumulo.core.zookeeper.ZooSession.getSession(ZooSession.java:142)
        at org.apache.accumulo.core.zookeeper.ZooReader.getSession(ZooReader.java:37)
        at org.apache.accumulo.core.zookeeper.ZooReader.getZooKeeper(ZooReader.java:41)
        at org.apache.accumulo.core.zookeeper.ZooCache.getZooKeeper(ZooCache.java:56)
        at org.apache.accumulo.core.zookeeper.ZooCache.retry(ZooCache.java:127)
        at org.apache.accumulo.core.zookeeper.ZooCache.get(ZooCache.java:233)
        at org.apache.accumulo.core.zookeeper.ZooCache.get(ZooCache.java:188)
        at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:148)
        at org.apache.accumulo.core.client.impl.TabletLocator.getInstance(TabletLocator.java:96)
        at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:245)
        at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:94)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
{noformat}

Continuous Ingest writer (probably all users of BatchWriter):
{noformat}
Caused by: java.lang.RuntimeException: java.net.UnknownHostException: zookeeper1.example.com
        at org.apache.accumulo.core.zookeeper.ZooSession.connect(ZooSession.java:94)
        at org.apache.accumulo.core.zookeeper.ZooSession.getSession(ZooSession.java:142)
        at org.apache.accumulo.core.zookeeper.ZooReader.getSession(ZooReader.java:37)
        at org.apache.accumulo.core.zookeeper.ZooReader.getZooKeeper(ZooReader.java:41)
        at org.apache.accumulo.core.zookeeper.ZooCache.getZooKeeper(ZooCache.java:56)
        at org.apache.accumulo.core.zookeeper.ZooCache.retry(ZooCache.java:127)
        at org.apache.accumulo.core.zookeeper.ZooCache.get(ZooCache.java:233)
        at org.apache.accumulo.core.zookeeper.ZooCache.get(ZooCache.java:188)
        at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:148)
        at org.apache.accumulo.core.client.impl.TabletLocator.getInstance(TabletLocator.java:96)
        at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter$SendTask.send(TabletServerBatchWriter.java:733)
        at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter$SendTask.run(TabletServerBatchWriter.java:671)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
{noformat}

> ZooSession should be more robust to transient DNS issues
> --------------------------------------------------------
>
>                 Key: ACCUMULO-2224
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2224
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.5.0
>         Environment: 1.4.5-SNAP on CDH4 w/gremlins
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>            Priority: Minor
>             Fix For: 1.4.5, 1.5.1, 1.6.0
>
>
> While injecting network faults, I found that transient DNS problems caused us 
> to bail out of ZooSessions rather than retrying as we do for all other IO 
> problems. We should retry these failures just as we do for Connection Refused 
> or other networking problems.
> Since the addition of ACCUMULO-131, we can be sure that we won't retry actual 
> invalid hosts forever. Instead, after the timeout period that holds for all 
> other problems, we'll properly exit.
> The warn messages logged for IOExceptions should suffice to indicate 
> improperly specified host names.
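
For illustration, the behavior asked for in the description could look roughly like the loop below. This is a sketch under assumptions, not the actual ZooSession code or the committed fix: RetryingConnect, Connector, and both timing parameters are hypothetical names. The point it shows is that UnknownHostException is a subclass of IOException, so a single catch of IOException retries DNS failures the same way as Connection Refused, logs a warning each time, and still gives up once the overall timeout has passed.

```java
import java.io.IOException;

public class RetryingConnect {

  /** Hypothetical stand-in for whatever actually opens the ZooKeeper session. */
  public interface Connector<T> {
    T connect() throws IOException;
  }

  /**
   * Calls connector.connect() until it succeeds or the timeout runs out.
   * The last IOException is rethrown once another sleep would pass the deadline.
   */
  public static <T> T connectWithRetry(Connector<T> connector, long timeoutMillis,
      long sleepMillis) throws IOException, InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (true) {
      try {
        return connector.connect();
      } catch (IOException e) {
        // UnknownHostException extends IOException, so transient DNS
        // failures are handled here along with other network errors.
        if (System.currentTimeMillis() + sleepMillis > deadline) {
          throw e;
        }
        System.err.println("WARN : connect failed, will retry: " + e);
        Thread.sleep(sleepMillis);
      }
    }
  }
}
```

A fixed sleep between attempts keeps the sketch short; a real implementation might prefer a growing backoff.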



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
