[ https://issues.apache.org/jira/browse/ACCUMULO-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305476#comment-15305476 ]
Michiel Vanderlee commented on ACCUMULO-4317:
---------------------------------------------
Just had this happen on my Accumulo masters as well.
Our HDFS cluster broke down and went into safe mode. After we fixed it, the
Accumulo masters reconnected automatically, but when I looked at the logs a few
minutes later, I first saw a ton of these:
{noformat}
2016-05-28 16:27:27,069 [rpc.ThriftUtil] WARN : Failed to open transport to arch05:9997
2016-05-28 16:27:27,069 [master.Master] ERROR: Error processing table state for store Metadata Tablets
org.apache.thrift.transport.TTransportException: java.net.UnknownHostException
    at org.apache.accumulo.core.rpc.ThriftUtil.createClientTransport(ThriftUtil.java:326)
    at org.apache.accumulo.core.rpc.ThriftUtil.createTransport(ThriftUtil.java:190)
    at org.apache.accumulo.server.master.LiveTServerSet$TServerConnection.assignTablet(LiveTServerSet.java:91)
    at org.apache.accumulo.master.TabletGroupWatcher.flushChanges(TabletGroupWatcher.java:792)
    at org.apache.accumulo.master.TabletGroupWatcher.run(TabletGroupWatcher.java:295)
Caused by: java.net.UnknownHostException
    at sun.nio.ch.Net.translateException(Net.java:175)
    at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:139)
    at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:92)
    at org.apache.accumulo.core.rpc.TTimeoutTransport.create(TTimeoutTransport.java:72)
    at org.apache.accumulo.core.rpc.TTimeoutTransport.create(TTimeoutTransport.java:65)
    at org.apache.accumulo.core.rpc.ThriftUtil.createClientTransport(ThriftUtil.java:323)
    ... 4 more
{noformat}
Then eventually a ton of these:
{noformat}
2016-05-28 16:46:36,306 [rpc.ThriftUtil] WARN : Failed to open transport to arch05:9997
2016-05-28 16:46:36,306 [master.Master] ERROR: Error processing table state for store Root Table
org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
    at org.apache.accumulo.core.rpc.ThriftUtil.createClientTransport(ThriftUtil.java:326)
    at org.apache.accumulo.core.rpc.ThriftUtil.createTransport(ThriftUtil.java:190)
    at org.apache.accumulo.server.master.LiveTServerSet$TServerConnection.assignTablet(LiveTServerSet.java:91)
    at org.apache.accumulo.master.TabletGroupWatcher.flushChanges(TabletGroupWatcher.java:792)
    at org.apache.accumulo.master.TabletGroupWatcher.run(TabletGroupWatcher.java:295)
Caused by: java.net.SocketException: Too many open files
    at sun.nio.ch.Net.socket0(Native Method)
    at sun.nio.ch.Net.socket(Net.java:438)
    at sun.nio.ch.Net.socket(Net.java:431)
    at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:118)
    at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:72)
    at org.apache.accumulo.core.rpc.TTimeoutTransport.create(TTimeoutTransport.java:69)
    at org.apache.accumulo.core.rpc.TTimeoutTransport.create(TTimeoutTransport.java:65)
    at org.apache.accumulo.core.rpc.ThriftUtil.createClientTransport(ThriftUtil.java:323)
    ... 4 more
{noformat}
I wonder if the issue is that the Socket doesn't get closed on exception.
{code:title=TTimeoutTransport.java|borderStyle=solid}
public static TTransport create(SocketAddress addr, long timeoutMillis) throws IOException {
  Socket socket = SelectorProvider.provider().openSocketChannel().socket();
  socket.setSoLinger(false, 0);
  socket.setTcpNoDelay(true);
  socket.connect(addr);
  InputStream input = new BufferedInputStream(getInputStream(socket, timeoutMillis), 1024 * 10);
  OutputStream output = new BufferedOutputStream(NetUtils.getOutputStream(socket, timeoutMillis), 1024 * 10);
  return new TIOStreamTransport(input, output);
}
{code}
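If that guess is right, one possible shape for a fix is to close the socket whenever setup or connect() fails. The following is only a rough sketch of that idea, not a tested patch: the class SocketConnectSketch and its methods connectOrClose and open are hypothetical names of mine, and it returns the raw Socket instead of wrapping it in a TIOStreamTransport so that it compiles without Thrift on the classpath.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketAddress;
import java.nio.channels.spi.SelectorProvider;

// Hypothetical helper for illustration only; not part of Accumulo.
final class SocketConnectSketch {

  // Configures and connects the given socket. If any step fails, the socket
  // is closed before the exception is rethrown, so a failed attempt does not
  // leave an open file descriptor behind.
  static Socket connectOrClose(Socket socket, SocketAddress addr) throws IOException {
    try {
      socket.setSoLinger(false, 0);
      socket.setTcpNoDelay(true);
      socket.connect(addr);
      return socket;
    } catch (IOException | RuntimeException e) {
      try {
        socket.close();
      } catch (IOException closeFailure) {
        // Keep the original failure as the primary exception.
        e.addSuppressed(closeFailure);
      }
      throw e;
    }
  }

  // Opens a channel-backed socket the same way the quoted create() does,
  // then connects it with the close-on-failure guard above.
  static Socket open(SocketAddress addr) throws IOException {
    Socket socket = SelectorProvider.provider().openSocketChannel().socket();
    return connectOrClose(socket, addr);
  }
}
```

In the real create() the same try/catch would presumably also need to cover the stream wrapping that follows connect(), since a failure there would leave the socket open as well.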
> Accumulo client causes 'too many files open' due to infinite loop.
> ------------------------------------------------------------------
>
> Key: ACCUMULO-4317
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4317
> Project: Accumulo
> Issue Type: Bug
> Components: client
> Affects Versions: 1.7.1
> Reporter: Michiel Vanderlee
> Priority: Minor
>
> Accumulo stores hostnames in ZooKeeper. If the client cannot resolve these,
> it will continue to try to connect in a while(true) loop. This will
> eventually cause 'too many files open' errors.
> Loop is in ServerClient.java$executeRaw
> Bug: Should error out after some time, not retry infinitely.
> Workaround: Add hostnames to /etc/hosts and restart.