I have a background worker process (on a server, not a browser) that kicks off
every minute or so and issues some queries sequentially to the REST query
endpoint. In 1.4 with no authentication this worked fine, except that in one
instance I need to issue a CTAS query with a different format (json).
I upgraded to 1.5-SNAPSHOT, commit bb3fc15216d9cab804fc9a6f0e5bd34597dd4394.
Since the upgrade I am getting a resource starvation problem, with or without
authentication.
The drillbit process stays up for an hour or less and then becomes
unresponsive and eats up the CPU.
It is definitely a resource starvation issue; I'm not sure if it's a resource
leak.
Below is a stack trace.
Also, when I run lsof on the pid, there are a lot (more than a thousand) of
entries like this listed, which are used by NIO selectors, so it smells like a
resource leak.
COMMAND  PID   USER  FD    TYPE  DEVICE  SIZE/OFF  NODE  NAME
java     2931  root  288u  0000  0,11    0         7705  anon_inode
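
To check whether the descriptor count keeps growing between worker runs, the lsof output can be tallied by its NAME column. The following is only a minimal sketch: `count_fds_by_name` is a hypothetical helper, the sample rows other than the `anon_inode` one are made up for illustration, and it assumes the default lsof layout where NAME is the last whitespace-separated field (names containing spaces would be miscounted).

```python
from collections import Counter


def count_fds_by_name(lsof_output):
    """Tally lsof rows by their last column (NAME), skipping the header."""
    counts = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row that starts with COMMAND.
        if not fields or fields[0] == "COMMAND":
            continue
        counts[fields[-1]] += 1
    return counts


# Illustrative input; only the first data row comes from the real output.
sample = """COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 2931 root 288u 0000 0,11 0 7705 anon_inode
java 2931 root 289u 0000 0,11 0 7705 anon_inode
java 2931 root 290r FIFO 0,8 0t0 9001 pipe
"""
print(count_fds_by_name(sample))
```

Running this against `lsof -p <pid>` output captured every few minutes would show whether the `anon_inode` count climbs steadily, which is what a leak would look like.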
2016-02-02 21:56:26,520 [qtp1250890858-11590] ERROR o.a.d.e.s.r.a.AnonymousLoginService - Login failed.
java.lang.IllegalStateException: failed to create a child event loop
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:68) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:49) ~[netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:61) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
    at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:49) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
    at org.apache.drill.exec.rpc.TransportCheck.createEventLoopGroup(TransportCheck.java:73) ~[drill-rpc-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
    at org.apache.drill.exec.client.DrillClient.createEventLoop(DrillClient.java:239) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
    at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:220) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
    at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:178) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
    at org.apache.drill.exec.server.rest.auth.AbstractDrillLoginService.createDrillClient(AbstractDrillLoginService.java:56) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
    at org.apache.drill.exec.server.rest.auth.AnonymousLoginService.login(AnonymousLoginService.java:47) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
    at org.apache.drill.exec.server.rest.auth.AnonymousAuthenticator.validateRequest(AnonymousAuthenticator.java:71) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:503) [jetty-security-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1111) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:478) [jetty-servlet-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:183) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1045) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.server.Server.handle(Server.java:462) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:279) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:232) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:534) [jetty-io-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:607) [jetty-util-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:536) [jetty-util-9.1.5.v20140505.jar:9.1.5.v20140505]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_91]
Caused by: java.lang.RuntimeException: epoll_create1() failed: Too many open files
    at io.netty.channel.epoll.Native.epollCreate(Native Method) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
    at io.netty.channel.epoll.EpollEventLoop.<init>(EpollEventLoop.java:74) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
    at io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:76) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:64) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
    ... 25 common frames omitted
2016-02-02 21:56:30,130 [qtp1250890858-11591] ERROR o.a.d.e.s.r.a.AnonymousLoginService - Login failed.
java.lang.IllegalStateException: failed to create a child event loop
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:68) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:49) ~[netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:61) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
    at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:49) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
    at org.apache.drill.exec.rpc.TransportCheck.createEventLoopGroup(TransportCheck.java:73) ~[drill-rpc-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
    at org.apache.drill.exec.client.DrillClient.createEventLoop(DrillClient.java:239) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
    at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:220) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
    at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:178) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
    at org.apache.drill.exec.server.rest.auth.AbstractDrillLoginService.createDrillClient(AbstractDrillLoginService.java:56) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
    at org.apache.drill.exec.server.rest.auth.AnonymousLoginService.login(AnonymousLoginService.java:47) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
    at org.apache.drill.exec.server.rest.auth.AnonymousAuthenticator.validateRequest(AnonymousAuthenticator.java:71) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:503) [jetty-security-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1111) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:478) [jetty-servlet-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:183) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1045) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.server.Server.handle(Server.java:462) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:279) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:232) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:534) [jetty-io-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:607) [jetty-util-9.1.5.v20140505.jar:9.1.5.v20140505]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:536) [jetty-util-9.1.5.v20140505.jar:9.1.5.v20140505]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_91]
Caused by: java.lang.RuntimeException: epoll_create1() failed: Too many open files
    at io.netty.channel.epoll.Native.epollCreate(Native Method) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
    at io.netty.channel.epoll.EpollEventLoop.<init>(EpollEventLoop.java:74) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
    at io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:76) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:64) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
    ... 25 common frames omitted
> On Feb 2, 2016, at 7:40 AM, Venki Korukanti <[email protected]> wrote:
>
> Currently we keep one DrillClient per session. All the state is in the
> server, and the DrillClient is the reference used to reuse that state. The
> DrillClient is automatically closed when the session expires (the default
> is 1 hour after the last activity on the session) or when the user
> explicitly logs out. I am trying to understand if there is a resource leak.
> Do you have too many sessions open when the system load is at its maximum,
> or just a few sessions on which you have already run many queries? If it
> is the former, that is understandable, since there is one connection per
> session for the life of the session. Also, are the resources not freed
> after logout?
>
> If you need multiple simultaneous sessions, it is better to connect to
> different Drillbits (maybe in a round-robin fashion) than to always
> connect to a single Drillbit.
>
> Thanks
> Venki
>
> On Mon, Feb 1, 2016 at 11:51 PM, Josh Schlesser <[email protected]> wrote:
>
>> First: I'm a total newb at contributing to Apache projects, so please
>> excuse any indiscretions. Feel free to comment on style or whatever; I
>> take feedback well. Thick skin too.
>>
>>
>> I'll give some background next and then a proposal.
>>
>> Background:
>> I recently changed over to using authentication in the 1.5 snapshot
>> because I need to have a session via the REST API so that I can set the
>> session storage options in an initial query for a subsequent CTAS query.
>> Previously all REST calls seemed to be completely independent.
>>
>> Since the change I have started seeing ‘too many open files’ errors in my
>> drillbit.log, and the drillbit Java process becomes effectively hung
>> waiting for open file descriptor slots. The top command shows the machine
>> running at max load due to the drillbit process, and the drillbit becomes
>> effectively unresponsive; even the simple pages in the web console don’t
>> respond. Investigating further, it seems that a file might be kept
>> open per session by the drillbit process for the life of the session. I
>> used the lsof Unix command on the drillbit process and found a lot of Unix
>> pipes. Looking at the code, it looks like these pipes could be for the
>> communication between the web process and the RPC server, with one being
>> allocated per session. I haven’t validated this; it’s just a guess after
>> scanning the code. I had 1.4 running without this requirement and without
>> ever seeing the error. It seems that without authentication the number of
>> open files is a non-issue for me, which points to sessions as the likely
>> cause.
>>
>> I'm wondering if my guess about what is causing the ‘too many open files’
>> error is plausible? Does anybody with a deeper understanding of the
>> architecture have any comments on this?
>>
>> Proposal:
>> Assuming sessions are the issue, I am making some changes to my REST
>> client so that sessions are used more effectively, and I can raise the
>> ulimit for the Linux user running the drillbit process in hopes of
>> mitigating this. I am effectively creating a session pool in the REST
>> client that resets session variables to their defaults when a session gets
>> reused. However, it seems hacky.
>>
>> Below is an idea for getting per request based settings which seems less
>> hacky in the long term.
>>
>> Can I add a new member to the query.json REST method in a backwards
>> compatible way to set session level parameters in a single request?
>> Currently a REST request via the API has a body like so:
>> { "queryType": "SQL", "query": "<drill query>" }
>>
>> I'd like to do the following:
>>
>> { "queryType": "SQL", "query": "<drill query>", "sessionSettings":
>> { "option_1_name": "option_1_value", "option_2_name": "option_2_value" } }
>>
>> or even
>>
>> { "queryType": "SQL", "query": "<drill query>", "sessionSettings": [
>>   "SET `option_name` = value", "SET `option_name1` = value1",
>>   "SET `option_name2` = value2", "SET `option_name3` = value3" ] }
>>
>> As far as I can tell, Drill is essentially stateless between queries right
>> now, except for session level system parameters and authentication. There
>> aren’t any in-memory temp tables, cursors, or variables like in PL/SQL,
>> PSQL, or other SQLs that would make it stateful.
>>
>> Given the stateless assumption, being able to set session level params on
>> a per request basis would cover all of the cases that I might need. It
>> looks relatively straightforward to add something to QueryWrapper to
>> accept an optional session settings section in the JSON packet and
>> execute those 'SET' commands before the final query. This will work for
>> me, as I can run without authentication in a 'secure' backend environment,
>> which will remove sessions and hence file descriptors, assuming my
>> assumptions about file descriptors and sessions are correct.
>>
>>
>> My Java is rusty (circa 2003), but some casual googling implies that if
>> this were added as a 3rd @FormParam to submitQuery in QueryResources, it
>> would magically be null if it weren't present and could easily be
>> ignored. If it's present, then an alternative constructor of QueryWrapper
>> could be called with the extra param, and it would be easy to alter its run
>> method to execute the SET commands. There would need to be some error
>> handling, of course, if the SET commands were illegal or failed to run for
>> some reason.
>>
>> If this seems reasonable, how do I go about contributing? I looked
>> through the links in the docs to Apache Foundation incubator projects, but
>> the links to Drill were broken :( http://drill.apache.org/team.html
>> I read this:
>> http://drill.apache.org/docs/apache-drill-contribution-guidelines/
>> and I have subscribed to the dev mailing list (obvious since you are
>> getting this). It said to post here before creating a JIRA. Am I missing
>> anything in my assumptions? Comments? Should I just submit a JIRA and a
>> patch, or submit a JIRA and a comment, or wait for comments before coding
>> stuff up as an example?
>>
>> Thanks for taking the time to read and respond.
>>
>> Josh
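
The per-request settings idea in the quoted proposal could be sketched roughly as follows. This is not Drill code and not an existing REST feature: `sessionSettings` is the proposed field (assumed here to be a JSON object mapping option names to values), and `build_statements` and `format_value` are hypothetical helpers that only show the ordering, with the `SET` statements first and the query last.

```python
import json


def format_value(value):
    """Render a Python value as a SQL literal for a SET statement."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Quote strings, doubling any embedded single quotes.
    return "'" + str(value).replace("'", "''") + "'"


def build_statements(body):
    """Expand a request body into SET statements followed by the query."""
    request = json.loads(body)
    # A missing or null sessionSettings member means no extra statements.
    settings = request.get("sessionSettings") or {}
    statements = [
        "SET `{}` = {}".format(name, format_value(value))
        for name, value in settings.items()
    ]
    statements.append(request["query"])
    return statements


# Hypothetical request body using the proposed member.
body = json.dumps({
    "queryType": "SQL",
    "query": "SELECT * FROM cp.`employee.json` LIMIT 1",
    "sessionSettings": {"store.format": "json"},
})
for statement in build_statements(body):
    print(statement)
```

A request without the new member yields just the query, which is what would keep the change backwards compatible for existing clients.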