First: I'm a total newbie at contributing to Apache projects, so please excuse
any indiscretions. Feel free to comment on style or anything else; I take
feedback well. Thick skin too.


I'll give some background first and then a proposal.

Background:
I recently switched to using authentication in the 1.5 snapshot because I
need a session via the REST API so that I can set the session storage
options in an initial query for a subsequent CTAS query. Previously all REST
calls seemed to be completely independent.

Since the change I have started seeing 'too many open files' errors in my
drillbit.log, and the drillbit Java process becomes effectively hung waiting
for open file descriptor slots. According to top, the machine is running at
max load due to the drillbit process, and the drillbit becomes unresponsive;
even the simple pages in the web console don't respond. Investigating
further, it seems that there might be a file kept open per session by the
drillbit process for the life of the session. I ran lsof on the drillbit
process and found a lot of Unix pipes. From the code it looks like these
pipes could be for the communication between the web process and the RPC
server, with one being allocated per session. I haven't validated this; it's
just a guess after scanning the code. I had 1.4 running without
authentication and never saw the error. It seems that without authentication
the number of open files is a non-issue for me, possibly because no sessions
are kept open.

I'm wondering if my guess about what is causing the 'too many open files'
error is plausible. Does anybody with a deeper understanding of the
architecture have any comments on this?
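
One way to test the guess would be to count the drillbit's open descriptors
by type while sessions are being created and see whether the pipe count grows
with the number of sessions. The following is only a rough sketch: it assumes
a Linux host with /proc mounted, and the helper names are made up for
illustration (they are not part of Drill).

```python
import os
from collections import Counter


def classify_fd_target(target: str) -> str:
    """Map a /proc/<pid>/fd symlink target to a coarse descriptor type."""
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    return "file"


def count_open_fds(pid: int) -> Counter:
    """Count the open descriptors of a process, grouped by type (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        counts[classify_fd_target(target)] += 1
    return counts
```

Calling count_open_fds with the drillbit's pid before and after opening a
batch of authenticated sessions, and comparing the "pipe" counts, would show
whether descriptors really scale with sessions.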

Proposal:
Assuming sessions are the issue, I am changing my REST client so that
sessions are reused more effectively, and I can raise the ulimit for the
Linux user running the drillbit process in hopes of mitigating this. I am
effectively creating a session pool in the REST client that resets session
variables to their defaults when a session gets reused. However, it seems
hacky.
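
To make the pooling idea concrete, here is a minimal sketch of such a
client-side pool. Everything in it is an assumption for illustration:
create_session and run_statement are hypothetical callables supplied by the
REST client (for example, "log in and keep the cookie" and "POST this SQL
with that cookie"), option values are passed as ready-made SQL literals, and
it assumes the server accepts ALTER SESSION RESET for a single option.

```python
import queue
from contextlib import contextmanager


class SessionPool:
    """Fixed-size pool that undoes session options before a session is reused."""

    def __init__(self, create_session, run_statement, size=4):
        # create_session() -> opaque session handle (hypothetical)
        # run_statement(session, sql) -> submits one statement (hypothetical)
        self._run = run_statement
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(create_session())

    @contextmanager
    def session(self, options=None):
        sess = self._idle.get()  # blocks until a session is free
        applied = []
        try:
            for name, literal in (options or {}).items():
                self._run(sess, f"ALTER SESSION SET `{name}` = {literal}")
                applied.append(name)
            yield sess
        finally:
            # Put every option we touched back to its default, then
            # return the session to the pool.
            for name in applied:
                self._run(sess, f"ALTER SESSION RESET `{name}`")
            self._idle.put(sess)
```

The pool caps the number of live sessions (and so, if the guess is right, the
number of descriptors), at the cost of extra round trips per query.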

Below is an idea for per-request settings, which seems less hacky in the
long term.

Can I add a new optional member to the query.json REST method, in a
backwards compatible way, to set session level parameters in a single
request? Currently a REST request via the API has a body like so:
{ "queryType": "SQL", "query": "<drill query>" }

I'd like to do the following:

{ "queryType": "SQL", "query": "<drill query>", "sessionSettings":
{"option_1_name": "option_1_value", "option_2_name": "option_2_value"} }

or even 

{ "queryType": "SQL", "query": "<drill query>", "sessionSettings": ["SET
`option_name` = value", "SET `option_name1` = value1", "SET `option_name2` =
value2", "SET `option_name3` = value3"] }
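
From the client side, building such a body might look like the sketch below.
The sessionSettings member is the proposed addition and does not exist in the
current API; the sketch also assumes that the name/value variant is sent as a
JSON object, since key/value pairs are not valid inside a JSON array. When no
settings are given, the body is identical to today's, which is what keeps the
change backwards compatible.

```python
import json


def build_query_body(query, session_settings=None):
    """Build the JSON body for query.json.

    session_settings is the PROPOSED optional member: either a dict of
    option name -> value, or a list of "SET ..." strings. When it is None
    the body matches the existing request format exactly.
    """
    body = {"queryType": "SQL", "query": query}
    if session_settings is not None:
        body["sessionSettings"] = session_settings
    return json.dumps(body)
```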

As far as I can tell Drill is essentially stateless between queries right now
except for session level system parameters and authentication. There aren't
any in-memory temp tables, cursors, or variables like in PL/SQL, PSQL, or
other SQL dialects that would make it stateful.

Given the stateless assumption, being able to set session level params on a
per-request basis would cover all of the cases that I might need. It looks
relatively straightforward to add something to QueryWrapper to accept an
optional session settings section of the JSON packet and execute those 'SET'
commands before the final query. This will work for me, as I can run without
authentication in a 'secure' backend environment, which will remove sessions
and hence file descriptors, assuming my assumptions about file descriptors
and sessions are correct.


My Java is rusty (circa 2003), but some casual googling implies that if this
were added as a third @FormParam to submitQuery in QueryResources, it would
simply be null if it weren't present and could easily be ignored. If it's
present, then an alternative constructor of QueryWrapper could be called with
the extra param, and it would be easy to alter its run method to execute the
SET commands. There would need to be some error handling, of course, if the
SET commands were illegal or failed to run for some reason.
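
The intended control flow, written out in Python rather than Java just to
keep it short, would be roughly as follows. This is a sketch of the logic
only, not Drill code: handle_request, settings_to_statements, and the execute
callable are hypothetical names, and sessionSettings is the proposed member.

```python
import json


class SessionSettingError(Exception):
    """Raised when a requested session setting is illegal or fails to run."""


def to_literal(value):
    """Render a JSON value as a SQL literal."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def settings_to_statements(settings):
    """Turn the optional sessionSettings member into SET statements."""
    if settings is None:  # member absent: old-style request
        return []
    if isinstance(settings, dict):
        return [f"SET `{name}` = {to_literal(value)}"
                for name, value in settings.items()]
    statements = []
    for statement in settings:
        if not statement.strip().upper().startswith("SET "):
            raise SessionSettingError(f"not a SET command: {statement!r}")
        statements.append(statement)
    return statements


def handle_request(raw_body, execute):
    """Run the SET commands, then the query, through execute(sql)."""
    request = json.loads(raw_body)
    for statement in settings_to_statements(request.get("sessionSettings")):
        try:
            execute(statement)
        except Exception as exc:
            raise SessionSettingError(
                f"failed to apply {statement!r}: {exc}") from exc
    return execute(request["query"])
```

A request without sessionSettings takes the same path as today, and a bad
setting fails before the final query is submitted.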

If this seems reasonable, how do I go about contributing? I looked through
the links in the docs to Apache Foundation incubator projects, but the links
to Drill were broken :(  http://drill.apache.org/team.html
I read this:
http://drill.apache.org/docs/apache-drill-contribution-guidelines/
and I have subscribed to the dev mailing list (obvious, since you are getting
this). It said to post here before creating a JIRA. Am I missing anything in
my assumptions? Comments? Should I just submit a JIRA and a patch, submit a
JIRA and a comment, or wait for comments before coding stuff up as an
example?

Thanks for taking the time to read and respond.

Josh
