I agree with Vijay. I need some sort of session management service that lives on top of Hive, so that I can submit a job through an API such as the Thrift API. It would also be useful to be able to get the actual job ID in Hadoop.
Royce

On 9/8/09 3:37 PM, "Vijay" <[email protected]> wrote:

> I get that HWI does manage sessions but it does that leveraging the internal
> functionality of the "server." One usage pattern I'd like is some kind of a
> "job" API. What I mean by that is an API that lets us simply submit a query,
> get some kind of "job id," and leave. After that we use other APIs to query
> the job status, kill it, get the output once it is done, etc. If we have a
> simple API like this and the semantics to support this within hive, then the
> UI can be completely decoupled and be as stateless as it can (using vanilla
> apache+php as an example, we can't really do threads or stay resident after
> submitting a job). Does something like this exist either within hive or at the
> hadoop level? It seems to me maybe this is something that needs to be built
> first.
>
> Thanks,
> Vijay
>
> On Tue, Sep 8, 2009 at 2:52 PM, Edward Capriolo <[email protected]> wrote:
>> On Tue, Sep 8, 2009 at 5:15 PM, Royce Rollins <[email protected]> wrote:
>>> OK I see. I just looked at the code in HWISessionManager.java. So it
>>> looks like either I will have to write my own ruby HWISessionManager
>>> that manages sessions through thrift or expose the existing
>>> HWISessionManager via some web service interface. Has anyone done this?
>>>
>>> Royce
>>>
>>> On 9/8/09 1:47 PM, "Edward Capriolo" <[email protected]> wrote:
>>>
>>>> On Tue, Sep 8, 2009 at 4:38 PM, Vijay <[email protected]> wrote:
>>>>> Sorry to inject into this thread but I have the same problem (only I'm
>>>>> trying to use the thrift PHP libraries from apache-php scripts). The
>>>>> problem with this approach is that the http request cannot run
>>>>> indefinitely as the server is executing a query. Are there any
>>>>> solutions for this?
>>>>>
>>>>> Thanks,
>>>>> Vijay
>>>>>
>>>>> On Tue, Sep 8, 2009 at 1:35 PM, Royce Rollins <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> Raghu,
>>>>>> Thanks for the quick response.
>>>>>> Yes. My application is web based so instead of having to build some
>>>>>> kind of session model myself for queries that might take a while,
>>>>>> I'd like to use a session model in the hive service.
>>>>>>
>>>>>> Royce
>>>>>>
>>>>>> On 9/8/09 1:32 PM, "Raghu Murthy" <[email protected]> wrote:
>>>>>>
>>>>>>> Our model so far has been to create a new connection to the hive
>>>>>>> thrift server per session. Is there anything specific you are
>>>>>>> looking for in sessions?
>>>>>>>
>>>>>>> On 9/8/09 1:06 PM, "Royce Rollins" <[email protected]> wrote:
>>>>>>>
>>>>>>>> I'm currently working on an application that connects to hive via
>>>>>>>> the thrift ruby libraries.
>>>>>>>>
>>>>>>>> Does hive support creation of sessions using those libraries?
>>>>>>>> If so, how?
>>>>>>>>
>>>>>>>> Royce
>>>>
>>>> Royce,
>>>>
>>>> The Hive Web Interface deals with this by having a threaded object
>>>> (HWISessionManager) in the Web application scope. I am not sure if PHP
>>>> has any equivalent to threading and Application Scope.
>>>>
>>>> Edward
>>
>> Someone correct me if I am wrong.
>>
>> Royce,
>>
>> You may be able to get at this another way. From my understanding, the
>> internal hive web interface used at facebook would spawn ` bin/hive -e
>> 'INSERT INTO X select * FROM`. All results were written to a hive
>> table.
>>
>> Doing it this way gives you no way to interact with the query and
>> 'stream' the result, so you can't really use 'fetchOne()' or
>> 'fetchAll()' but you could start a query and set flags on completion.
>>
>> As for web interface, we just had some talks, and one of the things I
>> was looking to do was create some type of web service style bindings.
>> (We would also like to have HWI talk to Thrift and have thrift be the
>> code path for everything). However, if we do make some web server
>> style bindings they would really be independent of the back end. Do
>> you want to work on this? I would like to open a Jira and tackle the
>> issue.
>>
>> The big picture here is that we need a 'state holder'. That is really
>> what HWI is. You create a session, detach from it, and optionally
>> check on it later. If an application needs that pattern, how do we
>> handle it?
>>
>> One way to tackle this is
>>
>> INSERT INTO file 'hdfs://path/to/file' select * FROM XXX' &
>>
>> then have your client 'tail' the hdfs://path/to/file or record the
>> last position it saw. I guess the big question is dealing with
>> streaming results. HWI manages the session for you and writes the
>> results to a local file, (and the new SessionBucket
>>
>> What is the usage pattern you need?
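For illustration only, the submit-and-poll pattern discussed in this thread (submit a query, get a job id, leave, then check a completion flag and read output from the last recorded position) could be sketched roughly as below. This is a minimal sketch and not Hive or HWI code: the function names `submit`, `status`, and `read_output` and the per-job directory layout are hypothetical, it assumes a POSIX `sh` is available, and the command is passed in as an argument list so that nothing here assumes a particular hive invocation. Killing a job is left out.

```python
import os
import shlex
import subprocess
import uuid


def submit(argv, jobs_dir):
    """Start argv in the background and return a job id right away."""
    job_id = uuid.uuid4().hex
    job_dir = os.path.join(jobs_dir, job_id)
    os.makedirs(job_dir)
    q = {name: shlex.quote(os.path.join(job_dir, name))
         for name in ("stdout", "stderr", "exitcode", "exitcode.tmp")}
    cmd = " ".join(shlex.quote(arg) for arg in argv)
    # The wrapper shell records the exit code after the command finishes.
    # Writing to a temp file and renaming it makes the "exitcode" file
    # appear in one step, so its presence works as the completion flag.
    script = (f"{cmd} > {q['stdout']} 2> {q['stderr']}; "
              f"echo $? > {q['exitcode.tmp']}; "
              f"mv {q['exitcode.tmp']} {q['exitcode']}")
    subprocess.Popen(["sh", "-c", script],
                     stdin=subprocess.DEVNULL,
                     start_new_session=True)  # run in its own session
    return job_id


def status(job_id, jobs_dir):
    """Return ("running", None) or ("done", exit_code)."""
    path = os.path.join(jobs_dir, job_id, "exitcode")
    if not os.path.exists(path):
        return ("running", None)
    with open(path) as f:
        return ("done", int(f.read().strip()))


def read_output(job_id, jobs_dir, offset=0):
    """Return (new_text, new_offset) for stdout written since offset."""
    with open(os.path.join(jobs_dir, job_id, "stdout"), "rb") as f:
        f.seek(offset)
        data = f.read()
    return data.decode("utf-8", errors="replace"), offset + len(data)
```

Because all state lives in files keyed by the job id, each call can come from a separate short-lived request (the apache+php situation described above): the caller only has to remember the job id and the last offset it read.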
