Re: Hive and thrift session help

Vijay Tue, 08 Sep 2009 15:38:07 -0700

I get that HWI does manage sessions but it does that leveraging the internal
functionality of the "server." One usage pattern I'd like is some kind of a
"job" API. What I mean by that is an API that lets us simply submit a query,
get some kind of "job id," and leave. After that we use other APIs to query
the job status, kill it, get the output once it is done, etc. If we have a
simple API like this and the semantics to support this within hive, then the
UI can be completely decoupled and be as stateless as it can (using vanilla
apache+php as an example, we can't really do threads or stay resident after
submitting a job). Does something like this exist either within hive or at
the hadoop level? It seems to me that this may be something that needs to be
built first.
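
To make the "job" API idea concrete, here is a minimal sketch. It is not an existing Hive or Hadoop API: the class JobStore and its methods submit, status, kill and output are hypothetical names, and the job state lives in plain files so that each call can come from a separate, short-lived process (for example one HTTP request). It assumes a POSIX system with sh available. For Hive, the command passed to submit could be something like ["bin/hive", "-e", query], following the approach described in the quoted thread.

```python
import json
import os
import signal
import subprocess
import uuid


class JobStore:
    """Hypothetical file-backed job registry; no state is kept in memory."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, job_id, name):
        return os.path.join(self.root, job_id, name)

    def submit(self, argv):
        """Start argv in the background and return a job id immediately."""
        job_id = uuid.uuid4().hex
        job_dir = os.path.join(self.root, job_id)
        os.makedirs(job_dir)
        # The shell wrapper records the exit code in a file, so a later and
        # unrelated process can tell whether the job has finished.
        script = (
            '"$@" > "$JOB_DIR/stdout" 2> "$JOB_DIR/stderr"; '
            'echo $? > "$JOB_DIR/exit_code.tmp"; '
            'mv "$JOB_DIR/exit_code.tmp" "$JOB_DIR/exit_code"'
        )
        proc = subprocess.Popen(
            ["sh", "-c", script, "job"] + list(argv),
            env=dict(os.environ, JOB_DIR=job_dir),
            start_new_session=True,  # own process group, so kill() can reach it
        )
        with open(self._path(job_id, "meta.json"), "w") as f:
            json.dump({"pid": proc.pid, "argv": list(argv)}, f)
        return job_id

    def status(self, job_id):
        """Return 'running', 'done', 'failed' or 'killed'."""
        if os.path.exists(self._path(job_id, "killed")):
            return "killed"
        code_path = self._path(job_id, "exit_code")
        if not os.path.exists(code_path):
            return "running"
        with open(code_path) as f:
            return "done" if int(f.read()) == 0 else "failed"

    def kill(self, job_id):
        """Terminate a running job; finished jobs are left untouched."""
        if self.status(job_id) != "running":
            return
        with open(self._path(job_id, "meta.json")) as f:
            pid = json.load(f)["pid"]
        open(self._path(job_id, "killed"), "w").close()
        try:
            os.killpg(pid, signal.SIGTERM)  # pid is also the process group id
        except ProcessLookupError:
            pass  # the job exited between the status check and the signal

    def output(self, job_id):
        """Return whatever the job has written to stdout so far."""
        with open(self._path(job_id, "stdout")) as f:
            return f.read()
```

A web page would call submit() in one request, hand the job id back to the browser, and call status() or output() in later requests. Nothing has to stay resident on the web server side between those calls.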


Thanks,
Vijay

On Tue, Sep 8, 2009 at 2:52 PM, Edward Capriolo <[email protected]> wrote:

> On Tue, Sep 8, 2009 at 5:15 PM, Royce Rollins <[email protected]> wrote:
> > OK I see. I just looked at the code in HWISessionManager.java.  So it looks
> > like either I will have to write my own ruby HWISessionManager that manages
> > sessions through thrift or expose the existing HWISessionManager via some web
> > service interface.  Has anyone done this?
> >
> > Royce
> >
> >
> > On 9/8/09 1:47 PM, "Edward Capriolo" <[email protected]> wrote:
> >
> >> On Tue, Sep 8, 2009 at 4:38 PM, Vijay<[email protected]> wrote:
> >>> Sorry to inject into this thread but I have the same problem (only I'm
> >>> trying to use the thrift PHP libraries from apache-php scripts). The problem
> >>> with this approach is that the http request cannot run indefinitely while the
> >>> server is executing a query. Are there any solutions for this?
> >>>
> >>> Thanks,
> >>> Vijay
> >>>
> >>> On Tue, Sep 8, 2009 at 1:35 PM, Royce Rollins <[email protected]>
> >>> wrote:
> >>>>
> >>>> Raghu,
> >>>> Thanks for the quick response.
> >>>> Yes.  My application is web based so instead of having to build some kind
> >>>> of session model myself for queries that might take a while, I'd like to use
> >>>> a session model in the hive service.
> >>>>
> >>>> Royce
> >>>>
> >>>>
> >>>> On 9/8/09 1:32 PM, "Raghu Murthy" <[email protected]> wrote:
> >>>>
> >>>>> Our model so far has been to create a new connection to the hive thrift
> >>>>> server per session. Is there anything specific you are looking for in
> >>>>> sessions?
> >>>>>
> >>>>>
> >>>>> On 9/8/09 1:06 PM, "Royce Rollins" <[email protected]> wrote:
> >>>>>
> >>>>>> I'm currently working on an application that connects to hive via the
> >>>>>> thrift ruby libraries.
> >>>>>>
> >>>>>> Does hive support creation of sessions using those libraries?  If so,
> >>>>>> how?
> >>>>>>
> >>>>>>
> >>>>>> Royce
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >> Royce,
> >>
> >> The Hive Web Interface deals with this by having a threaded object
> >> (HWISessionManager) in the Web application scope. I am not sure if PHP
> >> has any equivalent to threading and Application Scope.
> >>
> >> Edward
> >
> >
>
> Someone correct me if I am wrong.
>
> Royce,
>
> You may be able to get at this another way. From my understanding, the
> internal hive web interface used at facebook would spawn ` bin/hive -e
> 'INSERT INTO X select * FROM`. All results were written to a hive
> table.
>
> Doing it this way gives you no way to interact with the query and
> 'stream' the result, so you can't really use 'fetchOne()' or
> 'fetchAll()', but you could start a query and set flags on completion.
>
> As for web interface, we just had some talks, and one of the things I
> was looking to do was create some type of web service style bindings.
> (We would also like to have HWI talk to Thrift and have thrift be the
> code path for everything). However, if we do make some web service
> style bindings they would really be independent of the back end. Do
> you want to work on this? I would like to open a Jira and tackle the
> issue.
>
>
> The big picture here is that we need a 'state holder'. That is really
> what HWI is. You create a session, detach from it, and optionally
> check on it later. If an application needs that pattern, how should we
> handle it?
>
> One way to tackle this is
>
> bin/hive -e "INSERT OVERWRITE DIRECTORY 'hdfs://path/to/file' SELECT * FROM XXX" &
>
> then have your client 'tail' the hdfs://path/to/file or record the
> last position it saw. I guess the big question is dealing with
> streaming results. HWI manages the session for you and writes the
> results to a local file, (and the new SessionBucket
>
> What is the usage pattern you need?
>

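The "record the last position it saw" idea from the quoted message can be sketched as follows. This is only an illustration under assumptions: the function name read_new_rows is hypothetical, and a local file stands in for the hdfs://path/to/file output, since reading HDFS needs a client library that is outside this sketch. Whether a running query exposes partial output at that path is also not something the sketch can guarantee.

```python
import os


def read_new_rows(path, offset):
    """Return (rows, new_offset) for complete lines appended after offset.

    The caller stores new_offset (for example in a web session or a cookie)
    and passes it back on the next poll, so nothing stays resident between
    calls.
    """
    if not os.path.exists(path):
        return [], offset  # the job has not produced any output yet

    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()

    # Only consume up to the last newline, so a row that is still being
    # written is picked up whole on a later poll.
    end = chunk.rfind(b"\n")
    if end == -1:
        return [], offset

    complete = chunk[: end + 1]
    rows = complete.decode("utf-8").split("\n")[:-1]
    return rows, offset + len(complete)
```

Each poll is independent: the client keeps only the integer offset and asks for whatever has been appended since then.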