On Tue, Sep 8, 2009 at 6:37 PM, Vijay<[email protected]> wrote:
> I get that HWI does manage sessions but it does that leveraging the internal
> functionality of the "server." One usage pattern I'd like is some kind of a
> "job" API. What I mean by that is an API that lets us simply submit a query,
> get some kind of "job id," and leave. After that we use other APIs to query
> the job status, kill it, get the output once it is done, etc. If we have a
> simple API like this and the semantics to support this within hive, then the
> UI can be completely decoupled and be as stateless as it can (using vanilla
> apache+php as an example, we can't really do threads or stay resident after
> submitting a job). Does something like this exist either within hive or at
> the hadoop level? It seems to me that maybe this is something that needs
> to be built first.
>
> Thanks,
> Vijay
>
> On Tue, Sep 8, 2009 at 2:52 PM, Edward Capriolo <[email protected]>
> wrote:
>>
>> On Tue, Sep 8, 2009 at 5:15 PM, Royce
>> Rollins<[email protected]> wrote:
>> > OK I see. I just looked at the code in HWISessionManager.java.  So it
>> > looks
>> > like either I will have to write my own ruby HWISessionManager that
>> > manages
>> > sessions through thrift or expose the existing HWISessionManager via some
>> > web
>> > service interface.  Has anyone done this?
>> >
>> > Royce
>> >
>> >
>> > On 9/8/09 1:47 PM, "Edward Capriolo" <[email protected]> wrote:
>> >
>> >> On Tue, Sep 8, 2009 at 4:38 PM, Vijay<[email protected]> wrote:
>> >>> Sorry to inject into this thread but I have the same problem (only I'm
>> >>> trying to use the thrift PHP libraries from apache-php scripts). The
>> >>> problem
>> >>> with this approach is that the http request cannot run indefinitely as
>> >>> the
>> >>> server is executing a query. Are there any solutions for this?
>> >>>
>> >>> Thanks,
>> >>> Vijay
>> >>>
>> >>> On Tue, Sep 8, 2009 at 1:35 PM, Royce Rollins
>> >>> <[email protected]>
>> >>> wrote:
>> >>>>
>> >>>> Raghu,
>> >>>> Thanks for the quick response.
>> >>>> Yes.  My application is web based so instead of having to build some
>> >>>> kind
>> >>>> of
>> >>>> session model myself for queries that might take a while,  I'd like
>> >>>> to use
>> >>>> a session model in the hive service.
>> >>>>
>> >>>> Royce
>> >>>>
>> >>>>
>> >>>> On 9/8/09 1:32 PM, "Raghu Murthy" <[email protected]> wrote:
>> >>>>
>> >>>>> Our model so far has been to create a new connection to the hive
>> >>>>> thrift
>> >>>>> server per session. Is there anything specific you are looking for
>> >>>>> in
>> >>>>> sessions?
>> >>>>>
>> >>>>>
>> >>>>> On 9/8/09 1:06 PM, "Royce Rollins" <[email protected]>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> I'm currently working on an application that connects to hive via
>> >>>>>> the
>> >>>>>> thrift
>> >>>>>> ruby libraries.
>> >>>>>>
>> >>>>>> Does hive support creation of sessions using those libraries?  If
>> >>>>>> so,
>> >>>>>> how?
>> >>>>>>
>> >>>>>>
>> >>>>>> Royce
>> >>>>>
>> >>>>
>> >>>
>> >>>
>> >>
>> >> Royce,
>> >>
>> >> The Hive Web Interface deals with this by having a threaded object
>> >> (HWISessionManager) in the Web application scope. I am not sure if PHP
>> >> has any equivalent to threading and Application Scope.
>> >>
>> >> Edward
>> >
>> >
>>
>> Someone correct me if I am wrong.
>>
>> Royce,
>>
>> You may be able to get at this another way. From my understanding, the
>> internal hive web interface used at facebook would spawn ` bin/hive -e
>> 'INSERT INTO X select * FROM`. All results were written to a hive
>> table.
>>
>> Doing it this way gives you no way to interact with the query and
>> 'stream' the result, so you can't really use 'fetchOne()' or
>> 'fetchAll()', but you could start a query and set flags on completion.
>>
>> As for web interface, we just had some talks, and one of the things I
>> was looking to do was create some type of web service style bindings.
>> (We would also like to have HWI talk to Thrift and have thrift be the
>> code path for everything). However, if we do make some web server
>> style bindings they would really be independent of the back end. Do
>> you want to work on this ? I would like to open a Jira and tackle the
>> issue.
>>
>>
>> The big picture here is that we need a 'state holder'. That is really
>> what HWI is. You create a session, detach from it, and optionally
>> check on it later. If an application needs that pattern, how should it
>> handle it?
>>
>> One way to tackle this is
>>
>> bin/hive -e "INSERT OVERWRITE DIRECTORY 'hdfs://path/to/file' SELECT * FROM XXX" &
>>
>> then have your client 'tail' the hdfs://path/to/file or record the
>> last position it saw. I guess the big question is dealing with
>> streaming results. HWI manages the session for you and writes the
>> results to a local file, (and the new SessionBucket
>>
>> What is the usage pattern you need?
>
>

Vijay,

> What I mean by that is an API that lets us simply submit a query,
> get some kind of "job id," and leave.

No. (Again, someone correct me if I am wrong.) As I understand it, if you
disconnect from the Thrift HiveServer you cannot reconnect.

Assuming we punt on intermediate data (large queries with 10 TB of
results waiting for client pickup), there are a few ways we (you)
could handle this.

You could use HWI as a web service. With some URL hacking like
http://hwi:9999/hwi/create_session.jsp?name=bob

This is not a true XML web service, but you could use it to accomplish
your goals.
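
A minimal sketch of what the URL hacking could look like from a stateless
client, in Python. Only the create_session.jsp page and its name parameter
come from the example URL above; the helper name hwi_create_session_url and
the default host and port are assumptions for illustration. A real client
would then fetch the URL (for example with urllib.request.urlopen) and
scrape the returned page.

```python
from urllib.parse import urlencode


def hwi_create_session_url(name, host="hwi", port=9999):
    """Build the HWI session-creation URL (hypothetical helper).

    The host and port defaults mirror the example URL; adjust them
    to wherever HWI is actually running.
    """
    # urlencode escapes the session name so it is safe in a query string.
    query = urlencode({"name": name})
    return f"http://{host}:{port}/hwi/create_session.jsp?{query}"


print(hwi_create_session_url("bob"))
# prints http://hwi:9999/hwi/create_session.jsp?name=bob
```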

> After that we use other APIs to query
> the job status, kill it, get the output once it is done, etc

We could write some other XMLRPC style JSP pages that would be a more
formal web service.

The Hive Thrift server could also support this directly, perhaps with
alternate constructors or objects for detached sessions.

In summary:
option 1) URL hacking (you have that today; not very clean)
option 2) web service bindings (you could have that pretty fast; cleaner,
and does not have to touch anything upstream)
option 3) detached sessions in HiveServer (patched HiveServer and patched
Hive bindings; clean)
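
None of these options exists as a "job" API today, so here is a rough sketch
of the submit / check-status / fetch-output pattern Vijay described, built
on the "spawn bin/hive -e and set a flag on completion" idea from earlier in
the thread. It is not a Hive API: the function names, the JOB_ROOT
directory, and the file layout are all assumptions, and it relies on a POSIX
shell. State lives entirely on disk, so each call can come from a separate
short-lived request.

```python
import os
import shlex
import subprocess
import uuid

JOB_ROOT = "/tmp/hive_jobs"  # hypothetical location for per-job state


def submit(query, command=("bin/hive", "-e"), job_root=JOB_ROOT):
    """Start the query in a detached process and return a job id at once."""
    job_id = uuid.uuid4().hex
    job_dir = os.path.join(job_root, job_id)
    os.makedirs(job_dir)
    out, err, tmp, flag = (
        shlex.quote(os.path.join(job_dir, name))
        for name in ("stdout", "stderr", "exit_code.tmp", "exit_code")
    )
    cmd = " ".join(shlex.quote(part) for part in (*command, query))
    # The wrapper writes the exit code last, via rename, so the presence
    # of the exit_code file is the "flag on completion".
    script = f"{cmd} > {out} 2> {err}; echo $? > {tmp}; mv {tmp} {flag}"
    subprocess.Popen(
        ["sh", "-c", script],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # keep running after the caller exits
    )
    return job_id


def status(job_id, job_root=JOB_ROOT):
    """Return 'running', 'done', or 'failed' by inspecting the job directory."""
    job_dir = os.path.join(job_root, job_id)
    if not os.path.isdir(job_dir):
        raise KeyError(f"unknown job id: {job_id}")
    flag = os.path.join(job_dir, "exit_code")
    if not os.path.exists(flag):
        return "running"
    with open(flag) as fh:
        return "done" if int(fh.read().strip()) == 0 else "failed"


def output(job_id, job_root=JOB_ROOT):
    """Return whatever the job has written to stdout so far."""
    with open(os.path.join(job_root, job_id, "stdout")) as fh:
        return fh.read()
```

A web page would call submit() once, hand the job id back to the browser,
and call status() and output() on later requests. Killing a job and
cleaning up old job directories are left out of the sketch.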
