Re: Hive and thrift session help

Royce Rollins Tue, 08 Sep 2009 16:01:26 -0700

I agree with Vijay.
I need some sort of session management service that lives on top of Hive.
That way I can submit a job using an API such as the Thrift API. It would
also be useful to be able to get the actual job id in Hadoop.


Royce


On 9/8/09 3:37 PM, "Vijay" <[email protected]> wrote:

> I get that HWI does manage sessions, but it does that by leveraging the internal
> functionality of the "server." One usage pattern I'd like is some kind of
> "job" API. What I mean by that is an API that lets us simply submit a query,
> get some kind of "job id," and leave. After that we use other APIs to query
> the job status, kill it, get the output once it is done, etc. If we have a
> simple API like this and the semantics to support it within Hive, then the
> UI can be completely decoupled and be as stateless as possible (using vanilla
> Apache+PHP as an example, we can't really do threads or stay resident after
> submitting a job). Does something like this exist either within Hive or at the
> Hadoop level? It seems to me that maybe this is something that needs to be
> built first.
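The job API described here can be sketched as a small registry living in a long-running server process, with submit, status, kill, and result operations. This is a minimal Python illustration, not an existing Hive or Hadoop API: `JobManager` is hypothetical, its job ids are local UUIDs rather than Hadoop job ids, and it assumes queries are run through the Hive CLI as `hive -e <query>`. A stateless web front end would reach these operations through some RPC layer (Thrift, HTTP, etc.), which is left out.

```python
import subprocess
import sys
import tempfile
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    proc: subprocess.Popen  # the locally running query process
    out_path: str           # file receiving its stdout (the query result)


class JobManager:
    """Hypothetical job registry hosted by a long-running server process."""

    def __init__(self, command_prefix: Optional[List[str]] = None):
        # Assumption: each query is run as `hive -e <query>` by default.
        self.command_prefix = command_prefix or ["hive", "-e"]
        self.jobs: Dict[str, Job] = {}

    def submit(self, query: str) -> str:
        """Start the query and return a job id without waiting for it."""
        job_id = uuid.uuid4().hex
        with tempfile.NamedTemporaryFile(prefix="job-", suffix=".out", delete=False) as out:
            proc = subprocess.Popen(self.command_prefix + [query],
                                    stdout=out, stderr=subprocess.DEVNULL)
        # Closing our handle is fine: the child holds its own copy of the fd.
        self.jobs[job_id] = Job(proc, out.name)
        return job_id

    def status(self, job_id: str) -> str:
        rc = self.jobs[job_id].proc.poll()
        if rc is None:
            return "RUNNING"
        return "SUCCEEDED" if rc == 0 else "FAILED"

    def kill(self, job_id: str) -> None:
        proc = self.jobs[job_id].proc
        if proc.poll() is None:
            proc.terminate()
            proc.wait()

    def result(self, job_id: str) -> Optional[str]:
        """Return the captured output, or None while the job is running."""
        if self.status(job_id) == "RUNNING":
            return None
        with open(self.jobs[job_id].out_path) as f:
            return f.read()


if __name__ == "__main__":
    # Stand-in for Hive so the sketch runs anywhere: `python -c <code>`.
    jobs = JobManager([sys.executable, "-c"])
    jid = jobs.submit("print('hello from a job')")
    jobs.jobs[jid].proc.wait()  # a real UI would poll status() instead
    print(jobs.status(jid), jobs.result(jid))
```

Terminating the local CLI process may not stop MapReduce jobs it has already submitted to the cluster, so a real implementation would probably also need to kill the underlying Hadoop job.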
> 
> Thanks,
> Vijay
> 
> On Tue, Sep 8, 2009 at 2:52 PM, Edward Capriolo <[email protected]> wrote:
>> On Tue, Sep 8, 2009 at 5:15 PM, Royce Rollins <[email protected]> wrote:
>>> > OK, I see. I just looked at the code in HWISessionManager.java. So it looks
>>> > like either I will have to write my own Ruby HWISessionManager that manages
>>> > sessions through Thrift or expose the existing HWISessionManager via some web
>>> > service interface. Has anyone done this?
>>> >
>>> > Royce
>>> >
>>> >
>>> > On 9/8/09 1:47 PM, "Edward Capriolo" <[email protected]> wrote:
>>> >
>>>> >> On Tue, Sep 8, 2009 at 4:38 PM, Vijay<[email protected]> wrote:
>>>>> >>> Sorry to inject into this thread, but I have the same problem (only I'm
>>>>> >>> trying to use the Thrift PHP libraries from Apache/PHP scripts). The problem
>>>>> >>> with this approach is that the HTTP request cannot stay open indefinitely
>>>>> >>> while the server is executing a query. Are there any solutions for this?
>>>>> >>>
>>>>> >>> Thanks,
>>>>> >>> Vijay
>>>>> >>>
>>>>> >>> On Tue, Sep 8, 2009 at 1:35 PM, Royce Rollins <[email protected]>
>>>>> >>> wrote:
>>>>>> >>>>
>>>>>> >>>> Raghu,
>>>>>> >>>> Thanks for the quick response.
>>>>>> >>>> Yes. My application is web based, so instead of having to build some kind
>>>>>> >>>> of session model myself for queries that might take a while, I'd like to
>>>>>> >>>> use a session model in the Hive service.
>>>>>> >>>>
>>>>>> >>>> Royce
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> On 9/8/09 1:32 PM, "Raghu Murthy" <[email protected]> wrote:
>>>>>> >>>>
>>>>>>> >>>>> Our model so far has been to create a new connection to the Hive Thrift
>>>>>>> >>>>> server per session. Is there anything specific you are looking for in
>>>>>>> >>>>> sessions?
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> On 9/8/09 1:06 PM, "Royce Rollins" <[email protected]> wrote:
>>>>>>> >>>>>
>>>>>>>> >>>>>> I'm currently working on an application that connects to Hive via the
>>>>>>>> >>>>>> Thrift Ruby libraries.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> Does Hive support creation of sessions using those libraries? If so,
>>>>>>>> >>>>>> how?
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> Royce
>>>>>>> >>>>>
>>>>>> >>>>
>>>>> >>>
>>>>> >>>
>>>> >>
>>>> >> Royce,
>>>> >>
>>>> >> The Hive Web Interface deals with this by having a threaded object
>>>> >> (HWISessionManager) in the Web application scope. I am not sure if PHP
>>>> >> has any equivalent to threading and Application Scope.
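The "threaded object in application scope" idea can be sketched in a few lines of Python to show why it needs a resident process. This is not HWI's actual implementation: `SessionManager`, `Session`, and the injected `run_query` callable are hypothetical names, and `run_query` stands in for whatever really executes a Hive query (for example a Thrift client call). `submit` returns immediately while each session's worker thread keeps running, which only works if the hosting process stays alive between requests.

```python
import queue
import threading
from typing import Callable, Dict, List, Tuple


class Session:
    """One user's session: a worker thread runs queued queries in order."""

    def __init__(self, run_query: Callable[[str], str]):
        self._run_query = run_query
        self._pending: "queue.Queue[str]" = queue.Queue()
        self._lock = threading.Lock()
        self._results: List[Tuple[str, str]] = []  # (query, result or error)
        threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, query: str) -> None:
        self._pending.put(query)  # returns at once; the caller can detach

    def _loop(self) -> None:
        while True:
            query = self._pending.get()
            try:
                outcome = self._run_query(query)
            except Exception as exc:  # keep the worker alive on failure
                outcome = f"ERROR: {exc}"
            with self._lock:
                self._results.append((query, outcome))
            self._pending.task_done()

    def wait_idle(self) -> None:
        """Block until every submitted query has finished (demo helper)."""
        self._pending.join()

    def snapshot(self) -> List[Tuple[str, str]]:
        """What a later 'check on it' request would read."""
        with self._lock:
            return list(self._results)


class SessionManager:
    """Application-scope registry of named sessions."""

    def __init__(self, run_query: Callable[[str], str]):
        self._run_query = run_query
        self._sessions: Dict[str, Session] = {}
        self._lock = threading.Lock()

    def get_or_create(self, name: str) -> Session:
        with self._lock:
            if name not in self._sessions:
                self._sessions[name] = Session(self._run_query)
            return self._sessions[name]


if __name__ == "__main__":
    mgr = SessionManager(lambda q: q.upper())  # stand-in for a real Hive call
    s = mgr.get_or_create("royce")
    s.submit("select 1")
    s.wait_idle()
    print(mgr.get_or_create("royce").snapshot())
```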
>>>> >>
>>>> >> Edward
>>> >
>>> >
>> 
>> Someone correct me if I am wrong.
>> 
>> Royce,
>> 
>> You may be able to get at this another way. From my understanding, the
>> internal Hive web interface used at Facebook would spawn `bin/hive -e
>> 'INSERT OVERWRITE TABLE X SELECT * FROM ...'`. All results were written to a
>> Hive table.
>> 
>> Doing it this way gives you no way to interact with the query and
>> stream the results, so you can't really use fetchOne() or
>> fetchAll(), but you could start a query and set a flag on completion.
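Starting a query and setting a flag on completion fits a stateless front end such as Apache+PHP: one request launches the query and returns, and later requests only look at files on disk. A minimal POSIX-only sketch in Python, assuming a writable local directory for job state; the directory layout, the `rc` marker file, and the function names are hypothetical. `echo` and `false` stand in for a command like `bin/hive -e '...'`. The exit code is written to a temporary file and renamed, so a reader never sees a half-written flag.

```python
import os
import subprocess
import tempfile
import time
import uuid
from typing import List

# Shell wrapper: run the command ("$@"), capture its output, then publish
# its exit status by atomically renaming rc.tmp to rc.
WRAPPER = ('"$@" > "$JOBDIR/out" 2> "$JOBDIR/err"; '
           'echo $? > "$JOBDIR/rc.tmp" && mv "$JOBDIR/rc.tmp" "$JOBDIR/rc"')


def launch_detached(cmd: List[str], base_dir: str) -> str:
    """Start cmd in the background and return a job id; does not wait."""
    job_id = uuid.uuid4().hex
    job_dir = os.path.join(base_dir, job_id)
    os.makedirs(job_dir)
    subprocess.Popen(
        ["sh", "-c", WRAPPER, "sh", *cmd],  # "sh" fills $0; cmd becomes "$@"
        env=dict(os.environ, JOBDIR=job_dir),
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # own session: signals to the caller's group miss it
    )
    return job_id


def check(job_id: str, base_dir: str) -> dict:
    """Stateless status check: only inspects files on disk."""
    job_dir = os.path.join(base_dir, job_id)
    rc_path = os.path.join(job_dir, "rc")
    if not os.path.exists(rc_path):
        return {"state": "RUNNING"}
    with open(rc_path) as f:
        rc = int(f.read().strip())
    return {"state": "SUCCEEDED" if rc == 0 else "FAILED", "rc": rc,
            "out": os.path.join(job_dir, "out")}


def wait_for(job_id: str, base_dir: str, timeout: float = 10.0) -> dict:
    """Demo poller; a web UI would simply call check() on each request."""
    deadline = time.monotonic() + timeout
    while True:
        status = check(job_id, base_dir)
        if status["state"] != "RUNNING" or time.monotonic() > deadline:
            return status
        time.sleep(0.05)


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    job = launch_detached(["echo", "hello"], base)
    print(wait_for(job, base)["state"])
```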
>> 
>> As for the web interface, we just had some talks, and one of the things I
>> was looking to do was create some type of web-service-style bindings.
>> (We would also like to have HWI talk to Thrift and have Thrift be the
>> code path for everything.) However, if we do make some web-service-style
>> bindings, they would really be independent of the back end. Do
>> you want to work on this? I would like to open a JIRA and tackle the
>> issue.
>> 
>> 
>> The big picture here is that we need a "state holder". That is really
>> what HWI is. You create a session, detach from it, and optionally
>> check on it later. If an application needs that pattern, how should it
>> handle it?
>> 
>> One way to tackle this is:
>>
>> bin/hive -e "INSERT OVERWRITE DIRECTORY 'hdfs://path/to/dir' SELECT * FROM XXX" &
>>
>> and then have your client 'tail' the files under hdfs://path/to/dir, or record
>> the last position it saw. I guess the big question is dealing with
>> streaming results. HWI manages the session for you and writes the
>> results to a local file (and the new SessionBucket
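Recording the last position seen can also be done without a resident process if the client stores a byte offset between requests (in a cookie or a database row, say). A minimal sketch in Python, assuming the output is readable as a local file; reading straight from HDFS would need an HDFS client, which is left out. `read_new` is a hypothetical helper, and it only hands back complete lines so a row is never split across two reads.

```python
import os
import tempfile
from typing import Tuple


def read_new(path: str, offset: int) -> Tuple[bytes, int]:
    """Return complete lines appended to `path` since `offset`, plus the new offset.

    The caller persists the returned offset between requests, so no
    process has to stay resident to remember where it left off.
    """
    if not os.path.exists(path):
        return b"", offset  # output not written yet
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # 0 if no complete line is available yet
    return data[:end], offset + end


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "part")
    with open(path, "wb") as f:
        f.write(b"row1\nrow2\nro")
    chunk, pos = read_new(path, 0)
    print(chunk, pos)
```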
>> 
>> What is the usage pattern you need?
> 
>
