Re: Hive and thrift session help

Edward Capriolo Tue, 08 Sep 2009 14:52:40 -0700

On Tue, Sep 8, 2009 at 5:15 PM, Royce
Rollins<[email protected]> wrote:
> OK I see. I just looked at the code in HWISessionManager.java.  So it looks
> like either I will have to write my own ruby HWISessionManager that manages
> sessions through thrift or expose the existing HWISessionManager via some web
> service interface.  Has anyone done this?
>
> Royce
>
>
> On 9/8/09 1:47 PM, "Edward Capriolo" <[email protected]> wrote:
>
>> On Tue, Sep 8, 2009 at 4:38 PM, Vijay<[email protected]> wrote:
>>> Sorry to inject into this thread but I have the same problem (only I'm
>>> trying to use the thrift PHP libraries from apache-php scripts). The problem
>>> with this approach is that the http request cannot run indefinitely as the
>>> server is executing a query. Are there any solutions for this?
>>>
>>> Thanks,
>>> Vijay
>>>
>>> On Tue, Sep 8, 2009 at 1:35 PM, Royce Rollins <[email protected]>
>>> wrote:
>>>>
>>>> Raghu,
>>>> Thanks for the quick response.
>>>> Yes.  My application is web based so instead of having to build some kind
>>>> of
>>>> session model myself for queries that might take a while,  I'd like to use
>>>> a session model in the hive service.
>>>>
>>>> Royce
>>>>
>>>>
>>>> On 9/8/09 1:32 PM, "Raghu Murthy" <[email protected]> wrote:
>>>>
>>>>> Our model so far has been to create a new connection to the hive thrift
>>>>> server per session. Is there anything specific you are looking for in
>>>>> sessions?
>>>>>
>>>>>
>>>>> On 9/8/09 1:06 PM, "Royce Rollins" <[email protected]> wrote:
>>>>>
>>>>>> I'm currently working on an application that connects to hive via the
>>>>>> thrift
>>>>>> ruby libraries.
>>>>>>
>>>>>> Does hive support creation of sessions using those libraries?  If so,
>>>>>> how?
>>>>>>
>>>>>>
>>>>>> Royce
>>>>>
>>>>
>>>
>>>
>>
>> Royce,
>>
>> The Hive Web Interface deals with this by having a threaded object
>> (HWISessionManager) in the Web application scope. I am not sure if PHP
>> has any equivalent to threading and Application Scope.
>>
>> Edward
>
>


Someone correct me if I am wrong.

Royce,

You may be able to get at this another way. From my understanding, the
internal Hive web interface used at Facebook would spawn ` bin/hive -e
'INSERT INTO X select * FROM`. All results were written to a Hive
table.

Doing it this way gives you no way to interact with the query and
'stream' the result, so you can't really use 'fetchOne()' or
'fetchAll()', but you could start a query and set flags on completion.
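
A minimal sketch of the "start a query and set flags on completion"
idea, in Python. Everything named here (start_detached, query_status,
the flag and log paths) is hypothetical and not part of Hive or HWI; it
assumes a POSIX `sh` is available. The command is passed in as a list,
so in practice it would be something like ["bin/hive", "-e", query].

```python
import subprocess


def start_detached(command, flag_path, log_path):
    """Start command in the background; its exit code lands in flag_path.

    A small sh wrapper runs the command, then writes the exit status to a
    temporary file and renames it, so the flag file appears in one step.
    """
    # Inside the wrapper, "$0" is flag_path and "$@" is the command itself.
    wrapper = '"$@"; echo $? > "$0.tmp"; mv "$0.tmp" "$0"'
    with open(log_path, "ab") as log:
        return subprocess.Popen(
            ["sh", "-c", wrapper, str(flag_path), *command],
            stdout=log,
            stderr=subprocess.STDOUT,
        )


def query_status(flag_path):
    """Non-blocking check that any later request can make."""
    try:
        with open(flag_path) as flag:
            code = int(flag.read().strip())
    except FileNotFoundError:
        return "RUNNING"  # no flag yet, so the command has not finished
    return "DONE" if code == 0 else "FAILED"
```

Because the flag is a plain file, the request that checks on the query
does not need to be the one that started it, which is the point for a
web application whose requests cannot stay open for the whole query.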

As for the web interface, we just had some talks, and one of the things
I was looking to do was create some type of web service style bindings.
(We would also like to have HWI talk to Thrift and have Thrift be the
code path for everything.) However, if we do make some web service
style bindings, they would really be independent of the back end. Do
you want to work on this? I would like to open a Jira and tackle the
issue.


The big picture here is that we need a 'state holder'. That is really
what HWI is. You create a session, detach from it, and optionally
check on it later. If an application needs that pattern, how should it
handle it?
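
To make the 'state holder' pattern concrete, here is a rough Python
sketch of create / detach / check later. It is not how
HWISessionManager is written; the class and method names are made up
for illustration, and `job` stands in for whatever actually runs the
query.

```python
import threading
import uuid


class SessionManager:
    """Keeps long-running jobs alive between short client requests."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sessions = {}

    def create(self, job):
        """Start job() in a background thread and return a session id."""
        session_id = uuid.uuid4().hex
        state = {"status": "RUNNING", "result": None, "error": None}
        with self._lock:
            self._sessions[session_id] = state

        def run():
            try:
                update = {"status": "DONE", "result": job()}
            except Exception as exc:
                update = {"status": "FAILED", "error": str(exc)}
            with self._lock:
                state.update(update)

        threading.Thread(target=run, daemon=True).start()
        return session_id  # the caller is now free to detach

    def check(self, session_id):
        """Return a snapshot of the session state without blocking."""
        with self._lock:
            return dict(self._sessions[session_id])
```

The manager has to live somewhere that outlasts a single request (the
web application scope, in HWI's case), which is exactly the part that
is hard to do from a per-request script.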

One way to tackle this is:

bin/hive -e "INSERT INTO file 'hdfs://path/to/file' select * FROM XXX" &

then have your client 'tail' the hdfs://path/to/file or record the
last position it saw. I guess the big question is dealing with
streaming results. HWI manages the session for you and writes the
results to a local file, (and the new SessionBucket
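
The "record the last position it saw" part could look roughly like the
Python sketch below. It reads a local file for simplicity; reading from
HDFS would need an HDFS client in place of open(), which is not shown.
The function name is hypothetical.

```python
def read_new_lines(path, offset):
    """Return (lines, new_offset) for complete lines appended since offset.

    A trailing partial line is left unread so the next call picks it up
    once the writer has finished it.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    end = chunk.rfind(b"\n") + 1  # 0 when no complete line has arrived yet
    lines = chunk[:end].decode("utf-8").splitlines()
    return lines, offset + end
```

The client only has to keep the returned offset between calls (in a
cookie, a database row, or similar), so each poll is a short request
that returns whatever rows have shown up since the last one.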

What is the usage pattern you need?
