Re: Adding jar files when running hive in hwi mode or hiveserver mode

Edward Capriolo Wed, 26 Aug 2009 14:44:04 -0700

On Wed, Aug 26, 2009 at 3:25 PM, Raghu Murthy<[email protected]> wrote:
> Even if we decided to have multiple HiveServers, wouldn't it be possible for
> HWI to randomly pick a HiveServer to connect to per query/client?
>
> On 8/26/09 12:16 PM, "Ashish Thusoo" <[email protected]> wrote:
>
>> +1 for ajaxing this baby.
>>
>> On the broader question of whether we should combine HWI and HiveServer - I
>> think there are definite deployment and code reuse advantages in doing so,
>> however keeping them separate also has the advantage that we can cluster
>> HiveServers independently from HWI. Since the HiveServer sits in the data
>> path, the independent scaling may have advantages. I am not sure how strong
>> of an argument that is to not put them together. Simplicity obviously
>> indicates that we should have them together.
>>
>> Thoughts?
>>
>> Ashish
>>
>> -----Original Message-----
>> From: Edward Capriolo [mailto:[email protected]]
>> Sent: Wednesday, August 26, 2009 9:45 AM
>> To: [email protected]
>> Subject: Re: Adding jar files when running hive in hwi mode or hiveserver mode
>>
>> On Tue, Aug 25, 2009 at 8:13 PM, Vijay<[email protected]> wrote:
>>> Yep, I got it and now it works perfectly! I like hwi btw! It
>>> definitely makes things easier for a wider audience to try out hive.
>>> Your new session result bucket idea is very nice as well. I will keep
>>> trying more things and see if anything else comes up but so far it looks
>>> great!
>>> Thanks Edward!
>>>
>>> On Tue, Aug 25, 2009 at 7:25 AM, Edward Capriolo
>>> <[email protected]>
>>> wrote:
>>>>
>>>> On Tue, Aug 25, 2009 at 10:18 AM, Edward
>>>> Capriolo<[email protected]>
>>>> wrote:
>>>>> On Mon, Aug 24, 2009 at 10:13 PM, Vijay<[email protected]> wrote:
>>>>>> Probably spoke too soon :) I added this comment to the JIRA ticket
>>>>>> above.
>>>>>>
>>>>>> Hi, I tried the latest patch on trunk and there seems to be a problem.
>>>>>>
>>>>>> I was interested in using the "add jar " command to add jar files
>>>>>> to the path. However, by the time the command flows through the
>>>>>> SessionState to the AddResourceProcessor (in
>>>>>> ./ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java),
>>>>>> the command word "add" is not being stripped so the
>>>>>> resource processor is trying to find a ResourceType of "ADD."
>>>>>>
>>>>>> I'm not sure if this was an existing bug or was a result of the
>>>>>> current set of changes.
>>>>>>
>>>>>> On Mon, Aug 24, 2009 at 5:30 PM, Vijay <[email protected]> wrote:
>>>>>>>
>>>>>>> That's awesome and looks like exactly what I needed. Local file
>>>>>>> system requirement is perfectly ok for now. I will check it out right
>>>>>>> away!
>>>>>>> Hopefully it will be checked in soon.
>>>>>>>
>>>>>>> Thanks Edward!
>>>>>>>
>>>>>>> On Mon, Aug 24, 2009 at 5:14 PM, Edward Capriolo
>>>>>>> <[email protected]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Mon, Aug 24, 2009 at 8:09 PM, Prasad
>>>>>>>> Chakka<[email protected]>
>>>>>>>> wrote:
>>>>>>>>> Vijay, there is no solution for it yet. There may be a jira
>>>>>>>>> open but AFAIK, no one is working on it. You are welcome to
>>>>>>>>> contribute this feature.
>>>>>>>>>
>>>>>>>>> Prasad
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ________________________________
>>>>>>>>> From: Vijay <[email protected]>
>>>>>>>>> Reply-To: <[email protected]>
>>>>>>>>> Date: Mon, 24 Aug 2009 16:59:28 -0700
>>>>>>>>> To: <[email protected]>
>>>>>>>>> Subject: Re: Adding jar files when running hive in hwi mode or
>>>>>>>>> hiveserver mode
>>>>>>>>>
>>>>>>>>> Hi, is there any solution for this? How does everybody include
>>>>>>>>> custom jar files running hive in a non-cli mode?
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>> Vijay
>>>>>>>>>
>>>>>>>>> On Sat, Aug 22, 2009 at 6:19 PM, Vijay <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> When I run hive in cli mode, I add the hive_contrib.jar file
>>>>>>>>> using this
>>>>>>>>> command:
>>>>>>>>>
>>>>>>>>> hive> add jar lib/hive_contrib.jar
>>>>>>>>>
>>>>>>>>> Is there a way to do this automatically when running hive in
>>>>>>>>> hwi or hiveserver modes? Or do I have to add the jar file
>>>>>>>>> explicitly to any of the startup scripts?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Vijay,
>>>>>>>>
>>>>>>>> Currently HWI does not support this. The changes in
>>>>>>>> https://issues.apache.org/jira/browse/HIVE-716 will make this
>>>>>>>> possible (I have not tested it, but it should work as the cli
>>>>>>>> does). The file will have to be on the server's local file
>>>>>>>> system. We could probably add 'commons upload' to the web
>>>>>>>> interface if there was a need for it.
>>>>>>>>
>>>>>>>> HIVE-716 should be in trunk soon. The patch does apply cleanly if
>>>>>>>> it's something you need today.
>>>>>>>>
>>>>>>>> Edward
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> I just committed a new version of the patch. You were correct: the
>>>>> clidriver trims the first token off set and add queries, and hwi was
>>>>> not doing that. Also let me know your impressions of HWI.
>>>>>
>>>>> The new features are the 'ResultBucket', a buffer of the last x
>>>>> results viewable from the web interface, and the ability to supply
>>>>> more than one query at a time.
>>>>>
>>>>> These two features should add much usability now as you can do
>>>>> things like explain, show tables, etc and not have to dump the
>>>>> results to a file.
>>>>>
>>>>> Edward
>>>>>
>>>>
>>>> False statement:
>>>>>> I just committed a new version of the patch
>>>>
>>>> In actuality, I updated the Jira with a new patch.
>>>>
>>>> It is still early AM. All the gears are not turning yet.
>>>>
>>>> Edward
>>>
>>>
>>
>> Vijay,
>>
>>>> It definitely makes things easier for a wider audience to try out
>>>> hive
>>
>> That was always the goal. I often wonder which direction we should take HWI
>> in. Should HWI have some REST-ful stubs to turn it into a remote job
>> submission system? HiveServer uses thrift and I believe thrift has an
>> HTTP-Transport so you might not need HWI to provide this.
>>
>> Should we ajax things like the result bucket or the entire interface so it
>> has that ooo aaahhh effect?
>>
>> Really the larger question is this: HWI has its own multi-session
>> management, and HiveServer has this as well (now; way back when it did
>> not). Should HWI just front end HiveServer?
>>
>> Does anyone have any thoughts?
>> Edward
>
>


I think Raghu is correct. HiveClient->HiveServer happens on a
permanent TCP connection (I think?). Suppose you had a back-end cluster
of HiveServers behind a load balancer or proxy with a
sticky-session/session-tracking/source-IP policy. HWI would be
configured with the virtual IP address of the load balancer and would
connect and stay connected to a random HiveServer in the farm.
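
To make the source-IP idea concrete, here is a minimal sketch of how a
balancer might pin a client to one back-end server. It is only an
illustration: the function name pick_backend, the server addresses, and
the hashing scheme are all assumptions of mine, not anything taken from
HWI, HiveServer, or a particular load balancer.

```python
import hashlib

def pick_backend(client_ip, backends):
    """Map a client IP to one backend deterministically (source-IP stickiness)."""
    if not backends:
        raise ValueError("no backends configured")
    # Hash the client address so the same client always lands on the same server.
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]

# Hypothetical HiveServer addresses, made up for this example.
HIVE_SERVERS = ["hiveserver1:10000", "hiveserver2:10000", "hiveserver3:10000"]

first = pick_backend("10.0.0.5", HIVE_SERVERS)
second = pick_backend("10.0.0.5", HIVE_SERVERS)
```

The same client address always maps to the same server, so a
long-lived connection and its session state would stay in one place.
Note that a plain modulo like this reshuffles clients whenever the
server list changes; a real balancer would likely handle that more
gracefully.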

I am naturally partial to the way it is now because I came up with it :)

I like the idea of having a REST-ful/XML-RPC or some web service style
interface for job submit.

My thinking behind HWI has always been KISS. Keep It Simple Stupid.
Anyone should be able to hack a few web pages onto it. Adding thrift,
ajax, XML-RPC layers definitely ups the complexity.

I think it makes sense to do HWI->HiveServer. I will have to take a
deeper look at what HiveServer and thrift offer to be sure.

Edward
