On Fri, Jul 24, 2009 at 1:36 PM, Zheng Shao<[email protected]> wrote:
> Hive only needs to be installed at the node that runs the hive query.
> All the jars will be sent to the hadoop JobClient via -libjars. The
> code is in ExecDriver.java.
>
> In hadoop 0.17, I don't think there is a way to add a path to
> classpath for a job (unless we put it in hadoop-env.sh and start
> TaskTracker with that path). Are there any changes in later
> versions?
>
>
>
> Zheng
>
>
>
> On 7/24/09, Edward Capriolo <[email protected]> wrote:
>> I have been following some threads on the hadoop mailing list about
>> speeding up MR jobs. I have a few questions. I am sure I could find the
>> answers by digging into the source code, but I thought I could get a
>> quick answer here.
>>
>> 1. ADD JAR 'myfile.jar' uses the distributed cache. Using the
>> distributed cache has some overhead. I know that if I create an auxlib
>> directory under the hive root, its jars will be added to libjars on
>> startup. If I add my jar to auxlib on all my nodes, will a UDF in the
>> jar be available during subsequent jobs? Or is it only necessary to add
>> those jars to the auxlib on the node I start the job from?
>>
>> 2. Dealing with the entire hive install: how much of the hive install
>> really needs to be replicated on each datanode? If we used the
>> distributed cache for everything, the jobs would have unneeded
>> overhead, but hive would be 'installed on demand' from the client.
>>
>> Thanks,
>> Edward
>>
>
> Yours,
> Zheng
>

Zheng,

A thread on the hadoop list piqued my interest. Search for
"hadoop jobs take long time to setup":

http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200906.mbox/%[email protected]%3e

Can hive benefit from the ideas in that thread?
Edward
