Hive only needs to be installed on the node that runs the Hive query.
All the jars are passed to the Hadoop JobClient via -libjars. The
code is in ExecDriver.java.
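
To illustrate the idea, here is a rough sketch in Python. It is not the
actual ExecDriver.java code. The function names, the job jar, the main
class and the directory layout are all made up for illustration. The only
real pieces are the `hadoop jar` command and the generic `-libjars` option,
which takes a comma-separated list of jars:

```python
import os


def build_libjars_arg(aux_dir):
    """Return a comma-separated list of every .jar file found in aux_dir.

    aux_dir is a hypothetical client-side directory of extra jars;
    non-jar files are ignored and the result is sorted for determinism.
    """
    jars = sorted(
        os.path.join(aux_dir, name)
        for name in os.listdir(aux_dir)
        if name.endswith(".jar")
    )
    return ",".join(jars)


def hadoop_command(job_jar, main_class, aux_dir):
    """Build (but do not run) a 'hadoop jar' command line.

    The extra jars are attached with -libjars, so they only have to
    exist on the machine that submits the job.
    """
    cmd = ["hadoop", "jar", job_jar, main_class]
    libjars = build_libjars_arg(aux_dir)
    if libjars:
        # Generic options such as -libjars go before job-specific arguments.
        cmd += ["-libjars", libjars]
    return cmd
```

The point is that the jar list is assembled on the client only, so nothing
has to be copied to the other nodes ahead of time.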

In Hadoop 0.17, I don't think there is a way to add a path to the
classpath for a job (unless we put it in hadoop-env.sh and start the
TaskTracker with that path). Are there any changes in later
versions?



Zheng



On 7/24/09, Edward Capriolo <[email protected]> wrote:
> I have been following some threads on the hadoop mailing list about
> speeding up MR jobs. I have a few questions I am sure I can find the
> answer to if I dig into the source code but I thought I could get a
> quick answer.
>
> 1 ADD JAR 'myfile.jar' uses the distributed cache. Using the
> distributed cache has some overhead. I know if I create an auxlibs
> directory under hive root, they will be added to libjars on startup.
> If I add my jar to auxlibs on all my nodes, will a UDF in the jar be
> available during subsequent jobs? Or is it only necessary to add those
> jars to the auxlib on the node I start the job from?
>
> 2 Dealing with the entire hive install. How much of the hive install
> really needs to be replicated on each datanode? If we used
> distributed cache for everything the jobs would have unneeded
> overhead, but hive would be 'installed on demand' from the client.
>
> Thanks,
> Edward
>


