Hive only needs to be installed on the node that runs the Hive query. All the jars are sent to the Hadoop JobClient via -libjars; the code is in ExecDriver.java.
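The point about -libjars can be made concrete with a small sketch. Hadoop's generic `-libjars` option takes a comma-separated list of jars and ships them with the job, so only the submitting node needs them locally. The helper names below (`buildLibJars`, `withLibJars`) and the `auxlib` directory default are hypothetical and for illustration only. This is not Hive's ExecDriver code, which may assemble its argument list differently.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LibJarsExample {

    // Hypothetical helper: collect every .jar file in a local directory
    // (e.g. an auxlib directory on the submitting node) into the
    // comma-separated form that Hadoop's -libjars option expects.
    static String buildLibJars(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return ""; // directory missing or unreadable
        }
        Arrays.sort(files); // deterministic order
        List<String> jars = new ArrayList<>();
        for (File f : files) {
            if (f.isFile() && f.getName().endsWith(".jar")) {
                jars.add(f.getAbsolutePath());
            }
        }
        return String.join(",", jars);
    }

    // Hypothetical helper: prepend "-libjars <list>" to the job arguments.
    // With an empty list the arguments are returned unchanged.
    static String[] withLibJars(String libJars, String... jobArgs) {
        if (libJars.isEmpty()) {
            return jobArgs;
        }
        List<String> args = new ArrayList<>();
        args.add("-libjars");
        args.add(libJars);
        args.addAll(Arrays.asList(jobArgs));
        return args.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Directory name is an assumption; pass a real path as the first argument.
        File auxlib = new File(args.length > 0 ? args[0] : "auxlib");
        String[] jobArgs = withLibJars(buildLibJars(auxlib), "input", "output");
        System.out.println(String.join(" ", jobArgs));
    }
}
```

Because the jar list is built only on the client, the other nodes never need a local copy. The framework distributes the listed jars for each job instead.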
In Hadoop 0.17, I don't think there is a way to add a path to the classpath of a job (unless we put it in hadoop-env.sh and start the TaskTracker with that path). Are there any changes in later versions?

Zheng

On 7/24/09, Edward Capriolo wrote:
> I have been following some threads on the Hadoop mailing list about
> speeding up MR jobs. I have a few questions I am sure I can find the
> answer to if I dig into the source code, but I thought I could get a
> quick answer.
>
> 1. ADD JAR 'myfile.jar' uses the distributed cache. Using the
> distributed cache has some overhead. I know if I create an auxlibs
> directory under the Hive root, the jars in it will be added to libjars
> on startup. If I add my jar to auxlibs on all my nodes, will a UDF in
> the jar be available during subsequent jobs? Or is it only necessary to
> add those jars to the auxlib on the node I start the job from?
>
> 2. Dealing with the entire Hive install: how much of the Hive install
> really needs to be replicated on each datanode? If we used the
> distributed cache for everything, the jobs would have unneeded
> overhead, but Hive would be 'installed on demand' from the client.
>
> Thanks,
> Edward
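For reference, the hadoop-env.sh workaround mentioned above amounts to extending HADOOP_CLASSPATH on every TaskTracker node and restarting the TaskTracker. A minimal sketch follows. The jar path is hypothetical. The `${VAR:+...}` form avoids leaving a trailing colon, which Java would otherwise read as "current directory" on the classpath.

```shell
# In conf/hadoop-env.sh on each TaskTracker node (jar path is hypothetical).
# Prepend the jar, keeping any existing HADOOP_CLASSPATH entries.
export HADOOP_CLASSPATH="/opt/hive/auxlib/my-udfs.jar${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"

# The TaskTracker only picks this up after a restart, e.g.:
#   bin/hadoop-daemon.sh stop tasktracker
#   bin/hadoop-daemon.sh start tasktracker
```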
