On Fri, Jul 24, 2009 at 1:36 PM, Zheng Shao <[email protected]> wrote:
> Hive only needs to be installed at the node that runs the hive query.
> All the jars will be sent to the hadoop JobClient via -libjars. The
> code is in ExecDriver.java.
>
> In hadoop 0.17, I don't think there is a way to add a path to
> classpath for a job (unless we put it in hadoop-env.sh and start
> TaskTracker with that path). Are there any changes in the later
> versions?
>
> Zheng
>
> On 7/24/09, Edward Capriolo <[email protected]> wrote:
>> I have been following some threads on the hadoop mailing list about
>> speeding up MR jobs. I have a few questions. I am sure I can find the
>> answers if I dig into the source code, but I thought I could get a
>> quick answer.
>>
>> 1. ADD JAR 'myfile.jar' uses the distributed cache. Using the
>> distributed cache has some overhead. I know if I create an auxlibs
>> directory under hive root, they will be added to libjars on startup.
>> If I add my jar to auxlibs on all my nodes, will a UDF in the jar be
>> available during subsequent jobs? Or is it only necessary to add those
>> jars to the auxlib on the node I start the job from?
>>
>> 2. Dealing with the entire hive install. How much of the hive install
>> really needs to be replicated on each datanode? If we used
>> distributed cache for everything the jobs would have unneeded
>> overhead, but hive would be 'installed on demand' from the client.
>>
>> Thanks,
>> Edward
>>
>
> Yours,
> Zheng
Zheng,

A thread from the hadoop list piqued my interest. Search for
"hadoop jobs take long time to setup":
http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200906.mbox/%[email protected]%3e

Can hive benefit?

Edward
