On Fri, Jul 24, 2009 at 1:45 PM, Edward Capriolo<[email protected]> wrote:
> On Fri, Jul 24, 2009 at 1:36 PM, Zheng Shao<[email protected]> wrote:
>> Hive only needs to be installed at the node that runs the hive query.
>> All the jars will be sent to the hadoop JobClient via -libjars. The
>> code is in ExecDriver.java.
>>
>> In hadoop 0.17, I don't think there is a way to add a path to
>> classpath for a job (unless we put it in hadoop-env.sh and start
>> TaskTracker with that path). are there any changes in the latter
>> versions?
>>
>> Zheng
>>
>> On 7/24/09, Edward Capriolo <[email protected]> wrote:
>>> I have been following some threads on the hadoop mailing list about
>>> speeding up MR jobs. I have a few questions I am sure I can find the
>>> answer to if I dig into the source code but I thought I could get a
>>> quick answer.
>>>
>>> 1 ADD JAR 'myfile.jar' uses the distributed cache. Using the
>>> distributed cache has some overhead. I know if I create an auxlibs
>>> directory under hive root, they will be added to libjars on startup.
>>> If i add my jar to auxlibs on all my nodes will a UDF in the jar be
>>> available during subsequent jobs? Or is it only necessary to add those
>>> jars to the auxlib on the node I start the job from.
>>>
>>> 2 Dealing with the entire hive install. How much of the hive install
>>> really needs to be replication on each datanode? If we used
>>> distributed cache for everything the jobs would have unneeded
>>> overhead, but hive would be 'installed on demand' from the client.
>>>
>>> Thanks,
>>> Edward
>>>
>>
>> --
>> Sent from Gmail for mobile | mobile.google.com
>>
>> Yours,
>> Zheng
>>
>
> Zheng,
>
> A thread from the hadoop list peaked my interest. search.
> "hadoop jobs take long time to setup"
>
> http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200906.mbox/%[email protected]%3e
>
> Can hive benefit?
> Edward
>
Could we use something like this to improve performance? Assuming the
jars are already present on all TaskTrackers, could we have an
alternate invocation script such as bin/hive-local?

Edward
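
P.S. To make the question concrete, here is a rough, untested sketch of the decision I have in mind, written in Python only for illustration. The function name build_libjars_args and the preinstalled flag are made up for this example; this is not how ExecDriver.java is written, just the idea of skipping -libjars when the jars are assumed to be on every TaskTracker's classpath already.

```python
def build_libjars_args(jars, preinstalled):
    """Return the extra Hadoop command-line arguments for a job.

    jars: list of jar paths the job needs (hypothetical input).
    preinstalled: True if we ASSUME every TaskTracker already has these
        jars on its classpath (the hypothetical bin/hive-local case).
    """
    if preinstalled or not jars:
        # Nothing is shipped with the job; it relies on the jars
        # already being available on each node.
        return []
    # -libjars takes a single comma-separated list of jar paths.
    return ["-libjars", ",".join(jars)]


if __name__ == "__main__":
    jars = ["lib/hive_exec.jar", "auxlib/my_udf.jar"]  # illustrative paths
    print(build_libjars_args(jars, preinstalled=False))
    print(build_libjars_args(jars, preinstalled=True))
```

Whether the TaskTrackers can actually pick up the jars from a local path is exactly the open question from Zheng's note about hadoop 0.17, so the preinstalled branch only helps if that part can be solved.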
