My bad, I pasted the wrong file. It is updated now; I made a few tiny modifications (commented in the code) and it was working fine for me. http://pastebin.com/RDuZX7Qd
Alex, thanks a lot for pointing that out.

Regards
Bejoy.KS

On Thu, Nov 24, 2011 at 8:31 PM, Alexander C.H. Lorenz <[email protected]> wrote:

> Hi,
>
> a typo?
> import com.bejoy.sampels.worcount.WordCountDriver;
> = wor_d_count ?
>
> - alex
>
> On Thu, Nov 24, 2011 at 3:45 PM, Bejoy Ks <[email protected]> wrote:
>
> > Hi Denis
> > I tried your code without the distributed cache locally and it worked
> > fine for me. Please find it at
> > http://pastebin.com/ki175YUx
> >
> > I echo Mike's words on submitting map reduce jobs remotely. The remote
> > machine can be your local PC or any utility server, as Mike specified.
> > What you need to have on the remote machine is a replica of the hadoop
> > jars and configuration files, the same as those of your hadoop cluster.
> > (If you don't have a remote util server set up, you can use your dev
> > machine for the same.) Just trigger the hadoop job on the local machine
> > and the actual job will be submitted and run on your cluster, based on
> > the NN host and configuration parameters you have in your config files.
> >
> > Hope it helps!
> >
> > Regards
> > Bejoy.K.S
> >
> > On Thu, Nov 24, 2011 at 7:09 PM, Michel Segel <[email protected]> wrote:
> >
> > > Denis...
> > >
> > > Sorry, you lost me.
> > >
> > > Just to make sure we're using the same terminology...
> > > The cluster is comprised of two types of nodes:
> > > the data nodes, which run the DN, TT and, if you have HBase, the RS;
> > > then there are the control nodes, which run your NN, SN, JT and, if
> > > you run HBase, the HM and ZKs...
> > >
> > > Outside of the cluster we have machines set up with Hadoop installed
> > > but not running any of the processes. They are where our users launch
> > > their jobs. We call them edge nodes. (It's not a good idea to let users
> > > directly on the actual cluster.)
> > >
> > > OK, having said all of that... You launch your job from the edge nodes...
> > > Your data sits in HDFS, so you don't need the distributed cache at all.
> > > Does that make sense?
> > > Your job will run on the local machine, connect to the JT and then run.
> > >
> > > We set up the edge nodes so that all of the jars and config files are
> > > already set up for the users and we can better control access...
> > >
> > > Sent from a remote device. Please excuse any typos...
> > >
> > > Mike Segel
> > >
> > > On Nov 24, 2011, at 7:22 AM, Denis Kreis <[email protected]> wrote:
> > >
> > > > Without using the distributed cache I'm getting the same error. It's
> > > > because I start the job from a remote client / programmatically.
> > > >
> > > > 2011/11/24 Michel Segel <[email protected]>:
> > > >> Silly question... Why do you need to use the distributed cache for
> > > >> the word count program?
> > > >> What are you trying to accomplish?
> > > >>
> > > >> I've only had to play with it for one project where we had to push
> > > >> out a bunch of C++ code to the nodes as part of a job...
> > > >>
> > > >> Sent from a remote device. Please excuse any typos...
> > > >>
> > > >> Mike Segel
> > > >>
> > > >> On Nov 24, 2011, at 7:05 AM, Denis Kreis <[email protected]> wrote:
> > > >>
> > > >>> Hi Bejoy
> > > >>>
> > > >>> 1. Old API:
> > > >>> The Map and Reduce classes are the same as in the example; the main
> > > >>> method is as follows:
> > > >>>
> > > >>> public static void main(String[] args) throws IOException,
> > > >>>         InterruptedException {
> > > >>>     UserGroupInformation ugi =
> > > >>>         UserGroupInformation.createProxyUser("<remote user name>",
> > > >>>             UserGroupInformation.getLoginUser());
> > > >>>     ugi.doAs(new PrivilegedExceptionAction<Void>() {
> > > >>>         public Void run() throws Exception {
> > > >>>
> > > >>>             JobConf conf = new JobConf(WordCount.class);
> > > >>>             conf.setJobName("wordcount");
> > > >>>
> > > >>>             conf.setOutputKeyClass(Text.class);
> > > >>>             conf.setOutputValueClass(IntWritable.class);
> > > >>>
> > > >>>             conf.setMapperClass(Map.class);
> > > >>>             conf.setCombinerClass(Reduce.class);
> > > >>>             conf.setReducerClass(Reduce.class);
> > > >>>
> > > >>>             conf.setInputFormat(TextInputFormat.class);
> > > >>>             conf.setOutputFormat(TextOutputFormat.class);
> > > >>>
> > > >>>             FileInputFormat.setInputPaths(conf, new Path("<path to input dir>"));
> > > >>>             FileOutputFormat.setOutputPath(conf, new Path("<path to output dir>"));
> > > >>>
> > > >>>             conf.set("mapred.job.tracker", "<ip:8021>");
> > > >>>
> > > >>>             FileSystem fs = FileSystem.get(new URI("hdfs://<ip>:8020"), new Configuration());
> > > >>>             fs.mkdirs(new Path("<remote path>"));
> > > >>>             fs.copyFromLocalFile(new Path("<local path>/test.jar"), new Path("<remote path>"));
> > > >>>
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
>
> *P* *Think of the environment: please don't print this email unless you
> really need to.*
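
Denis's quoted main method above is cut off by the reply quoting before the job is actually submitted. A minimal, self-contained sketch along the same lines is below for reference only; it assumes the standard WordCount Map and Reduce classes from the Hadoop examples, uses placeholder names (hadoopuser, namenode-host, jobtracker-host, the /user/hadoopuser/wordcount paths, wordcount.jar), and finishes with JobClient.runJob, which may well differ from what Denis actually ran.

import java.io.IOException;
import java.security.PrivilegedExceptionAction;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.security.UserGroupInformation;

public class WordCount {

    // Standard WordCount mapper from the Hadoop examples (old API).
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Standard WordCount reducer/combiner from the Hadoop examples (old API).
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Submit as the cluster user; "hadoopuser" is a placeholder.
        UserGroupInformation ugi = UserGroupInformation.createProxyUser(
                "hadoopuser", UserGroupInformation.getLoginUser());
        ugi.doAs(new PrivilegedExceptionAction<Void>() {
            public Void run() throws Exception {
                JobConf conf = new JobConf(WordCount.class);
                conf.setJobName("wordcount");

                conf.setOutputKeyClass(Text.class);
                conf.setOutputValueClass(IntWritable.class);
                conf.setMapperClass(Map.class);
                conf.setCombinerClass(Reduce.class);
                conf.setReducerClass(Reduce.class);
                conf.setInputFormat(TextInputFormat.class);
                conf.setOutputFormat(TextOutputFormat.class);

                // Placeholder HDFS paths and cluster addresses.
                FileInputFormat.setInputPaths(conf, new Path("/user/hadoopuser/wordcount/input"));
                FileOutputFormat.setOutputPath(conf, new Path("/user/hadoopuser/wordcount/output"));
                conf.set("fs.default.name", "hdfs://namenode-host:8020");
                conf.set("mapred.job.tracker", "jobtracker-host:8021");

                // Ship the job jar so the TaskTrackers can load Map/Reduce;
                // the local jar path is a placeholder.
                conf.setJar("/local/path/wordcount.jar");

                // Submit and block until the job finishes.
                JobClient.runJob(conf);
                return null;
            }
        });
    }
}

Note that proxying as another user generally also requires the cluster side to allow it (the hadoop.proxyuser.* properties); without that, createProxyUser-based submission is typically rejected.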

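Bejoy's and Mike's point that a remote client or edge node only needs a replica of the cluster's jars and configuration files can be checked with a small sketch like the one below; the class name and config directory are purely illustrative, and the explicit addResource calls are only needed if the cluster's core-site.xml and mapred-site.xml are not already on the client's classpath.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class EdgeNodeCheck {
    public static void main(String[] args) {
        // On an edge node, a JobConf normally picks up core-site.xml and
        // mapred-site.xml from the classpath; the addResource calls below
        // cover the case where the cluster's config replica lives elsewhere
        // (placeholder directory).
        JobConf conf = new JobConf(EdgeNodeCheck.class);
        conf.addResource(new Path("/etc/hadoop/conf.cluster/core-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf.cluster/mapred-site.xml"));

        // If these resolve to the cluster's NameNode and JobTracker, a plain
        // driver submitted from this machine runs on the cluster; no
        // DistributedCache is needed just to run WordCount.
        System.out.println("fs.default.name    = " + conf.get("fs.default.name"));
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
    }
}

If the two values still show the local defaults (file:/// and local), the job would run in local mode rather than on the cluster, which is usually the first thing to rule out when remote submission misbehaves.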