My bad, I pasted the wrong file. It is updated now; I made a few tiny modifications (commented in the code) and it was working fine for me. http://pastebin.com/RDuZX7Qd
Alex, thanks a lot for pointing that out.

Regards
Bejoy.KS

On Thu, Nov 24, 2011 at 8:31 PM, Alexander C.H. Lorenz <[email protected]> wrote:

> Hi,
>
> a typo?
> import com.bejoy.sampels.worcount.WordCountDriver;
> = wor_d_count ?
>
> - alex
>
> On Thu, Nov 24, 2011 at 3:45 PM, Bejoy Ks <[email protected]> wrote:
>
> > Hi Denis
> > I tried your code without the distributed cache locally and it worked
> > fine for me. Please find it at
> > http://pastebin.com/ki175YUx
> >
> > I echo Mike's words on submitting map reduce jobs remotely. The remote
> > machine can be your local PC or any utility server, as Mike specified.
> > What you need to have on the remote machine is a replica of the hadoop
> > jars and configuration files, the same as those of your hadoop cluster.
> > (If you don't have a remote util server set up, you can use your dev
> > machine for the same.) Just trigger the hadoop job on the local machine
> > and the actual job will be submitted and run on your cluster, based on
> > the NN host and configuration parameters you have in your config files.
> >
> > Hope it helps!
> >
> > Regards
> > Bejoy.K.S
> >
> > On Thu, Nov 24, 2011 at 7:09 PM, Michel Segel <[email protected]> wrote:
> >
> > > Denis...
> > >
> > > Sorry, you lost me.
> > >
> > > Just to make sure we're using the same terminology...
> > > The cluster is comprised of two types of nodes:
> > > the data nodes, which run the DN, TT and, if you have HBase, the RS;
> > > then there are the control nodes, which run your NN, SN, JT and, if
> > > you run HBase, the HM and ZKs...
> > >
> > > Outside of the cluster we have machines set up with Hadoop installed
> > > but not running any of the processes. They are where our users launch
> > > their jobs. We call them edge nodes. (It's not a good idea to let users
> > > directly on the actual cluster.)
> > >
> > > OK, having said all of that... You launch your job from the edge nodes...
> > > Your data sits in HDFS, so you don't need the distributed cache at all.
> > > Does that make sense?
> > > Your job will run on the local machine, connect to the JT and then run.
> > >
> > > We set up the edge nodes so that all of the jars and config files are
> > > already set up for the users and we can better control access...
> > >
> > > Sent from a remote device. Please excuse any typos...
> > >
> > > Mike Segel
> > >
> > > On Nov 24, 2011, at 7:22 AM, Denis Kreis <[email protected]> wrote:
> > >
> > > > Without using the distributed cache I'm getting the same error. It's
> > > > because I start the job from a remote client / programmatically.
> > > >
> > > > 2011/11/24 Michel Segel <[email protected]>:
> > > >> Silly question... Why do you need to use the distributed cache for
> > > >> the word count program?
> > > >> What are you trying to accomplish?
> > > >>
> > > >> I've only had to play with it for one project where we had to push
> > > >> out a bunch of C++ code to the nodes as part of a job...
> > > >>
> > > >> Sent from a remote device. Please excuse any typos...
> > > >>
> > > >> Mike Segel
> > > >>
> > > >> On Nov 24, 2011, at 7:05 AM, Denis Kreis <[email protected]> wrote:
> > > >>
> > > >>> Hi Bejoy
> > > >>>
> > > >>> 1. Old API:
> > > >>> The Map and Reduce classes are the same as in the example; the main
> > > >>> method is as follows:
> > > >>>
> > > >>> public static void main(String[] args) throws IOException,
> > > >>>         InterruptedException {
> > > >>>     UserGroupInformation ugi =
> > > >>>         UserGroupInformation.createProxyUser("<remote user name>",
> > > >>>             UserGroupInformation.getLoginUser());
> > > >>>     ugi.doAs(new PrivilegedExceptionAction<Void>() {
> > > >>>         public Void run() throws Exception {
> > > >>>
> > > >>>             JobConf conf = new JobConf(WordCount.class);
> > > >>>             conf.setJobName("wordcount");
> > > >>>
> > > >>>             conf.setOutputKeyClass(Text.class);
> > > >>>             conf.setOutputValueClass(IntWritable.class);
> > > >>>
> > > >>>             conf.setMapperClass(Map.class);
> > > >>>             conf.setCombinerClass(Reduce.class);
> > > >>>             conf.setReducerClass(Reduce.class);
> > > >>>
> > > >>>             conf.setInputFormat(TextInputFormat.class);
> > > >>>             conf.setOutputFormat(TextOutputFormat.class);
> > > >>>
> > > >>>             FileInputFormat.setInputPaths(conf, new Path("<path to input dir>"));
> > > >>>             FileOutputFormat.setOutputPath(conf, new Path("<path to output dir>"));
> > > >>>
> > > >>>             conf.set("mapred.job.tracker", "<ip:8021>");
> > > >>>
> > > >>>             FileSystem fs = FileSystem.get(new URI("hdfs://<ip>:8020"), new Configuration());
> > > >>>             fs.mkdirs(new Path("<remote path>"));
> > > >>>             fs.copyFromLocalFile(new Path("<local path>/test.jar"), new Path("<remote path>"));
> > > >>>
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
>
> *P* *Think of the environment: please don't print this email unless you
> really need to.*
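
Denis's quoted main method above is cut off by the reply quoting before the job is actually submitted. A minimal, self-contained sketch along the same lines is below for reference only; it assumes the standard WordCount Map and Reduce classes from the Hadoop examples, uses placeholder names (hadoopuser, namenode-host, jobtracker-host, the /user/hadoopuser/wordcount paths, wordcount.jar), and finishes with JobClient.runJob, which may well differ from what Denis actually ran.

import java.io.IOException;
import java.security.PrivilegedExceptionAction;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.security.UserGroupInformation;

public class WordCount {

    // Standard WordCount mapper from the Hadoop examples (old API).
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Standard WordCount reducer/combiner from the Hadoop examples (old API).
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Submit as the cluster user; "hadoopuser" is a placeholder.
        UserGroupInformation ugi = UserGroupInformation.createProxyUser(
                "hadoopuser", UserGroupInformation.getLoginUser());
        ugi.doAs(new PrivilegedExceptionAction<Void>() {
            public Void run() throws Exception {
                JobConf conf = new JobConf(WordCount.class);
                conf.setJobName("wordcount");

                conf.setOutputKeyClass(Text.class);
                conf.setOutputValueClass(IntWritable.class);
                conf.setMapperClass(Map.class);
                conf.setCombinerClass(Reduce.class);
                conf.setReducerClass(Reduce.class);
                conf.setInputFormat(TextInputFormat.class);
                conf.setOutputFormat(TextOutputFormat.class);

                // Placeholder HDFS paths and cluster addresses.
                FileInputFormat.setInputPaths(conf, new Path("/user/hadoopuser/wordcount/input"));
                FileOutputFormat.setOutputPath(conf, new Path("/user/hadoopuser/wordcount/output"));
                conf.set("fs.default.name", "hdfs://namenode-host:8020");
                conf.set("mapred.job.tracker", "jobtracker-host:8021");

                // Ship the job jar so the TaskTrackers can load Map/Reduce;
                // the local jar path is a placeholder.
                conf.setJar("/local/path/wordcount.jar");

                // Submit and block until the job finishes.
                JobClient.runJob(conf);
                return null;
            }
        });
    }
}

Note that proxying as another user generally also requires the cluster side to allow it (the hadoop.proxyuser.* properties); without that, createProxyUser-based submission is typically rejected.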

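Bejoy's and Mike's point that a remote client or edge node only needs a replica of the cluster's jars and configuration files can be checked with a small sketch like the one below; the class name and config directory are purely illustrative, and the explicit addResource calls are only needed if the cluster's core-site.xml and mapred-site.xml are not already on the client's classpath.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class EdgeNodeCheck {
    public static void main(String[] args) {
        // On an edge node, a JobConf normally picks up core-site.xml and
        // mapred-site.xml from the classpath; the addResource calls below
        // cover the case where the cluster's config replica lives elsewhere
        // (placeholder directory).
        JobConf conf = new JobConf(EdgeNodeCheck.class);
        conf.addResource(new Path("/etc/hadoop/conf.cluster/core-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf.cluster/mapred-site.xml"));

        // If these resolve to the cluster's NameNode and JobTracker, a plain
        // driver submitted from this machine runs on the cluster; no
        // DistributedCache is needed just to run WordCount.
        System.out.println("fs.default.name    = " + conf.get("fs.default.name"));
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
    }
}

If the two values still show the local defaults (file:/// and local), the job would run in local mode rather than on the cluster, which is usually the first thing to rule out when remote submission misbehaves.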