Re: Issue with DistributedCache

Alexander C.H. Lorenz Thu, 24 Nov 2011 07:01:37 -0800

Hi,

A typo?
import com.bejoy.sampels.worcount.WordCountDriver;
Shouldn't that be wordcount (with a "d")?


- alex

On Thu, Nov 24, 2011 at 3:45 PM, Bejoy Ks <[email protected]> wrote:

> Hi Denis
>       I tried your code without the distributed cache locally and it worked
> fine for me. You can find it at
> http://pastebin.com/ki175YUx
>
> I echo Mike's words on submitting MapReduce jobs remotely. The remote
> machine can be your local PC or any utility server, as Mike described. What
> you need on the remote machine is a copy of the Hadoop jars and
> configuration files identical to those on your Hadoop cluster. (If you don't
> have a remote utility server set up, you can use your dev machine for
> this.) Just trigger the Hadoop job on the local machine, and the actual job
> will be submitted to and run on your cluster, based on the NN host and
> configuration parameters in your config files.
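Bejoy's point is that the submitting machine needs only the cluster's jars and config files. The NameNode and JobTracker addresses it submits to are read from those files. Below is a minimal sketch of that lookup using only the Java standard library. It assumes the Hadoop 1.x-era property names fs.default.name (core-site.xml) and mapred.job.tracker (mapred-site.xml). The host names are made up, and ClientConfigSketch/parseSiteXml are hypothetical helpers, not Hadoop's own Configuration class.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ClientConfigSketch {

    // Hypothetical helper: reads a Hadoop-style
    // <configuration><property><name/><value/></property></configuration>
    // document into a name -> value map. Hadoop's real Configuration class also
    // handles defaults, "final" flags and variable expansion; this does not.
    static Map<String, String> parseSiteXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new HashMap<>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element p = (Element) properties.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a Hadoop-style configuration document", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical copies of the cluster's core-site.xml and mapred-site.xml,
        // as they might sit on the client or edge node.
        String coreSite = "<configuration><property>"
                + "<name>fs.default.name</name>"
                + "<value>hdfs://namenode.example.com:8020</value>"
                + "</property></configuration>";
        String mapredSite = "<configuration><property>"
                + "<name>mapred.job.tracker</name>"
                + "<value>jobtracker.example.com:8021</value>"
                + "</property></configuration>";

        // Merge both files; a later putAll overwrites earlier values for duplicate keys.
        Map<String, String> conf = new HashMap<>();
        conf.putAll(parseSiteXml(coreSite));
        conf.putAll(parseSiteXml(mapredSite));

        // These two addresses decide where a job launched from this machine goes.
        System.out.println("NameNode:   " + conf.get("fs.default.name"));
        System.out.println("JobTracker: " + conf.get("mapred.job.tracker"));
    }
}
```

If the client's copies of these files match the cluster's, hard-coding addresses in the driver (as with conf.set("mapred.job.tracker", ...) further down the thread) should no longer be needed.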
>
> Hope it helps!..
>
> Regards
> Bejoy.K.S
>
> On Thu, Nov 24, 2011 at 7:09 PM, Michel Segel <[email protected]> wrote:
>
> > Denis...
> >
> > Sorry, you lost me.
> >
> > Just to make sure we're using the same terminology...
> > The cluster consists of two types of nodes...
> > The data nodes, which run DN, TT and, if you have HBase, RS.
> > Then there are control nodes, which run your NN, SN, JT and, if you run
> > HBase, HM and the ZKs...
> >
> > Outside of the cluster we have machines that have Hadoop installed but
> > are not running any of the processes. They are where our users launch their
> > jobs. We call them edge nodes. (It's not a good idea to let users directly
> > onto the actual cluster.)
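Mike's terminology can be summarized as a small lookup table. This is a sketch only. The daemon expansions are the standard Hadoop 1.x / HBase names, and the RS, HM and ZK entries apply only when HBase is deployed. The class ClusterLayout and its nodeTypeFor method are hypothetical names used for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClusterLayout {

    // Expansions of the daemon abbreviations used in the thread.
    // RS, HM and ZK only exist when HBase is deployed.
    static final Map<String, String> DAEMON_NAMES = Map.of(
            "DN", "DataNode",
            "TT", "TaskTracker",
            "RS", "HBase RegionServer",
            "NN", "NameNode",
            "SN", "Secondary NameNode",
            "JT", "JobTracker",
            "HM", "HBase Master",
            "ZK", "ZooKeeper");

    // Which daemons each kind of machine runs, per the description above.
    // Edge nodes have Hadoop installed but run no daemons; users launch jobs there.
    static final Map<String, List<String>> NODE_TYPES = new LinkedHashMap<>();
    static {
        NODE_TYPES.put("data node", List.of("DN", "TT", "RS"));
        NODE_TYPES.put("control node", List.of("NN", "SN", "JT", "HM", "ZK"));
        NODE_TYPES.put("edge node", List.of());
    }

    // Returns the kind of node that hosts the given daemon, or null if none does.
    static String nodeTypeFor(String daemon) {
        for (Map.Entry<String, List<String>> e : NODE_TYPES.entrySet()) {
            if (e.getValue().contains(daemon)) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, List<String>> e : NODE_TYPES.entrySet()) {
            StringBuilder line = new StringBuilder(e.getKey() + ":");
            if (e.getValue().isEmpty()) {
                line.append(" (no daemons; Hadoop client only)");
            }
            for (String d : e.getValue()) {
                line.append(' ').append(DAEMON_NAMES.get(d)).append(" (").append(d).append(')');
            }
            System.out.println(line);
        }
    }
}
```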
> >
> > OK, having said all of that... You launch your job from the edge nodes...
> > Your data sits in HDFS, so you don't need the distributed cache at all. Does
> > that make sense?
> > Your job's driver runs on the local machine, connects to the JT, and the job
> > then runs on the cluster.
> >
> > We set up the edge nodes so that all of the jars and config files are
> > already in place for the users, and so we can better control access...
> >
> > Sent from a remote device. Please excuse any typos...
> >
> > Mike Segel
> >
> > On Nov 24, 2011, at 7:22 AM, Denis Kreis <[email protected]> wrote:
> >
> > > Without using the distributed cache I'm getting the same error. It's
> > > because I start the job from a remote client, programmatically.
> > >
> > > 2011/11/24 Michel Segel <[email protected]>:
> > >> Silly question... Why do you need to use the distributed cache for the
> > >> word count program? What are you trying to accomplish?
> > >>
> > >> I've only had to play with it on one project, where we had to push out
> > >> a bunch of C++ code to the nodes as part of a job...
> > >>
> > >>
> > >> Mike Segel
> > >>
> > >> On Nov 24, 2011, at 7:05 AM, Denis Kreis <[email protected]> wrote:
> > >>
> > >>> Hi Bejoy
> > >>>
> > >>> 1. Old API:
> > >>> The Map and Reduce classes are the same as in the example; the main
> > >>> method is as follows:
> > >>>
> > >>> public static void main(String[] args) throws IOException, InterruptedException {
> > >>>        // Impersonate the remote user, authenticated as the local login user
> > >>>        UserGroupInformation ugi = UserGroupInformation.createProxyUser("<remote user name>",
> > >>>                UserGroupInformation.getLoginUser());
> > >>>        ugi.doAs(new PrivilegedExceptionAction<Void>() {
> > >>>            public Void run() throws Exception {
> > >>>
> > >>>                JobConf conf = new JobConf(WordCount.class);
> > >>>                conf.setJobName("wordcount");
> > >>>
> > >>>                conf.setOutputKeyClass(Text.class);
> > >>>                conf.setOutputValueClass(IntWritable.class);
> > >>>
> > >>>                conf.setMapperClass(Map.class);
> > >>>                conf.setCombinerClass(Reduce.class);
> > >>>                conf.setReducerClass(Reduce.class);
> > >>>
> > >>>                conf.setInputFormat(TextInputFormat.class);
> > >>>                conf.setOutputFormat(TextOutputFormat.class);
> > >>>
> > >>>                FileInputFormat.setInputPaths(conf, new Path("<path to input dir>"));
> > >>>                FileOutputFormat.setOutputPath(conf, new Path("<path to output dir>"));
> > >>>
> > >>>                // Send the job to the cluster's JobTracker
> > >>>                conf.set("mapred.job.tracker", "<ip:8021>");
> > >>>
> > >>>                // Copy the job jar into HDFS on the cluster
> > >>>                FileSystem fs = FileSystem.get(new URI("hdfs://<ip>:8020"), new Configuration());
> > >>>                fs.mkdirs(new Path("<remote path>"));
> > >>>                fs.copyFromLocalFile(new Path("<local path>/test.jar"), new Path("<remote path>"));
> > >>>
> > >>>
> > >>
> > >
> >
>



-- 
Alexander Lorenz
http://mapredit.blogspot.com
