Denis...

Sorry, you lost me.

Just to make sure we're using the same terminology...
The cluster is composed of two types of nodes...
The data nodes, which run the DN, TT, and, if you have HBase, the RS.
Then there are the control nodes, which run your NN, SN, JT, and, if you run HBase, the HM and ZKs...

Outside of the cluster we have machines with Hadoop installed but not running any of the processes. They are where our users launch their jobs. We call them edge nodes. (It's not a good idea to let users work directly on the actual cluster.)

OK, having said all of that... You launch your job from the edge node... Your data sits in HDFS, so you don't need the distributed cache at all. Does that make sense?
Your job will run on the local machine, connect to the JT, and then run.
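To make that concrete, here's roughly what the driver looks like when launched from an edge node. This is just a sketch using the old API and the WordCount/Map/Reduce classes from your example below; the driver class name and the <...> paths are placeholders, not working values.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

// Sketch of an old-API driver run from an edge node. It reuses the WordCount,
// Map and Reduce classes from the example further down in this thread.
public class WordCountDriver {
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // The input already sits in HDFS, so there are no DistributedCache
        // calls and no manual copyFromLocalFile step here.
        FileInputFormat.setInputPaths(conf, new Path("<path to input dir>"));
        FileOutputFormat.setOutputPath(conf, new Path("<path to output dir>"));

        // Nothing is hard-coded for the NN or JT; on an edge node those come
        // from the installed config files (core-site.xml / mapred-site.xml).
        JobClient.runJob(conf);
    }
}

You'd run it from the edge node with something like "hadoop jar test.jar WordCountDriver", and the submission client takes care of shipping the job jar to the cluster for you.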

We set up the edge nodes so that all of the jars and config files are already in place for the users, and we can better control access...
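As an illustration (just a hypothetical sanity check, and the class name is made up), on a properly set up edge node a bare JobConf already carries the cluster addresses from the installed *-site.xml files, which is why user code never has to hard-code them:

import org.apache.hadoop.mapred.JobConf;

// Quick check on an edge node: a bare JobConf picks up the NN and JT
// addresses from the config files on the client's classpath.
public class EdgeNodeCheck {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        System.out.println("fs.default.name    = " + conf.get("fs.default.name"));
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
    }
}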

Sent from a remote device. Please excuse any typos...

Mike Segel

On Nov 24, 2011, at 7:22 AM, Denis Kreis <[email protected]> wrote:

> Without using the distributed cache I'm getting the same error. It's
> because I start the job from a remote client / programmatically
> 
> 2011/11/24 Michel Segel <[email protected]>:
>> Silly question... Why do you need to use the distributed cache for the word 
>> count program?
>>  What are you trying to accomplish?
>> 
>> I've only had to play with it for one project where we had to push out a 
>> bunch of C++ code to the nodes as part of a job...
>> 
>> Sent from a remote device. Please excuse any typos...
>> 
>> Mike Segel
>> 
>> On Nov 24, 2011, at 7:05 AM, Denis Kreis <[email protected]> wrote:
>> 
>>> Hi Bejoy
>>> 
>>> 1. Old API:
>>> The Map and Reduce classes are the same as in the example, the main
>>> method is as follows
>>> 
>>> public static void main(String[] args) throws IOException,
>>> InterruptedException {
>>>        UserGroupInformation ugi =
>>> UserGroupInformation.createProxyUser("<remote user name>",
>>> UserGroupInformation.getLoginUser());
>>>        ugi.doAs(new PrivilegedExceptionAction<Void>() {
>>>            public Void run() throws Exception {
>>> 
>>>                JobConf conf = new JobConf(WordCount.class);
>>>                conf.setJobName("wordcount");
>>> 
>>>                conf.setOutputKeyClass(Text.class);
>>>                conf.setOutputValueClass(IntWritable.class);
>>> 
>>>                conf.setMapperClass(Map.class);
>>>                conf.setCombinerClass(Reduce.class);
>>>                conf.setReducerClass(Reduce.class);
>>> 
>>>                conf.setInputFormat(TextInputFormat.class);
>>>                conf.setOutputFormat(TextOutputFormat.class);
>>> 
>>>                FileInputFormat.setInputPaths(conf, new Path("<path to input dir>"));
>>>                FileOutputFormat.setOutputPath(conf, new Path("<path to output dir>"));
>>> 
>>>                conf.set("mapred.job.tracker", "<ip:8021>");
>>> 
>>>                FileSystem fs = FileSystem.get(new URI("hdfs://<ip>:8020"),
>>> new Configuration());
>>>                fs.mkdirs(new Path("<remote path>"));
>>>                fs.copyFromLocalFile(new Path("<local path>/test.jar"), new
>>> Path("<remote path>"));
>>> 
>>> 
>> 
> 
