I have only ever used the distributed cache to add files, including binary
files such as shared libraries.
It looks like you are adding a directory.

The DistributedCache is not generally used for passing file contents, but for
passing file names. The files themselves must already be stored in a shared
file system (HDFS, for simplicity).

The distributed cache makes those names available to the tasks, and the
files are copied out of HDFS into the task-local work area on each
TaskTracker node.
It looks like you may be trying to store the contents of your files in the
distributed cache.
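
For illustration, a minimal sketch of the usual pattern (the HDFS paths and
the model file name below are hypothetical, and this uses the old
org.apache.hadoop.filecache API from this era):

import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheSetup {
        public static void main(String[] args) throws Exception {
                JobConf conf = new JobConf(CacheSetup.class);
                FileSystem fs = FileSystem.get(conf);

                // The file must already live in the shared file system.
                fs.copyFromLocalFile(
                        new Path("/home/akhil1988/Ner/OriginalNer/Data/model.level1"),
                        new Path("/user/akhil1988/Ner/Data/model.level1"));

                // Register the individual file, not its parent directory; the
                // "#model.level1" fragment is the symlink name each task will
                // see in its working directory once createSymlink is enabled.
                DistributedCache.addCacheFile(
                        new URI("/user/akhil1988/Ner/Data/model.level1#model.level1"), conf);
                DistributedCache.createSymlink(conf);
        }
}

Each task can then open its local copy either through the symlink name or
via DistributedCache.getLocalCacheFiles(conf).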

On Wed, Jun 17, 2009 at 6:56 AM, akhil1988 <akhilan...@gmail.com> wrote:

>
> Thanks Jason.
>
> I stepped into the code behind that statement and found that it eventually
> makes a binaryRead call to read a binary file, and that is where it gets
> stuck.
>
> Do you know whether there is any problem with giving a binary file for
> addition to the distributed cache? In the statement
> DistributedCache.addCacheFile(new
> URI("/home/akhil1988/Ner/OriginalNer/Data/"), conf); Data is a directory
> that contains some text files as well as some binary files. In the statement
> Parameters.readConfigAndLoadExternalData("Config/allLayer1.config"); I can
> see (from the output messages) that it is able to read the text files, but
> it gets stuck at the binary files.
>
> So I think the problem here is that it is not able to read the binary
> files: either they have not been transferred to the cache, or a binary file
> cannot be read from it.
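>
> (A quick sanity check, sketched against the old mapred API; this override
> would go in the Map class, and prints the task-local copies of everything
> registered with addCacheFile:)
>
> public void configure(JobConf job) {
>         try {
>                 Path[] localFiles = DistributedCache.getLocalCacheFiles(job);
>                 if (localFiles != null) {
>                         for (Path p : localFiles) {
>                                 System.out.println("localized: " + p);
>                         }
>                 } else {
>                         System.out.println("no cache files were localized");
>                 }
>         } catch (IOException e) {
>                 throw new RuntimeException(e);
>         }
> }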
>
> Do you know the solution to this?
>
> Thanks,
> Akhil
>
>
> jason hadoop wrote:
> >
> > Something is happening inside your
> > Parameters.readConfigAndLoadExternalData("Config/allLayer1.config") call,
> > and the framework is killing the job for not heartbeating for 600
> > seconds.
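> >
> > If the load is legitimately that slow, one workaround (a sketch; the
> > property is the old pre-0.21 name and the value is in milliseconds) is to
> > raise the liveness timeout above the 600-second default:
> >
> > // Allow tasks to go 30 minutes without reporting progress before the
> > // framework declares them dead (the default is 600000 ms).
> > conf.set("mapred.task.timeout", "1800000");
> >
> > Calling reporter.progress() from map() also resets this timer, but that
> > would not help here, since the stall happens before map() ever runs.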
> >
> > On Tue, Jun 16, 2009 at 8:32 PM, akhil1988 <akhilan...@gmail.com> wrote:
> >
> >>
> >> One more thing: it finally terminates there (after some time) with this
> >> final exception:
> >>
> >> java.io.IOException: Job failed!
> >>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
> >>         at LbjTagger.NerTagger.main(NerTagger.java:109)
> >>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>         at java.lang.reflect.Method.invoke(Method.java:597)
> >>         at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
> >>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
> >>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> >>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
> >>
> >>
> >> akhil1988 wrote:
> >> >
> >> > Thank you Jason for your reply.
> >> >
> >> > My Map class is an inner class and it is a static class. Here is the
> >> > structure of my code.
> >> >
> >> > public class NerTagger {
> >> >
> >> >         public static class Map extends MapReduceBase implements
> >> >                         Mapper<LongWritable, Text, Text, Text> {
> >> >                 private Text word = new Text();
> >> >                 private static NETaggerLevel1 tagger1 = new NETaggerLevel1();
> >> >                 private static NETaggerLevel2 tagger2 = new NETaggerLevel2();
> >> >
> >> >                 Map() {
> >> >                         System.out.println("HI2\n");
> >> >                         Parameters.readConfigAndLoadExternalData("Config/allLayer1.config");
> >> >                         System.out.println("HI3\n");
> >> >                         Parameters.forceNewSentenceOnLineBreaks = Boolean.parseBoolean("true");
> >> >
> >> >                         System.out.println("loading the tagger");
> >> >                         tagger1 = (NETaggerLevel1) Classifier.binaryRead(Parameters.pathToModelFile + ".level1");
> >> >                         System.out.println("HI5\n");
> >> >                         tagger2 = (NETaggerLevel2) Classifier.binaryRead(Parameters.pathToModelFile + ".level2");
> >> >                         System.out.println("Done- loading the tagger");
> >> >                 }
> >> >
> >> >                 public void map(LongWritable key, Text value,
> >> >                                 OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
> >> >                         String inputline = value.toString();
> >> >
> >> >                         /* Processing of the input pair is done here */
> >> >                 }
> >> >         }
> >> >
> >> >         public static void main(String[] args) throws Exception {
> >> >                 JobConf conf = new JobConf(NerTagger.class);
> >> >                 conf.setJobName("NerTagger");
> >> >
> >> >                 conf.setOutputKeyClass(Text.class);
> >> >                 conf.setOutputValueClass(IntWritable.class);
> >> >
> >> >                 conf.setMapperClass(Map.class);
> >> >                 conf.setNumReduceTasks(0);
> >> >
> >> >                 conf.setInputFormat(TextInputFormat.class);
> >> >                 conf.setOutputFormat(TextOutputFormat.class);
> >> >
> >> >                 conf.set("mapred.job.tracker", "local");
> >> >                 conf.set("fs.default.name", "file:///");
> >> >
> >> >                 DistributedCache.addCacheFile(new URI("/home/akhil1988/Ner/OriginalNer/Data/"), conf);
> >> >                 DistributedCache.addCacheFile(new URI("/home/akhil1988/Ner/OriginalNer/Config/"), conf);
> >> >                 DistributedCache.createSymlink(conf);
> >> >
> >> >                 conf.set("mapred.child.java.opts", "-Xmx4096m");
> >> >
> >> >                 FileInputFormat.setInputPaths(conf, new Path(args[0]));
> >> >                 FileOutputFormat.setOutputPath(conf, new Path(args[1]));
> >> >
> >> >                 System.out.println("HI1\n");
> >> >
> >> >                 JobClient.runJob(conf);
> >> >         }
> >> > }
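> >> >
> >> > (A variant sketch, against the same old API: MapReduceBase provides a
> >> > configure(JobConf) hook that runs once per task, after the
> >> > DistributedCache files have been localized, so the heavy loading could
> >> > be moved there instead of the constructor:)
> >> >
> >> > public static class Map extends MapReduceBase implements
> >> >                 Mapper<LongWritable, Text, Text, Text> {
> >> >         private static NETaggerLevel1 tagger1;
> >> >         private static NETaggerLevel2 tagger2;
> >> >
> >> >         @Override
> >> >         public void configure(JobConf job) {
> >> >                 // Same loading as the constructor above, but run where
> >> >                 // the framework expects per-task setup.
> >> >                 Parameters.readConfigAndLoadExternalData("Config/allLayer1.config");
> >> >                 Parameters.forceNewSentenceOnLineBreaks = Boolean.parseBoolean("true");
> >> >                 tagger1 = (NETaggerLevel1) Classifier.binaryRead(Parameters.pathToModelFile + ".level1");
> >> >                 tagger2 = (NETaggerLevel2) Classifier.binaryRead(Parameters.pathToModelFile + ".level2");
> >> >         }
> >> >
> >> >         public void map(LongWritable key, Text value,
> >> >                         OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
> >> >                 // per-record processing as before
> >> >         }
> >> > }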
> >> >
> >> > Jason, when the program executes, HI1 and HI2 are printed but it never
> >> > reaches HI3. In the statement
> >> > Parameters.readConfigAndLoadExternalData("Config/allLayer1.config"); it
> >> > is able to access the Config/allLayer1.config file (while executing this
> >> > statement it prints messages such as which data it is loading, etc.),
> >> > but it gets stuck there (while loading some classifier) and never
> >> > reaches HI3.
> >> >
> >> > This program runs fine when executed normally (without MapReduce).
> >> >
> >> > Thanks, Akhil
> >> >
> >> >
> >> >
> >> >
> >> > jason hadoop wrote:
> >> >>
> >> >> Is it possible that your map class is an inner class and not static?
> >> >>
> >> >> On Tue, Jun 16, 2009 at 10:51 AM, akhil1988 <akhilan...@gmail.com>
> >> wrote:
> >> >>
> >> >>>
> >> >>> Hi All,
> >> >>>
> >> >>> I am running my mapred program in local mode by setting
> >> >>> mapred.job.tracker to local so that I can debug my code.
> >> >>> The mapred program is a direct port of my original sequential code;
> >> >>> there is no reduce phase.
> >> >>> Basically, I have just put my program in the map class.
> >> >>>
> >> >>> My program takes around 1-2 min. to instantiate the data objects in
> >> >>> the constructor of the Map class (it loads some data model files,
> >> >>> which takes some time). After the instantiation part in the
> >> >>> constructor of the Map class, the map function is supposed to process
> >> >>> the input split.
> >> >>>
> >> >>> The problem is that the data objects do not get instantiated
> >> >>> completely; partway through (while it is still in the constructor) the
> >> >>> program stops, giving the exceptions pasted at the bottom.
> >> >>> The program runs fine without mapreduce and does not require more
> >> >>> than 2 GB of memory, but in mapreduce, even after doing export
> >> >>> HADOOP_HEAPSIZE=2500 (I am working on machines with 16 GB of RAM),
> >> >>> the program fails. I have also set HADOOP_OPTS="-server
> >> >>> -XX:-UseGCOverheadLimit", as I was sometimes also getting GC Overhead
> >> >>> Limit Exceeded exceptions.
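> >> >>>
> >> >>> (An assumption on my part: with mapred.job.tracker set to local, the
> >> >>> map task runs inside the submitting JVM itself, so a per-task heap
> >> >>> setting would only take effect on a real cluster, e.g.:)
> >> >>>
> >> >>> // Hypothetical: sizes each child task JVM on a real cluster; the
> >> >>> // local runner runs the task in the client JVM and ignores this.
> >> >>> conf.set("mapred.child.java.opts", "-Xmx2500m");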
> >> >>>
> >> >>> Somebody, please help me with this problem: I have been trying to
> >> >>> debug it for the last 3 days, but without success. Thanks!
> >> >>>
> >> >>> java.lang.OutOfMemoryError: Java heap space
> >> >>>        at sun.misc.FloatingDecimal.toJavaFormatString(FloatingDecimal.java:889)
> >> >>>        at java.lang.Double.toString(Double.java:179)
> >> >>>        at java.text.DigitList.set(DigitList.java:272)
> >> >>>        at java.text.DecimalFormat.format(DecimalFormat.java:584)
> >> >>>        at java.text.DecimalFormat.format(DecimalFormat.java:507)
> >> >>>        at java.text.NumberFormat.format(NumberFormat.java:269)
> >> >>>        at org.apache.hadoop.util.StringUtils.formatPercent(StringUtils.java:110)
> >> >>>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1147)
> >> >>>        at LbjTagger.NerTagger.main(NerTagger.java:109)
> >> >>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> >>>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >> >>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> >>>        at java.lang.reflect.Method.invoke(Method.java:597)
> >> >>>        at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
> >> >>>        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
> >> >>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >> >>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> >> >>>        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
> >> >>>
> >> >>> 09/06/16 12:34:41 WARN mapred.LocalJobRunner: job_local_0001
> >> >>> java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
> >> >>>        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:81)
> >> >>>        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
> >> >>>        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
> >> >>>        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
> >> >>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
> >> >>>        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
> >> >>> Caused by: java.lang.reflect.InvocationTargetException
> >> >>>        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> >> >>>        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> >> >>>        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> >> >>>        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> >> >>>        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:79)
> >> >>>        ... 5 more
> >> >>> Caused by: java.lang.ThreadDeath
> >> >>>        at java.lang.Thread.stop(Thread.java:715)
> >> >>>        at org.apache.hadoop.mapred.LocalJobRunner.killJob(LocalJobRunner.java:310)
> >> >>>        at org.apache.hadoop.mapred.JobClient$NetworkedJob.killJob(JobClient.java:315)
> >> >>>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1224)
> >> >>>        at LbjTagger.NerTagger.main(NerTagger.java:109)
> >> >>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> >>>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >> >>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> >>>        at java.lang.reflect.Method.invoke(Method.java:597)
> >> >>>        at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
> >> >>>        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
> >> >>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >> >>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> >> >>>        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
> >> >>>


-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
