Michael,

The problem was NOT that I had set the heap size incorrectly for the server in the Hadoop options: it is currently set to 1024m. ...but I did NOT set the heap size for my test class at all, so the default Java heap size on Windows was used.

Therefore, setting the heap size correctly for the test class solved the issue for the file at hand. ...but if such a file keeps growing, then at some point the described OOME will (most likely) occur again.
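
(For the record, I now simply start the test class with an explicit maximum heap size, e.g.

  java -Xmx1024m -cp <classpath> TriplesTest

where <classpath> is just a placeholder for the usual Hadoop/HBase jars; the same -Xmx setting could presumably also be passed via HADOOP_OPTS when going through the hadoop script.)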

Now the problem is that I cannot add any more heap space because of the restrictions inherent in the Windows JVM. Hence the only solution would be to partition such a large file into smaller chunked files before running the test class on it (see the sketch below). ...or are there other (configuration) possibilities that I have overlooked so far?
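
For completeness, here is a quick sketch of what I mean by chunking (the class name and the line limit are made up for illustration):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Sketch: split a large triples file into chunks of at most MAX_LINES
// lines each, so that every chunk stays within the available heap.
public class TriplesSplitter {
  // Made-up limit, safely below the 44848 lines that still worked.
  private static final int MAX_LINES = 40000;

  public static void main(String[] args) throws IOException {
    BufferedReader in = new BufferedReader(new FileReader(args[0]));
    int chunk = 0;
    int lines = 0;
    PrintWriter out = new PrintWriter(new FileWriter(args[0] + ".part" + chunk));
    String line;
    while ((line = in.readLine()) != null) {
      if (lines == MAX_LINES) {
        // The current chunk is full: close it and open the next one.
        out.close();
        out = new PrintWriter(new FileWriter(args[0] + ".part" + ++chunk));
        lines = 0;
      }
      out.println(line);
      lines++;
    }
    out.close();
    in.close();
  }
}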

Now, in case you still need it, I can create and send you an hprof file produced by the profiler while running the test class...

Cheers,
Holger

Michael Stack wrote:
Your program looks innocuous enough. Does attaching w/ jmap and getting a '-histo' dump tell you anything? Have you tried upping your JVM heap size (HADOOP_OPTS)? There's a lot going on if you have the FS, MR, and HBase all up and running in the one JVM.
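
For example (with <pid> being the process id of the running test JVM):

  jmap -histo <pid>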

St.Ack


Holger Stenzhorn wrote:
Hi,

I started to use Hadoop MapReduce together with HBase and experience an exception when running a test case (using 0.15 first and then the latest SVN). My usage scenario is the following: I have a text file containing RDF triples (of the form subject-relation-object) with one triple on each line (e.g. "<http://dblp.l3s.de/d2r/resource/publications/books/acm/kim95/AnnevelinkACFHK95> <http://purl.org/dc/elements/1.1/creator> <http://dblp.l3s.de/d2r/resource/authors/Jurgen_Annevelink>"). A given subject can appear several times, each time with a different relation-object combination. Now I want to go through the file and put the subject into the table as row key, with all triples found for this key in the same row (i.e., abbreviated, triple:0 = "<A> <x> <B>", triple:1 = "<C> <y> <D>", etc.).
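
In other words, for the example line above the table row should (abbreviated) end up looking like this:

  row key:  <http://dblp.l3s.de/d2r/resource/publications/books/acm/kim95/AnnevelinkACFHK95>
  triple:0  "<...AnnevelinkACFHK95> <...creator> <...Jurgen_Annevelink>"
  triple:1  (the next triple found for the same subject)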

So I implemented the class attached below for this purpose and (for testing purposes) ran it locally: It works fine up to 44848 lines in my current test file. If I add more lines (i.e. triples), the whole thing crashes with the exception below. Now, are there ways around this (except for splitting the file up into smaller files)? ...or is this actually a bug in Hadoop?

07/11/06 22:44:21 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
07/11/06 22:44:21 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
07/11/06 22:44:21 INFO mapred.FileInputFormat: Total input paths to process : 1
07/11/06 22:44:21 INFO mapred.JobClient: Running job: job_local_1
07/11/06 22:44:21 INFO mapred.MapTask: numReduceTasks: 1
07/11/06 22:44:22 INFO mapred.JobClient:  map 0% reduce 0%
07/11/06 22:44:23 WARN mapred.LocalJobRunner: job_local_1
java.lang.OutOfMemoryError: Java heap space
       at java.util.Arrays.copyOf(Arrays.java:2786)
       at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
       at java.io.DataOutputStream.write(DataOutputStream.java:90)
       at org.apache.hadoop.io.Text.write(Text.java:243)
       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:347)
       at TriplesTest$TriplesTestMapper.map(TriplesTest.java:41)
       at TriplesTest$TriplesTestMapper.map(TriplesTest.java:32)
       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:132)
Exception in thread "main" java.io.IOException: Job failed!
       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:831)
       at TriplesTest.run(TriplesTest.java:106)
       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
       at TriplesTest.main(TriplesTest.java:112)

Cheers,
Holger

TriplesTest.java:
-----------------

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapred.TableOutputFormat;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class TriplesTest extends Configured implements Tool {

 /** Emits the triple's subject as key and the whole triple line as value. */
 public static class TriplesTestMapper extends MapReduceBase
   implements Mapper<LongWritable, Text, Text, Text> {
   public void map(LongWritable key, Text value,
                   OutputCollector<Text, Text> output,
                   Reporter reporter) throws IOException {
     // Split "<subject> <relation> <object>" at the "> <" boundaries.
     String[] temp = value.toString().split(">\\s+<");
     if (temp.length == 3) {
       // Re-append the ">" that the split consumed from the subject.
       output.collect(new Text(temp[0] + ">"), value);
     }
   }
 }

 /** Collects all triples for one subject into numbered "triple:<i>" columns. */
 public static class TriplesTestReducer extends MapReduceBase
   implements Reducer<Text, Text, Text, MapWritable> {
   public void reduce(Text key, Iterator<Text> values,
                      OutputCollector<Text, MapWritable> output,
                      Reporter reporter) throws IOException {
     int i = 0;
     while (values.hasNext()) {
       // Copy only the valid bytes: Text.getBytes() returns the whole
       // backing array, which may be longer than Text.getLength().
       Text next = values.next();
       byte[] bytes = new byte[next.getLength()];
       System.arraycopy(next.getBytes(), 0, bytes, 0, bytes.length);
       MapWritable newValue = new MapWritable();
       newValue.put(new Text("triple:" + i++),
                    new ImmutableBytesWritable(bytes));
       output.collect(key, newValue);
     }
   }
 }

 public int run(String[] args) throws Exception {
   JobConf jobConf = new JobConf(getConf(), TriplesTest.class);
   jobConf.setJobName("triples");

   jobConf.setMapperClass(TriplesTestMapper.class);
   jobConf.setReducerClass(TriplesTestReducer.class);

   jobConf.setInputPath(new Path("c:/development/test/input"));
   jobConf.setInputFormat(TextInputFormat.class);
   jobConf.setOutputFormat(TableOutputFormat.class);

   jobConf.setOutputKeyClass(Text.class);
   jobConf.set("hbase.mapred.outputtable", "triples");
   jobConf.set("hbase.master", "local");
   JobClient.runJob(jobConf);
   return 0;
 }

 public static void main(String[] args) throws Exception {
   int res = ToolRunner.run(new Configuration(), new TriplesTest(), args);
   System.exit(res);
 }

}
