Hi Eric/Tim,

Thanks for your valuable points.

I have updated the Mapper implementation to remove the HTable instance, as
follows:

    public static class InnerMapWithTOF extends MapReduceBase implements
            Mapper<LongWritable, Text, ImmutableBytesWritable, BatchUpdate> {

        public void map(LongWritable key, Text value,
                OutputCollector<ImmutableBytesWritable, BatchUpdate> output,
                Reporter reporter) throws IOException {

            // Each input line is tab-separated; the first field is the row key.
            String[] splits = value.toString().split("\t");
            BatchUpdate bu = new BatchUpdate(splits[0]);

            // One cell per column: the column name is the family name
            // concatenated with the qualifier.
            for (int j = 0; j < HBaseTest.SNP_INFO_COLUMN_NAMES.length; j++) {
                bu.put(HBaseTest.SNP_FAMILY_NAMES[0]
                        + HBaseTest.SNP_INFO_COLUMN_NAMES[j],
                        splits[j].getBytes());
            }

            output.collect(new ImmutableBytesWritable(splits[0].getBytes()), bu);
        }
    }
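
(For reference, the lines in snp.txt are tab-separated, and the first field
is used both as the row key and as the first column value. A made-up example
line — the real fields depend on SNP_INFO_COLUMN_NAMES:

    rs1234	snp-val-1	snp-val-2	snp-val-3

)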

But in the above code I'm building the same column names
(HBaseTest.SNP_FAMILY_NAMES[0] + HBaseTest.SNP_INFO_COLUMN_NAMES[j])
sequentially for every column of each record. Is there any way to set this
once, in the JobConf object or somewhere else?
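
For example, would it be reasonable to precompute the full column names once
per task by overriding configure() from MapReduceBase, instead of
concatenating them for every record? A rough sketch of what I mean (the
columns field is mine, just for illustration):

    private String[] columns;

    @Override
    public void configure(JobConf job) {
        // Build each "family + qualifier" column name once per task,
        // rather than once per record in map().
        columns = new String[HBaseTest.SNP_INFO_COLUMN_NAMES.length];
        for (int j = 0; j < columns.length; j++) {
            columns[j] = HBaseTest.SNP_FAMILY_NAMES[0]
                    + HBaseTest.SNP_INFO_COLUMN_NAMES[j];
        }
    }

The map() body would then just call bu.put(columns[j], splits[j].getBytes()).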

This TableReduce implementation then inserts the records into the HTable, as
follows:

    public static class InnerReduceWithTOF extends MapReduceBase implements
            TableReduce<ImmutableBytesWritable, BatchUpdate> {

        public void reduce(ImmutableBytesWritable key,
                Iterator<BatchUpdate> value,
                OutputCollector<ImmutableBytesWritable, BatchUpdate> output,
                Reporter reporter) throws IOException {

            // Pass each BatchUpdate straight through; TableOutputFormat
            // takes care of writing it to the table.
            while (value.hasNext()) {
                output.collect(key, value.next());
            }
        }
    }
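
Incidentally, since this reduce just passes every BatchUpdate through
unchanged, I believe it is equivalent to the IdentityTableReduce class in
org.apache.hadoop.hbase.mapred (assuming I'm reading that class correctly),
so it could presumably be replaced with:

    TableMapReduceUtil.initTableReduceJob("snp",
            IdentityTableReduce.class, c);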

And here is the job configuration:

    JobConf c = new JobConf(getConf(), MapReduceHBaseTest.class);
    c.setJobName("ConfMapReduce2");
    FileInputFormat.setInputPaths(c, new Path("snp.txt"));

    c.setMapperClass(InnerMapWithTOF.class);
    c.setReducerClass(InnerReduceWithTOF.class);
    c.setOutputFormat(TableOutputFormat.class);
    c.set(TableOutputFormat.OUTPUT_TABLE, "snp");

    c.setOutputKeyClass(ImmutableBytesWritable.class);
    c.setOutputValueClass(BatchUpdate.class);

    c.setMapOutputKeyClass(ImmutableBytesWritable.class);
    c.setMapOutputValueClass(BatchUpdate.class);

    // Debug output: the number of map and reduce tasks.
    int numMapTasks = c.getNumMapTasks();
    System.out.println(numMapTasks);
    System.out.println(c.getNumReduceTasks());

    TableMapReduceUtil.initTableReduceJob("snp",
            InnerReduceWithTOF.class, c);

    JobClient.runJob(c);
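
As an aside: if I read TableMapReduceUtil correctly, initTableReduceJob()
already sets the reducer class, TableOutputFormat, the output table, and the
output key/value classes, so several of the explicit calls above are probably
redundant. A trimmed-down sketch of the same configuration (under that same
assumption about what initTableReduceJob does):

    JobConf c = new JobConf(getConf(), MapReduceHBaseTest.class);
    c.setJobName("ConfMapReduce2");
    FileInputFormat.setInputPaths(c, new Path("snp.txt"));

    c.setMapperClass(InnerMapWithTOF.class);
    c.setMapOutputKeyClass(ImmutableBytesWritable.class);
    c.setMapOutputValueClass(BatchUpdate.class);

    // Sets the reducer, TableOutputFormat, the output table name,
    // and the output key/value classes in one call.
    TableMapReduceUtil.initTableReduceJob("snp",
            InnerReduceWithTOF.class, c);

    JobClient.runJob(c);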


TIA,
Ramesh