[ https://issues.apache.org/jira/browse/HBASE-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782333#action_12782333 ]

Lars George commented on HBASE-1969:
------------------------------------

I agree, I would probably call it "Mutation" (following the terminology from the 
BigTable paper). Get is different and can be kept separate. With that we could 
kill many birds with one stone: fix the MR issue here, enable batch mutations, 
and support atomic row mutations comprising Put and Delete operations. BTW, 
BigTable has this:

{code}
// Open the table
Table *T = OpenOrDie("/bigtable/web/webtable");

// Write a new anchor and delete an old anchor
RowMutation r1(T, "com.cnn.www");
r1.Set("anchor:www.c-span.org", "CNN");
r1.Delete("anchor:www.abc.com");
Operation op;
Apply(&op, &r1);
{code}
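A common superclass in HBase could look roughly like the following. This is a hypothetical sketch only: the class and method names (Mutation, RowMutation, add, size) are made up for illustration and are not the actual HBase API.

```java
// Hypothetical sketch of a "Mutation" superclass shared by Put and Delete,
// mirroring the BigTable RowMutation idea above. Names are illustrative only.
import java.util.ArrayList;
import java.util.List;

abstract class Mutation {
    protected final String row;
    Mutation(String row) { this.row = row; }
    String getRow() { return row; }
}

class Put extends Mutation {
    final String column, value;
    Put(String row, String column, String value) {
        super(row);
        this.column = column;
        this.value = value;
    }
}

class Delete extends Mutation {
    final String column;
    Delete(String row, String column) {
        super(row);
        this.column = column;
    }
}

// An atomic row mutation could then collect both kinds of operation
// and apply them together against a single row.
class RowMutation {
    private final List<Mutation> ops = new ArrayList<>();
    void add(Mutation m) { ops.add(m); }
    int size() { return ops.size(); }
}

public class MutationDemo {
    public static void main(String[] args) {
        RowMutation r1 = new RowMutation();
        r1.add(new Put("com.cnn.www", "anchor:www.c-span.org", "CNN"));
        r1.add(new Delete("com.cnn.www", "anchor:www.abc.com"));
        System.out.println(r1.size()); // -> 2, both ops batched under one row
    }
}
```

With such a hierarchy, a single "Mutation" could also serve as the MR output value type, which is exactly what the issue below needs.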



> HBASE-1626 does not work as advertised due to lack of "instanceof" check in 
> MR framework
> ----------------------------------------------------------------------------------------
>
>                 Key: HBASE-1969
>                 URL: https://issues.apache.org/jira/browse/HBASE-1969
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: Lars George
>
> The issue that HBASE-1626 tried to fix is that we can hand in Put or Delete 
> instances to the TableOutputFormat. So the explicit Put reference was changed 
> to Writable in the process. But that does not work as expected:
> {code}09/11/04 13:35:56 INFO mapred.JobClient: Task Id : 
> attempt_200911031030_0004_m_000013_2, Status : FAILED
> java.io.IOException: Type mismatch in value from map: expected 
> org.apache.hadoop.io.Writable, recieved org.apache.hadoop.hbase.client.Put
>         at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:812)
>         at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:504)
>         at 
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>         at 
> com.worldlingo.hadoop.mapred.RestoreTable$RestoreMapper.map(RestoreTable.java:140)
>         at 
> com.worldlingo.hadoop.mapred.RestoreTable$RestoreMapper.map(RestoreTable.java:69)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305){code}
> The issue is that the MapReduce framework does not check the type 
> polymorphically using "instanceof" but with a direct class comparison. In 
> MapTask.java you find this code:
> {code}
>     public synchronized void collect(K key, V value, int partition
>                                      ) throws IOException {
>       reporter.progress();
>       if (key.getClass() != keyClass) {
>         throw new IOException("Type mismatch in key from map: expected "
>                               + keyClass.getName() + ", recieved "
>                               + key.getClass().getName());
>       }
>       if (value.getClass() != valClass) {
>         throw new IOException("Type mismatch in value from map: expected "
>                               + valClass.getName() + ", recieved "
>                               + value.getClass().getName());
>       }
>       ... {code}
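> The difference between the two checks can be demonstrated in isolation (a
> minimal standalone sketch with stand-in types, not actual Hadoop code):

```java
// Minimal demonstration of why a direct class comparison rejects subtypes
// while a polymorphic check accepts them. Writable and Put here are
// stand-ins for the real Hadoop/HBase classes.
interface Writable { }

class Put implements Writable { }

public class TypeCheckDemo {
    public static void main(String[] args) {
        Class<?> valClass = Writable.class; // the job's MapOutputValueClass
        Object value = new Put();           // what the mapper actually emits

        // Direct comparison, as in MapTask.collect(): fails for subtypes.
        boolean strictMatch = value.getClass() == valClass;

        // Polymorphic check: accepts any implementation of Writable.
        boolean polymorphicMatch = valClass.isInstance(value);

        System.out.println(strictMatch);      // false -> the IOException above
        System.out.println(polymorphicMatch); // true  -> what HBASE-1626 expected
    }
}
```

> An "instanceof"-style check (Class.isInstance) would have accepted the Put
> handed to a Writable-typed output, where the getClass() comparison rejects it.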
> So it does not work to use Writable as the MapOutputValueClass for the job 
> and then hand in a Put or Delete! The test case TestMapReduce did not catch 
> this, as it contains this line
> {code}
>       TableMapReduceUtil.initTableMapperJob(
>         Bytes.toString(table.getTableName()), scan,
>         ProcessContentsMapper.class, ImmutableBytesWritable.class, 
>         Put.class, job);{code}
> which sets the value class to Put
> {code}if (outputValueClass != null) 
> job.setMapOutputValueClass(outputValueClass);{code}
> To fix this (for now), one can set the class to Put the same way, or 
> explicitly in one's own code:
> {code}job.setMapOutputValueClass(Put.class);{code}
>  
> But the whole idea only seems feasible if a) the Hadoop class is amended to 
> use "instanceof" instead (lodge a Hadoop MapReduce JIRA issue?), or b) we 
> have a combined class that represents a Put *and* a Delete - which seems 
> somewhat wrong, but doable. It would only really find use in this context 
> and would require the user to make use of it when calling context.write(). 
> This does not make things easier to learn.
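> Option b) could be sketched as a small wrapper class (hypothetical; the
> names RowOp/isPut are made up and stand-in types replace the real
> HBase/Hadoop classes):

```java
// Hypothetical wrapper that can carry either a Put or a Delete, so one
// concrete class can be declared as the map output value class.
// Illustrative only; Put and Delete are stand-ins for the HBase classes.
class Put { final String row; Put(String row) { this.row = row; } }
class Delete { final String row; Delete(String row) { this.row = row; } }

class RowOp {
    private final Put put;       // exactly one of the two is non-null
    private final Delete delete;

    RowOp(Put p) { this.put = p; this.delete = null; }
    RowOp(Delete d) { this.put = null; this.delete = d; }

    boolean isPut() { return put != null; }

    // A real implementation would also implement Writable, serializing a
    // type-tag byte in write(DataOutput) and dispatching on it in
    // readFields(DataInput).
}

public class RowOpDemo {
    public static void main(String[] args) {
        RowOp a = new RowOp(new Put("com.cnn.www"));
        RowOp b = new RowOp(new Delete("com.cnn.www"));
        System.out.println(a.isPut()); // true
        System.out.println(b.isPut()); // false
    }
}
```

> The mapper would then always emit RowOp, which satisfies the strict
> getClass() comparison - at the cost of the extra wrapping noted above.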
> Suggestions?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
