[ 
https://issues.apache.org/jira/browse/HBASE-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell resolved HBASE-1969.
-----------------------------------

    Resolution: Not a Problem

Reopen or file a new issue if this is still relevant for modern HBase versions.

> HBASE-1626 does not work as advertised due to lack of "instanceof" check in 
> MR framework
> ----------------------------------------------------------------------------------------
>
>                 Key: HBASE-1969
>                 URL: https://issues.apache.org/jira/browse/HBASE-1969
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: Lars George
>
> The issue that HBASE-1626 tried to fix is that we can hand in Put or Delete 
> instances to the TableOutputFormat. So the explicit Put reference was changed 
> to Writable in the process. But that does not work as expected:
> {code}09/11/04 13:35:56 INFO mapred.JobClient: Task Id : 
> attempt_200911031030_0004_m_000013_2, Status : FAILED
> java.io.IOException: Type mismatch in value from map: expected 
> org.apache.hadoop.io.Writable, recieved org.apache.hadoop.hbase.client.Put
>         at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:812)
>         at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:504)
>         at 
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>         at 
> com.worldlingo.hadoop.mapred.RestoreTable$RestoreMapper.map(RestoreTable.java:140)
>         at 
> com.worldlingo.hadoop.mapred.RestoreTable$RestoreMapper.map(RestoreTable.java:69)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305){code}
> The issue is that the MapReduce framework does not check the type 
> polymorphically using "instanceof" but with a direct class comparison. In 
> MapTask.java you find this code:
> {code}
>     public synchronized void collect(K key, V value, int partition
>                                      ) throws IOException {
>       reporter.progress();
>       if (key.getClass() != keyClass) {
>         throw new IOException("Type mismatch in key from map: expected "
>                               + keyClass.getName() + ", recieved "
>                               + key.getClass().getName());
>       }
>       if (value.getClass() != valClass) {
>         throw new IOException("Type mismatch in value from map: expected "
>                               + valClass.getName() + ", recieved "
>                               + value.getClass().getName());
>       }
>       ... {code}
> So setting Writable as the MapOutputValueClass for the job and then handing 
> in a Put or Delete does not work! The test case TestMapReduce did not pick 
> this up because it contains this call
> {code}
>       TableMapReduceUtil.initTableMapperJob(
>         Bytes.toString(table.getTableName()), scan,
>         ProcessContentsMapper.class, ImmutableBytesWritable.class, 
>         Put.class, job);{code}
> which sets the value class to Put
> {code}if (outputValueClass != null) 
> job.setMapOutputValueClass(outputValueClass);{code}
> To work around this (for now), one can set the class to Put the same way, or 
> explicitly in one's own code:
> {code}job.setMapOutputValueClass(Put.class);{code}
>  
> But the whole idea only seems feasible if a) the Hadoop class is amended to 
> use "instanceof" instead (lodge a Hadoop MapRed JIRA issue?) or b) we have a 
> combined class that represents a Put *and* a Delete, which seems somewhat 
> wrong, but doable. It would only really find use in that context and would 
> require the user to make use of it when calling context.write(). This does 
> not make things easier to learn.
> Suggestions?
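
To illustrate the difference between the two checks discussed in the report, here is a minimal, self-contained sketch. It does not use Hadoop or HBase: the Writable, Put and Delete types are hypothetical stand-ins for the real classes, and the names TypeCheckDemo, strictAccepts and polymorphicAccepts are made up for this example. The strict variant mirrors the class comparison quoted from MapTask.java; the polymorphic variant is only one possible way an "instanceof"-style check could look, not what Hadoop actually does.

```java
public class TypeCheckDemo {
    // Hypothetical stand-ins for org.apache.hadoop.io.Writable and the
    // HBase client classes Put and Delete (assumption: not the real types).
    public interface Writable {}
    public static class Put implements Writable {}
    public static class Delete implements Writable {}

    // Mirrors the quoted MapTask check: the runtime class must be
    // identical to the declared map output value class.
    public static boolean strictAccepts(Object value, Class<?> declared) {
        return value.getClass() == declared;
    }

    // Polymorphic alternative: any instance of the declared class or
    // interface is accepted, equivalent to an "instanceof" test.
    public static boolean polymorphicAccepts(Object value, Class<?> declared) {
        return declared.isInstance(value);
    }

    public static void main(String[] args) {
        Object put = new Put();
        Object delete = new Delete();

        // Declaring Writable fails the strict check for both subclasses.
        System.out.println("strict, Writable/Put:      " + strictAccepts(put, Writable.class));
        System.out.println("strict, Writable/Delete:   " + strictAccepts(delete, Writable.class));

        // Declaring Put explicitly is the workaround, but it rejects Delete.
        System.out.println("strict, Put/Put:           " + strictAccepts(put, Put.class));
        System.out.println("strict, Put/Delete:        " + strictAccepts(delete, Put.class));

        // A polymorphic check would accept both under Writable.
        System.out.println("polymorphic, Writable/Put: " + polymorphicAccepts(put, Writable.class));
        System.out.println("polymorphic, Writable/Del: " + polymorphicAccepts(delete, Writable.class));
    }
}
```

Under the strict check, declaring Put as the value class only helps jobs that emit nothing but Put instances. A job that emits both Put and Delete would still fail, which is why the report looks at options a) and b).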



--
This message was sent by Atlassian JIRA
(v6.2#6252)
