[
https://issues.apache.org/jira/browse/HBASE-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Purtell resolved HBASE-1969.
-----------------------------------
Resolution: Not a Problem
Reopen or file a new issue if this is still relevant for modern HBase versions.
> HBASE-1626 does not work as advertised due to lack of "instanceof" check in
> MR framework
> ----------------------------------------------------------------------------------------
>
> Key: HBASE-1969
> URL: https://issues.apache.org/jira/browse/HBASE-1969
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.20.1
> Reporter: Lars George
>
> HBASE-1626 was meant to let us hand either Put or Delete instances to
> TableOutputFormat, so the explicit Put reference was changed to Writable in
> the process. But that does not work as expected:
> {code}09/11/04 13:35:56 INFO mapred.JobClient: Task Id :
> attempt_200911031030_0004_m_000013_2, Status : FAILED
> java.io.IOException: Type mismatch in value from map: expected
> org.apache.hadoop.io.Writable, recieved org.apache.hadoop.hbase.client.Put
> at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:812)
> at
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:504)
> at
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> at
> com.worldlingo.hadoop.mapred.RestoreTable$RestoreMapper.map(RestoreTable.java:140)
> at
> com.worldlingo.hadoop.mapred.RestoreTable$RestoreMapper.map(RestoreTable.java:69)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305){code}
> The issue is that the MapReduce framework does not check the type
> polymorphically using "instanceof" but with a direct class comparison. In
> MapTask.java you find this code:
> {code}
> public synchronized void collect(K key, V value, int partition
>                                  ) throws IOException {
>   reporter.progress();
>   if (key.getClass() != keyClass) {
>     throw new IOException("Type mismatch in key from map: expected "
>                           + keyClass.getName() + ", recieved "
>                           + key.getClass().getName());
>   }
>   if (value.getClass() != valClass) {
>     throw new IOException("Type mismatch in value from map: expected "
>                           + valClass.getName() + ", recieved "
>                           + value.getClass().getName());
>   }
> ... {code}
> So setting Writable as the MapOutputValueClass for the job and then handing
> in a Put or Delete does not work! The test case TestMapReduce did not pick
> this up because it has this line in it:
> {code}
> TableMapReduceUtil.initTableMapperJob(
>   Bytes.toString(table.getTableName()), scan,
>   ProcessContentsMapper.class, ImmutableBytesWritable.class,
>   Put.class, job);{code}
> which sets the map output value class to Put:
> {code}if (outputValueClass != null)
> job.setMapOutputValueClass(outputValueClass);{code}
> To work around this (for now), one can set the class to Put the same way, or
> set it explicitly in one's own code:
> {code}job.setMapOutputValueClass(Put.class);{code}
>
> But the whole idea only seems feasible if a) the Hadoop class is amended to
> use "instanceof" instead (lodge a Hadoop MapRed JIRA issue?) or b) we have a
> combined class that represents a Put *and* a Delete, which seems somewhat
> wrong, but doable. It would only really find use in that context and would
> require the user to make use of it when calling context.write(). That does
> not make things easier to learn.
> Suggestions?
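
The class comparison described in the report can be reproduced without Hadoop on the classpath. The following is a minimal sketch, not the MapTask or HBase code: Writable, Put and Delete are empty stand-ins for the real types, and exactClassCheck, polymorphicCheck and PutOrDelete are hypothetical names chosen for illustration. It shows why declaring Writable fails, what option a) would change, and roughly what option b) could look like.

```java
public class TypeCheckSketch {
    // Hypothetical stand-ins for org.apache.hadoop.io.Writable and the
    // HBase client Put and Delete classes; the real types carry data.
    public interface Writable {}
    public static final class Put implements Writable {}
    public static final class Delete implements Writable {}

    // Mirrors the comparison quoted from MapTask: the runtime class of the
    // value must be identical to the declared class, subtypes are rejected.
    public static boolean exactClassCheck(Object value, Class<?> declared) {
        return value.getClass() == declared;
    }

    // Option a): a polymorphic check. Class.isInstance is the runtime
    // equivalent of "instanceof" when the type is only known as a Class.
    public static boolean polymorphicCheck(Object value, Class<?> declared) {
        return declared.isInstance(value);
    }

    // Option b): a hypothetical combined value type. Every emitted value has
    // the same concrete class, so even the exact comparison accepts it.
    public static final class PutOrDelete implements Writable {
        private final Writable wrapped;

        public PutOrDelete(Writable wrapped) {
            this.wrapped = wrapped;
        }

        public Writable get() {
            return wrapped;
        }
    }

    public static void main(String[] args) {
        Put put = new Put();
        Delete delete = new Delete();

        // Declared Writable, handed a Put: this is the failing case.
        System.out.println(exactClassCheck(put, Writable.class));   // false
        // Declared Put, handed a Put: the workaround from the report.
        System.out.println(exactClassCheck(put, Put.class));        // true
        // Option a): both Put and Delete pass against Writable.
        System.out.println(polymorphicCheck(put, Writable.class));    // true
        System.out.println(polymorphicCheck(delete, Writable.class)); // true
        // Option b): wrapped values all share one concrete class.
        System.out.println(
            exactClassCheck(new PutOrDelete(delete), PutOrDelete.class)); // true
    }
}
```

The wrapper passes the exact comparison only because the mapper would always emit the same concrete class. For the same reason, declaring Put.class is enough for jobs that emit nothing but Puts.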