[
https://issues.apache.org/jira/browse/HBASE-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728695#action_12728695
]
Lars George commented on HBASE-1626:
------------------------------------
There seem to be two issues buried here. One is to "generalize" the class that
is handed into the reduce phase. The other is how to access a table. For the
latter - correct me if I am wrong, Doğacan - you seem to have got hold of the
wrong end of the stick. Instead of extending TableReducer and making use of a
table in the IdentityTableReducer, you leave that as is and simply add a custom
TableReducer that creates the table in the "setup()" method, does the puts etc.
in the "reduce()" call, and closes/flushes in the "cleanup()" method.
In other words, you do not need to do anything but create a simple job that
uses IdentityTableReducer together with TableOutputFormat, which takes care of
the table.put(). Unless I am missing something, that is pretty much what you
are doing. Use the TableMapReduceUtil class to set up the job, including the
name of the table etc.
The crucial part is abstracting the type of the class the reducer actually
receives, so instead of assuming a Put it should accept a Delete as well if
possible. I think Stack has that down 100% in his patch. So with his patch
together with the above classes you should be fine.
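To make the shape of such a custom reducer concrete, here is a minimal sketch
of the setup()/reduce()/cleanup() lifecycle combined with the Put/Delete
dispatch. It deliberately does not use the HBase classes: PutOp, DeleteOp,
RecordingTable and ReducerLifecycleSketch are hypothetical stand-ins so the
example runs on its own, and a real reducer would extend TableReducer and talk
to an actual table instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for HBase's Put and Delete; NOT the real classes.
class PutOp {
    final String row;
    PutOp(String row) { this.row = row; }
}

class DeleteOp {
    final String row;
    DeleteOp(String row) { this.row = row; }
}

// Hypothetical table handle that only records which operations it received.
class RecordingTable {
    final List<String> log = new ArrayList<>();
    boolean closed = false;

    void put(PutOp p) { log.add("put:" + p.row); }
    void delete(DeleteOp d) { log.add("delete:" + d.row); }
    void close() { closed = true; }
}

class ReducerLifecycleSketch {
    private RecordingTable table;

    // setup(): open the table once, before any reduce() call.
    void setup() { table = new RecordingTable(); }

    // reduce(): dispatch on the runtime type instead of assuming a Put.
    void reduce(Iterable<Object> values) {
        for (Object value : values) {
            if (value instanceof PutOp) table.put((PutOp) value);
            else if (value instanceof DeleteOp) table.delete((DeleteOp) value);
            else throw new IllegalArgumentException(
                "Expected PutOp or DeleteOp, got " + String.valueOf(value));
        }
    }

    // cleanup(): close (and thereby flush) the table once at the end.
    RecordingTable cleanup() {
        table.close();
        return table;
    }

    // Drives the three phases in the order the framework would call them.
    static RecordingTable run(List<Object> values) {
        ReducerLifecycleSketch reducer = new ReducerLifecycleSketch();
        reducer.setup();
        reducer.reduce(values);
        return reducer.cleanup();
    }

    public static void main(String[] args) {
        RecordingTable t = run(Arrays.asList(new PutOp("row1"), new DeleteOp("row2")));
        System.out.println(t.log + " closed=" + t.closed);
    }
}
```

The unknown-type branch throws rather than silently dropping the value; that
is a choice made for this sketch, not something taken from the patch.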
Question for Stack:
{code}
+ if (value instanceof Put) this.table.put(new Put((Put)value));
+ else if (value instanceof Delete) this.table.delete(new Delete((Delete)value));
{code}
Why do that and not
{code}
+ if (value instanceof Put) this.table.put((Put) value);
+ else if (value instanceof Delete) this.table.delete((Delete) value);
{code}
Just wondering if there is a reason to create a new object. Are they cached in
the framework, so that holding the object reference would let them be modified
before they are written? They are already written to an intermediate output
during the map/reduce crossover, so they are already copies.
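To illustrate the concern behind the question, here is a minimal sketch of what
would go wrong if both of the following held: the framework reuses one value
instance across iterations, and the table client buffers references until a
later flush. Both conditions are assumptions made for this example, not
statements about what the patch or HBase actually does. MutableOp and
ReuseAliasingSketch are hypothetical stand-ins, not HBase classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a mutable write object such as a Put.
class MutableOp {
    String row;
    MutableOp(String row) { this.row = row; }
    MutableOp(MutableOp other) { this.row = other.row; } // copy constructor
}

class ReuseAliasingSketch {
    // Assumed framework behavior: every next() returns the SAME instance,
    // refilled with the data of the next record.
    static Iterable<MutableOp> reusingValues(List<String> rows) {
        return () -> new Iterator<MutableOp>() {
            private final MutableOp shared = new MutableOp((String) null);
            private int i = 0;

            @Override public boolean hasNext() { return i < rows.size(); }

            @Override public MutableOp next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.row = rows.get(i++);
                return shared;
            }
        };
    }

    // Assumed client behavior: operations sit in a buffer until flushed.
    // Returns the rows the buffer would write at flush time.
    static List<String> bufferRows(List<String> rows, boolean copy) {
        List<MutableOp> buffer = new ArrayList<>();
        for (MutableOp v : reusingValues(rows)) {
            buffer.add(copy ? new MutableOp(v) : v);
        }
        List<String> flushed = new ArrayList<>();
        for (MutableOp op : buffer) flushed.add(op.row);
        return flushed;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3");
        System.out.println(bufferRows(rows, false)); // prints [r3, r3, r3]
        System.out.println(bufferRows(rows, true));  // prints [r1, r2, r3]
    }
}
```

If neither condition applies, the plain cast without the extra copy is
equivalent and saves an allocation per value.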
> Allow emitting Deletes out of new TableReducer
> ----------------------------------------------
>
> Key: HBASE-1626
> URL: https://issues.apache.org/jira/browse/HBASE-1626
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: Lars George
> Fix For: 0.20.0
>
> Attachments: deletes.patch, table-reduce.patch
>
>
> Doğacan Güney (nutch) wants to emit Delete from TableReduce. Currently we
> only do Put.