[
https://issues.apache.org/jira/browse/HBASE-8318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632472#comment-13632472
]
Jean-Marc Spaggiari commented on HBASE-8318:
--------------------------------------------
bq. So I don't see how we have "almost the same issue".
Let's take an example. You have billions of rows representing user actions.
Each night, an MR job computes statistics for those users, per user (one row
per user). Your table keeps 30 versions, so you can get a monthly stat by
retrieving all the versions for a specific user. You know the date, so you can
display the results by date, from today back to d-30.
Now, if your task fails, it might still have put some data in the table.
For some users, the 30 versions then no longer represent the same dates as
for other users: the failed task is re-run, does another Put, pushes the
versions by one, and corrupts your results.
You will end up with something like this:
+-------+------+------+------+------+------+------+------+
| Key   | Ver0 | Ver1 | Ver2 | Ver3 | Ver4 | Ver5 | Ver6 |
+-------+------+------+------+------+------+------+------+
| User1 | t0   | t1   | t2   | t3   | t4   | t5   | t6   | <= Everything went well for this user.
+-------+------+------+------+------+------+------+------+
| User2 | t0   | t0   | t1   | t2   | t3   | t4   | t5   | <= Task failed for this one, but data was already written to the table, so the task wrote the value t0 twice.
+-------+------+------+------+------+------+------+------+
| User3 | t0   | t1   | t2   | t3   | t4   | t5   | t6   | <= Everything went well for this user.
+-------+------+------+------+------+------+------+------+
Data at Ver0 is correct, but for User2 the other versions are no longer
aligned as expected.
So there are still some use cases where Puts can generate corrupted results
too.
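The shift shown in the table can be reproduced with a small stand-alone
simulation. This is only a sketch and does not use the HBase API:
VersionedCell, runNightlyJobs and VersionShiftDemo are hypothetical names, and
the model assumes that every Put carries no explicit timestamp, so that each
write (including a re-run) becomes a new Ver0 and pushes the older versions
down by one.

```java
import java.util.ArrayList;
import java.util.List;

class VersionShiftDemo {

    // Hypothetical stand-in for one cell that keeps at most maxVersions
    // values, newest first. Not an HBase class.
    static final class VersionedCell {
        private final int maxVersions;
        private final List<String> versions = new ArrayList<>();

        VersionedCell(int maxVersions) {
            this.maxVersions = maxVersions;
        }

        // Assumption: a Put without an explicit timestamp always creates a
        // new newest version; the oldest one is dropped past maxVersions.
        void put(String value) {
            versions.add(0, value);
            if (versions.size() > maxVersions) {
                versions.remove(versions.size() - 1);
            }
        }

        List<String> read() {
            return new ArrayList<>(versions);
        }
    }

    // Writes one value per night, oldest night first, so that t0 (today)
    // ends up at Ver0. If retryLastNight is true, the last night's value
    // is written a second time, as a re-run task would do.
    static List<String> runNightlyJobs(int maxVersions, int nights,
                                       boolean retryLastNight) {
        VersionedCell cell = new VersionedCell(maxVersions);
        for (int d = nights - 1; d >= 0; d--) {
            cell.put("t" + d);
        }
        if (retryLastNight) {
            cell.put("t0");
        }
        return cell.read();
    }

    public static void main(String[] args) {
        System.out.println("User1 " + runNightlyJobs(7, 7, false));
        System.out.println("User2 " + runNightlyJobs(7, 7, true));
    }
}
```

With 7 versions and 7 nights, the normal run keeps t0 to t6 in order, while
the run with a retried task holds t0 twice and has lost t6, as for User2.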
Regarding how to communicate that to the user, I agree that javadoc might
be a bit light, since users will probably use the JARs and not really look at
the code. Maybe we can add something about Increments and Puts to the online
documentation too? Some debug or warning logs might be useful too? Or do you
have any other recommendation?
> TableOutputFormat.TableRecordWriter should accept Increments
> ------------------------------------------------------------
>
> Key: HBASE-8318
> URL: https://issues.apache.org/jira/browse/HBASE-8318
> Project: HBase
> Issue Type: Bug
> Reporter: Jean-Marc Spaggiari
> Assignee: Jean-Marc Spaggiari
> Attachments: HBASE-8318-v0-trunk.patch, HBASE-8318-v1-trunk.patch,
> HBASE-8318-v2-trunk.patch
>
>
> TableOutputFormat.TableRecordWriter can take Puts and Deletes but it should
> also accept Increments.