[ 
https://issues.apache.org/jira/browse/HBASE-8318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632472#comment-13632472
 ] 

Jean-Marc Spaggiari commented on HBASE-8318:
--------------------------------------------

bq. So I don't see how we have "almost the same issue".

Let's take an example. You have billions of rows representing user actions. 
Each night, an MR job computes some statistics for those users, one row per 
user. The table keeps 30 versions, so you can get a monthly stat by retrieving 
all the versions for a specific user. You know the date, so you can display 
the results by date, starting from today and going back to d-30.

Now, if a task fails, it might already have put some data in the table. 
The task is then re-run and does another Put, which pushes the versions by 
one. So for some users, the 30 versions no longer represent the same dates as 
for other users, and your results are corrupted.

You will end up with something like this:

+-------+------+------+------+------+------+------+------+
|  Key  | Ver0 | Ver1 | Ver2 | Ver3 | Ver4 | Ver5 | Ver6 |
+-------+------+------+------+------+------+------+------+
| User1 |  t0  |  t1  |  t2  |  t3  |  t4  |  t5  |  t6  | <= (a)
+-------+------+------+------+------+------+------+------+
| User2 |  t0  |  t0  |  t1  |  t2  |  t3  |  t4  |  t5  | <= (b)
+-------+------+------+------+------+------+------+------+
| User3 |  t0  |  t1  |  t2  |  t3  |  t4  |  t5  |  t6  | <= (a)
+-------+------+------+------+------+------+------+------+

(a) Everything went well for this user.
(b) The task failed for this one, but data had already been written to the 
    table, so the value t0 was written twice.

Data at v0 is correct, but for User2 the other versions are no longer aligned 
as expected.
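
The shift can be reproduced with a small stand-alone simulation. This is only 
a toy model in plain Java, not HBase code: the Cell class, MAX_VERSIONS = 7 
and the simulate() helper are made up for illustration, and the model simply 
assumes that every Put without an explicit timestamp creates a new version 
and that the oldest version is dropped once the limit is reached.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class VersionShiftDemo {
    // Hypothetical limit matching the 7 columns of the table above.
    static final int MAX_VERSIONS = 7;

    /** Toy model of one cell: newest value first, at most MAX_VERSIONS kept. */
    static final class Cell {
        private final Deque<String> versions = new ArrayDeque<>();

        void put(String value) {
            versions.addFirst(value);          // every put becomes the newest version
            if (versions.size() > MAX_VERSIONS) {
                versions.removeLast();         // the oldest version falls off
            }
        }

        List<String> read() {
            return new ArrayList<>(versions);  // index 0 is Ver0, index 1 is Ver1, ...
        }
    }

    /** Writes t6 (oldest) through t0 (today); optionally repeats today's put. */
    static List<String> simulate(boolean retryTodaysPut) {
        Cell cell = new Cell();
        for (int night = 6; night >= 0; night--) {
            cell.put("t" + night);
        }
        if (retryTodaysPut) {
            cell.put("t0");                    // re-run task writes the same value again
        }
        return cell.read();
    }

    public static void main(String[] args) {
        System.out.println("User1: " + simulate(false)); // [t0, t1, t2, t3, t4, t5, t6]
        System.out.println("User2: " + simulate(true));  // [t0, t0, t1, t2, t3, t4, t5]
    }
}
```

Reading Ver1 as "yesterday" works for User1 but returns today's value for 
User2, which is the misalignment described above.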

So there are still some use cases where Puts can generate corrupted results 
too.

Regarding how to communicate that to the user, I agree that javadoc alone 
might be a bit light, since users will probably use the JARs and not really 
look at the code. Maybe we can add something about Increments and Puts to the 
online documentation too? Some debug or warning logs might be useful too? Or 
do you have any other recommendation?
                
> TableOutputFormat.TableRecordWriter should accept Increments
> ------------------------------------------------------------
>
>                 Key: HBASE-8318
>                 URL: https://issues.apache.org/jira/browse/HBASE-8318
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Marc Spaggiari
>            Assignee: Jean-Marc Spaggiari
>         Attachments: HBASE-8318-v0-trunk.patch, HBASE-8318-v1-trunk.patch, 
> HBASE-8318-v2-trunk.patch
>
>
> TableOutputFormat.TableRecordWriter can take Puts and Deletes but it should 
> also accept Increments.
