[
https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692563#comment-13692563
]
Enis Soztutar commented on HBASE-3787:
--------------------------------------
Sergey asked me to elaborate on my earlier candidate proposal. It is still
light on details and is meant only as food for thought, to be considered
later.
This proposal works only for append- and increment-type operations, since it is
operation-specific rather than a generic solution. It also assumes that
distributed counters are the main use case for the increment operation, and
that these counters are written often and read less frequently.
We would introduce two new KeyValue.Type values, Put_Inc and Put_App, and rely
on cell tags to carry the nonces. Both types sort before Puts. We can also make
the cell-tag nonce part of the sort order when it is set (otherwise we can
append the nonce to the row_key). With this, the write side needs no special
handling of nonces: writes with the same nonce eclipse each other because they
sort identically. We also do not have to keep anything in memory, and regions
can be moved freely between servers. Put_Inc and Put_App cells will not count
against the number of versions, so we keep them around until they expire.
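A minimal sketch of the sort-order idea, not HBase code: the Cell class, the
Type enum, and the comparator below are hypothetical stand-ins for the proposed
Put_Inc type and a tag-based nonce. It assumes a retried RPC carries the same
timestamp and nonce as the first attempt, so both attempts compare as equal
and only one is stored. A TreeSet keeps the first of two equal elements, while
the proposal has them eclipse each other; the count is the same either way.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class NonceSortSketch {
    // Hypothetical cell types. PUT_INC stands in for the proposed Put_Inc;
    // declaring it first makes it sort before PUT.
    public enum Type { PUT_INC, PUT }

    public static final class Cell {
        final Type type;
        final String row;
        final String column;
        final long ts;
        final long nonce;
        final long value;

        Cell(Type type, String row, String column, long ts, long nonce, long value) {
            this.type = type;
            this.row = row;
            this.column = column;
            this.ts = ts;
            this.nonce = nonce;
            this.value = value;
        }
    }

    // Row and column ascending, timestamp descending, then type, then nonce.
    // The value is deliberately not part of the key.
    public static final Comparator<Cell> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.column.compareTo(b.column);
        if (c == 0) c = Long.compare(b.ts, a.ts); // newest first
        if (c == 0) c = a.type.compareTo(b.type);
        if (c == 0) c = Long.compare(a.nonce, b.nonce);
        return c;
    };

    // Writes three increments, one of them a retry, and returns the stored count.
    public static int storedAfterRetry() {
        TreeSet<Cell> store = new TreeSet<>(ORDER);
        store.add(new Cell(Type.PUT_INC, "r1", "cf1:q1", 1L, 42L, 5L));
        store.add(new Cell(Type.PUT_INC, "r1", "cf1:q1", 1L, 42L, 5L)); // retry, same nonce
        store.add(new Cell(Type.PUT_INC, "r1", "cf1:q1", 1L, 43L, 5L)); // distinct increment
        return store.size();
    }

    public static void main(String[] args) {
        System.out.println(storedAfterRetry()); // prints 2
    }
}
```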
We can build a grouping KV scanner that collapses Put_Inc's with the
underlying Puts. Since every get is already a scan, when a client wants to read
the value back, it is computed on the fly (until we see a base Put, the
versions will not increase, so we keep scanning and buffering). During
compactions, we can use the same grouping to collapse nonces that have
expired.
The data might be sorted as:
Put,r1,cf1:q1,ts3,val4
Put_Inc,r1,cf1:q1,ts2,val3 (tag:nonce)
Put_Inc,r1,cf1:q1,ts1,val2 (tag:nonce)
Put_Inc,r1,cf1:q1,ts1,val2 (tag:nonce) => idempotent rpc, second try
Put,r1,cf1:q1,ts1,val1
Get -> will return val4.
Get (ts <= ts2) will return val3 + val2 + val1
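The two reads above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the proposed scanner: the Cell
class, the read method, and the sample values (val1=10, val2=2, val3=3,
val4=100, with ts1..ts3 as 1..3) are all hypothetical. It assumes long-valued
counters, cells of a single column arriving newest first, and a retried
increment showing up as a second cell with the same nonce.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GroupingReadSketch {
    public enum Type { PUT_INC, PUT }

    public static final class Cell {
        final Type type;
        final long ts;
        final long nonce;
        final long value;

        Cell(Type type, long ts, long nonce, long value) {
            this.type = type;
            this.ts = ts;
            this.nonce = nonce;
            this.value = value;
        }
    }

    // Sums increments until the first base Put at or below maxTs, then adds the Put's value.
    public static long read(List<Cell> sortedNewestFirst, long maxTs) {
        long sum = 0;
        Set<Long> seenNonces = new HashSet<>();
        for (Cell c : sortedNewestFirst) {
            if (c.ts > maxTs) continue;                   // outside the requested time range
            if (c.type == Type.PUT) return sum + c.value; // base Put ends the group
            if (seenNonces.add(c.nonce)) sum += c.value;  // a retried increment counts once
        }
        return sum; // no base Put found: treat the counter as starting from zero
    }

    // The listing above with made-up values; the nonce is unused for plain Puts.
    public static List<Cell> sample() {
        return Arrays.asList(
            new Cell(Type.PUT, 3L, 0L, 100L),   // Put, ts3, val4
            new Cell(Type.PUT_INC, 2L, 7L, 3L), // Put_Inc, ts2, val3
            new Cell(Type.PUT_INC, 1L, 5L, 2L), // Put_Inc, ts1, val2
            new Cell(Type.PUT_INC, 1L, 5L, 2L), // same nonce: second try of the rpc
            new Cell(Type.PUT, 1L, 0L, 10L));   // Put, ts1, val1
    }

    public static void main(String[] args) {
        System.out.println(read(sample(), Long.MAX_VALUE)); // prints 100 (val4)
        System.out.println(read(sample(), 2L));             // prints 15 (val3 + val2 + val1)
    }
}
```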
> Increment is non-idempotent but client retries RPC
> --------------------------------------------------
>
> Key: HBASE-3787
> URL: https://issues.apache.org/jira/browse/HBASE-3787
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 0.94.4, 0.95.2
> Reporter: dhruba borthakur
> Assignee: Sergey Shelukhin
> Priority: Critical
> Fix For: 0.95.2
>
> Attachments: HBASE-3787-partial.patch, HBASE-3787-v0.patch,
> HBASE-3787-v1.patch, HBASE-3787-v2.patch, HBASE-3787-v3.patch,
> HBASE-3787-v4.patch, HBASE-3787-v5.patch, HBASE-3787-v5.patch
>
>
> The HTable.increment() operation is non-idempotent. The client retries the
> increment RPC a few times (as specified by configuration) before throwing an
> error to the application. This makes it possible that the same increment call
> be applied twice at the server.
> For increment operations, is it better to use
> HConnectionManager.getRegionServerWithoutRetries()? Another option would be
> to enhance the IPC module to make the RPC server correctly identify if the
> RPC is a retry attempt and handle accordingly.