[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651478#comment-13651478 ]

Sergey Shelukhin commented on HBASE-3787:
-----------------------------------------

There are some design questions on the review (r). Perhaps we should flesh out 
the design before I make any major changes.

1) Should we add actual usage of nonceGroup/client ID?
We can do that; it also depends on (2). I will probably change the server 
manager to lump the nonce group and nonce into an array wrapper and store those 
in the map, instead of using a Pair. A Pair is simpler but worse; right now I 
only added it for forward compatibility.
A map of maps is a pain to clean up without tricks or an epic lock; I have added 
that for sequential nonces, but I wonder whether it's worth it for simple nonces.
The client ID, for now, will be produced from the IP, process id, and thread id, 
hashed to 8 bytes and written into nonceGroup.
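
For illustration, a minimal sketch of deriving such an 8-byte client ID; the 
class and method names are mine, and folding an MD5 digest is just one possible 
choice, not necessarily what the patch will do:
{code}
import java.lang.management.ManagementFactory;
import java.net.InetAddress;
import java.nio.ByteBuffer;
import java.security.MessageDigest;

// Hypothetical sketch: derive an 8-byte client ID (nonceGroup) from the
// IP address, process id and thread id by hashing them down to one long.
public final class ClientNonceGroup {
  public static long clientId() throws Exception {
    byte[] ip = InetAddress.getLocalHost().getAddress();
    // On HotSpot JVMs the RuntimeMXBean name is typically "pid@hostname".
    String jvmName = ManagementFactory.getRuntimeMXBean().getName();
    long pid = Long.parseLong(jvmName.split("@")[0]);
    long tid = Thread.currentThread().getId();

    MessageDigest md5 = MessageDigest.getInstance("MD5");
    md5.update(ip);
    md5.update(ByteBuffer.allocate(16).putLong(pid).putLong(tid).array());
    // Fold the 16-byte digest down to its first 8 bytes for the nonceGroup field.
    return ByteBuffer.wrap(md5.digest()).getLong();
  }
}
{code}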

2) Is 8 bytes enough to avoid collisions?
The answer is "maybe". It depends on the overall number of requests in the 
cluster and on how long we store nonces.
We could alleviate this by adding the client ID, I guess, which would give us 
16 bytes: 8 unique per client and 8 random.
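
As a back-of-envelope check (using the per-server numbers from (4) below; this 
is just arithmetic, not part of the patch), the birthday bound for random 
8-byte nonces looks like this:
{code}
// Probability of at least one collision among n random 64-bit nonces is
// roughly n^2 / 2^65 (birthday approximation).
public class NonceCollisionEstimate {
  public static void main(String[] args) {
    double n = 36_000_000d;          // nonces kept per server: one hour at 10k/s
    double space = Math.pow(2, 64);  // size of the 8-byte nonce space
    double pCollision = (n * n) / (2 * space);
    System.out.printf("~%.2e chance of any collision among %.0f nonces%n",
        pCollision, n);              // prints roughly 3.51e-05
  }
}
{code}
So per server and per retention window the odds are small, but not negligible 
across a large cluster over a long time, hence "maybe".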

3) What random should we use?
Java uses SecureRandom to generate UUIDs. We could use some other Random; they 
claim to produce uniformly distributed numbers.
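
To make the trade-off concrete, an illustrative sketch of the two obvious 
options (class and method names are mine):
{code}
import java.security.SecureRandom;
import java.util.concurrent.ThreadLocalRandom;

// Illustration only: two ways a client could draw an 8-byte nonce. Which one
// to use is exactly the open question above.
public class NonceSource {
  private static final SecureRandom SECURE = new SecureRandom();

  // SecureRandom is what java.util.UUID.randomUUID() uses; slower, but harder
  // to collide by accident if some clients end up badly seeded.
  static long secureNonce() {
    return SECURE.nextLong();
  }

  // ThreadLocalRandom is cheap, uniformly distributed and contention-free
  // between client threads; fine if we only need statistical uniqueness.
  static long fastNonce() {
    return ThreadLocalRandom.current().nextLong();
  }
}
{code}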

4) Will too many nonces be stored?
If we keep nonces for an hour and do 10k increments per second per server, we 
will have stored 36,000,000 nonces on a server.
With map overhead, two object overheads, two primitive longs and an enum value, 
that's probably in excess of 120 bytes per entry (without clientId), so it is a 
lot of memory.
The time to store nonces is configurable, though, and with the default retry 
settings as little as 5 minutes could provide sufficient safety.
With 5 minutes we'd need something like ~400MB of RAM for the hash table, which 
is not totally horrible (especially at 10k QPS :)).
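
For reference, a heavily simplified sketch of the kind of time-bounded nonce 
table this estimate assumes (my simplification, not the patch; the real entry 
would also carry the operation-state enum mentioned above):
{code}
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch: a server-side nonce table with time-bounded retention.
// Each entry is a key object holding two primitive longs plus a boxed Long
// timestamp and the map entry itself, which is roughly where the "in excess
// of 120 bytes per entry" guess comes from.
public class SimpleNonceTable {
  // Wraps nonceGroup + nonce together instead of using a Pair, as in (1).
  private static final class NonceKey {
    final long group, nonce;
    NonceKey(long group, long nonce) { this.group = group; this.nonce = nonce; }
    @Override public int hashCode() {
      return 31 * (int) (group ^ (group >>> 32)) + (int) (nonce ^ (nonce >>> 32));
    }
    @Override public boolean equals(Object o) {
      return o instanceof NonceKey
          && ((NonceKey) o).group == group && ((NonceKey) o).nonce == nonce;
    }
  }

  private final Map<NonceKey, Long> seen = new ConcurrentHashMap<NonceKey, Long>();
  private final long retentionMs;

  public SimpleNonceTable(long retentionMs) { this.retentionMs = retentionMs; }

  /** Returns true if the nonce is new, false if this is a retry we already saw. */
  public boolean startOperation(long group, long nonce) {
    return seen.putIfAbsent(new NonceKey(group, nonce),
        System.currentTimeMillis()) == null;
  }

  /** Run periodically by a cleanup chore to drop nonces older than the window. */
  public void cleanup() {
    long cutoff = System.currentTimeMillis() - retentionMs;
    for (Iterator<Long> it = seen.values().iterator(); it.hasNext(); ) {
      if (it.next() < cutoff) it.remove();
    }
  }
}
{code}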
Some solutions were proposed on the review (r), such as storing the mutation 
creation time and rejecting the operation after a certain time.
However, that relies on synchronized clocks, and it also doesn't solve the 
problem in the sense that the client still has no idea what happened to the 
original operation - should it retry?
What do you think?
If you think this is a realistic workload, I can rework the sequential nonce 
patch instead; there, nonces would be collapsed (see the rough sketch at the 
end of this comment). If the clientId is used and incorporates the region, 
requests arriving for the same region will generally go to the same server for 
some time, and in sequential order, so a lot of them can be collapsed.
However, it will add complexity.
What do you think?
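
For what I mean by collapsing, a very rough sketch (assumptions mine: per-client 
nonces start at 1 and mostly arrive in order; this is not the sequential nonce 
patch itself):
{code}
import java.util.TreeSet;

// Why sequential nonces collapse: per client we keep a single watermark plus
// only the out-of-order stragglers, instead of one entry per operation.
public class SequentialNonceTracker {
  private long watermark = 0;                  // every nonce <= watermark was seen
  private final TreeSet<Long> ahead = new TreeSet<Long>();  // seen nonces above the watermark

  /** Returns true if the nonce is new, false if it is a duplicate/retry. */
  public synchronized boolean report(long nonce) {
    if (nonce <= watermark || ahead.contains(nonce)) {
      return false;                            // duplicate: reject the retry
    }
    ahead.add(nonce);
    // Collapse any now-contiguous run into the watermark.
    while (!ahead.isEmpty() && ahead.first() == watermark + 1) {
      watermark = ahead.pollFirst();
    }
    return true;
  }
}
{code}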

                
> Increment is non-idempotent but client retries RPC
> --------------------------------------------------
>
>                 Key: HBASE-3787
>                 URL: https://issues.apache.org/jira/browse/HBASE-3787
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 0.94.4, 0.95.2
>            Reporter: dhruba borthakur
>            Assignee: Sergey Shelukhin
>            Priority: Critical
>             Fix For: 0.95.1
>
>         Attachments: HBASE-3787-partial.patch, HBASE-3787-v0.patch, 
> HBASE-3787-v1.patch, HBASE-3787-v2.patch
>
>
> The HTable.increment() operation is non-idempotent. The client retries the 
> increment RPC a few times (as specified by configuration) before throwing an 
> error to the application. This makes it possible for the same increment call 
> to be applied twice at the server.
> For increment operations, is it better to use 
> HConnectionManager.getRegionServerWithoutRetries()? Another option would be 
> to enhance the IPC module so that the RPC server correctly identifies whether 
> the RPC is a retry attempt and handles it accordingly.

