I looked at the implementation of Increment and CheckAndPut, and there could be
a consistency issue. Maybe that is by design, since it is good enough for HBase
application scenarios. I just want to confirm with folks whether that is the
intention.

1.      Increment:
a.      Scenario: an Increment call in the client application triggers an RPC
to the region server carrying the increment value along with the cell
information. After the region server increments the value successfully, it
tries to return the new value to the client application, and at this point the
RPC reply fails to reach the client due to a network issue. The client thinks
the operation failed, but the server actually incremented the value. The client
then retries, which causes the value to be incremented a second time on the
server side.
b.      For certain application scenarios this isn't much of an issue, for
example a) getting a unique ID, where a double increment only skips a number and
the IDs stay unique, or b) large volumes of analytics data like "query hit
count", which can tolerate some inconsistency. In any case, the chance of this
scenario happening is quite low.

2.      The same applies to CheckAndPut.
a.      Scenario: as above, a network failure can leave the client and the
server with different views of whether the operation succeeded.
b.      The special case of "create a new row when it doesn't exist" still works
fine: if the CheckAndPut fails, the client always goes back and Gets the
value.
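To make the Increment scenario in item 1 concrete, here is a minimal in-memory sketch in Java. FakeServer, sendWithLostFirstReply and the request-ID overload are hypothetical names for illustration, not HBase classes or APIs. The naive client retries after a lost reply and the counter ends up at 2 instead of 1. The second variant assumes a technique that the code I looked at does not use: the client attaches a request ID, reuses it on retry, and the server remembers which IDs it has already applied, so the retry returns the cached result instead of incrementing again.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the lost-reply retry problem; this is not HBase code.
final class IncrementRetryDemo {

    /** Stand-in for a region server that keeps counters keyed by cell name. */
    static final class FakeServer {
        private final Map<String, Long> counters = new HashMap<>();
        // Dedup table (an assumed technique): request ID -> result already returned.
        private final Map<String, Long> appliedRequests = new HashMap<>();

        /** Non-idempotent: every call adds delta, retries included. */
        long increment(String cell, long delta) {
            long updated = counters.getOrDefault(cell, 0L) + delta;
            counters.put(cell, updated);
            return updated;
        }

        /** Idempotent variant: a repeated request ID returns the cached result. */
        long increment(String cell, long delta, String requestId) {
            Long previous = appliedRequests.get(requestId);
            if (previous != null) {
                return previous; // already applied once; do not add delta again
            }
            long updated = increment(cell, delta);
            appliedRequests.put(requestId, updated);
            return updated;
        }

        long get(String cell) {
            return counters.getOrDefault(cell, 0L);
        }
    }

    /** The server applies the first call, its reply is lost, and the client retries. */
    private static void sendWithLostFirstReply(Runnable call) {
        call.run(); // attempt 1: applied on the server, but the reply never arrives
        call.run(); // attempt 2: the client's retry after seeing a "failure"
    }

    /** The client intends a single increment of 1 and sends no request ID. */
    static long runNaive() {
        FakeServer server = new FakeServer();
        sendWithLostFirstReply(() -> server.increment("query_hits", 1));
        return server.get("query_hits");
    }

    /** Same intent, but the client reuses one (hypothetical) request ID on retry. */
    static long runIdempotent() {
        FakeServer server = new FakeServer();
        String requestId = "client-7/op-1";
        sendWithLostFirstReply(() -> server.increment("query_hits", 1, requestId));
        return server.get("query_hits");
    }

    public static void main(String[] args) {
        System.out.println("without request ID: " + runNaive());      // prints 2
        System.out.println("with request ID:    " + runIdempotent()); // prints 1
    }
}
```

The dedup table is what makes the retry safe here; a real server would presumably also need to expire old IDs, which this sketch ignores.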

_____________________________________________
From: Ma, Ming
Sent: Thursday, June 09, 2011 12:22 AM
To: '[email protected]'
Subject: RE: Does Put support "don't put if row exists"?


It looks like there is an HBase API called checkAndPut. By setting the expected
value to null, you can achieve "put only when the row + column family + column
qualifier doesn't exist". Nice feature.
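Here is a minimal in-memory sketch of the "create a new row when it doesn't exist" case, using the null-expected-value semantics described above. FakeTable and createOrFetch are hypothetical stand-ins, not the HBase client API. When the first reply is lost, the retried conditional put fails because the row now exists, and the client falls back to reading the row. That is the same thing it does when another instance created the row first, so either way the client ends up with the value that is actually stored.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory model; FakeTable.checkAndPut only mimics the semantics
// discussed in this thread and is not the HBase client API.
final class CreateIfAbsentDemo {

    /** Stand-in for one column of a table: row key -> cell value. */
    static final class FakeTable {
        private final Map<String, String> cells = new HashMap<>();

        /** Atomically writes newValue if the current value equals expected (null = absent). */
        synchronized boolean checkAndPut(String row, String expected, String newValue) {
            if (!Objects.equals(cells.get(row), expected)) {
                return false;
            }
            cells.put(row, newValue);
            return true;
        }

        synchronized String get(String row) {
            return cells.get(row);
        }
    }

    /**
     * "Create the row if it doesn't exist, otherwise read it." When
     * loseFirstReply is true, the first put is applied but its reply is lost,
     * so the client retries the same conditional put.
     */
    static String createOrFetch(FakeTable table, String row, String value, boolean loseFirstReply) {
        boolean created = table.checkAndPut(row, null, value);
        if (loseFirstReply) {
            created = table.checkAndPut(row, null, value); // retry sees the row and fails
        }
        // false means "someone created the row", possibly this client's own lost
        // attempt; either way the row now exists, so read back what is stored.
        return created ? value : table.get(row);
    }

    public static void main(String[] args) {
        FakeTable fresh = new FakeTable();
        System.out.println(createOrFetch(fresh, "row1", "mine", true)); // prints mine

        FakeTable taken = new FakeTable();
        taken.checkAndPut("row1", null, "theirs"); // another instance got there first
        System.out.println(createOrFetch(taken, "row1", "mine", false)); // prints theirs
    }
}
```

This only works because "create if absent, else read" has a single acceptable end state. For a general check-and-put from one value to another, a failed retry cannot tell the client whether its own lost attempt or some other writer changed the cell.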

_____________________________________________
From: Ma, Ming
Sent: Wednesday, June 08, 2011 9:54 PM
To: [email protected]
Subject: Does Put support "don't put if row exists"?


Hi,

Maybe this has been asked before. I couldn't find much information on this.

We have an application where multiple instances across different machines could
try to insert a new row with the same row key into a global HBase table at the
same time. If the row has already been inserted by one instance, we don't want
other instances to insert it again; instead, the other instances should Get the
row after their Put fails with an "already exists" error.

It is somewhat similar to https://issues.apache.org/jira/browse/HBASE-493, but
here we need HBase to check for row existence rather than for a
version/timestamp.
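The requirement boils down to an atomic "insert if absent, otherwise read" on one row key. Here is a minimal single-JVM sketch of those semantics with several instances racing. ConcurrentMap.putIfAbsent is an assumption standing in for an atomic server-side existence check; in the HBase client API of this era, HTable.checkAndPut(row, family, qualifier, value, put) with a null value plays that role. The cell key and instance names are hypothetical. Exactly one instance writes, and every instance ends up using the same stored value.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical single-JVM model: ConcurrentMap.putIfAbsent stands in for an atomic
// "put only if the cell does not exist"; the key encodes row/family/qualifier.
final class CheckAndPutRaceDemo {

    /** Result of one race: how many instances actually wrote, and what each one ended up using. */
    static final class Outcome {
        final int writers;
        final List<String> observed;

        Outcome(int writers, List<String> observed) {
            this.writers = writers;
            this.observed = observed;
        }
    }

    static Outcome race(int instances) {
        ConcurrentMap<String, String> table = new ConcurrentHashMap<>();
        String cell = "row-1/cf/q"; // hypothetical row key + family + qualifier
        AtomicInteger writers = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(instances);
        try {
            List<CompletableFuture<String>> attempts = new ArrayList<>();
            for (int i = 0; i < instances; i++) {
                String myValue = "instance-" + i;
                attempts.add(CompletableFuture.supplyAsync(() -> {
                    // Succeeds only if the cell is absent, like an expected value of null.
                    String existing = table.putIfAbsent(cell, myValue);
                    if (existing == null) {
                        writers.incrementAndGet();
                        return myValue;
                    }
                    return existing; // lost the race: use the stored value, as a Get would
                }, pool));
            }
            List<String> observed = new ArrayList<>();
            for (CompletableFuture<String> attempt : attempts) {
                observed.add(attempt.join());
            }
            return new Outcome(writers.get(), observed);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Outcome outcome = race(8);
        System.out.println("instances that wrote: " + outcome.writers); // prints 1
        System.out.println("distinct values seen: " + new HashSet<>(outcome.observed).size()); // prints 1
    }
}
```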

The insertion rate is low, say 100 requests/sec. One way to implement this is
outside HBase: the client application can use ZooKeeper to create a lock named
after the row key. The program would look like this:

if (!Row.Get())
{
    Zookeeper.lock();
    try
    {
        // check again in case another instance has just inserted the same row
        if (!Row.Get())
        {
            // the row doesn't exist
            Row.Put();
        }
    }
    finally
    {
        // release the lock even if Get or Put throws
        Zookeeper.unlock();
    }
}
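The ZooKeeper pseudocode above is a double-checked lock: check without the lock, take a per-row lock, check again, and only then put. Below is a minimal single-JVM sketch of that pattern, assuming a ReentrantLock per row key as a stand-in for the ZooKeeper lock and a ConcurrentHashMap as a stand-in for the table; LockedInsertDemo and its methods are hypothetical. With eight concurrent instances, only one put happens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical single-JVM stand-in for the pseudocode: a per-row-key
// ReentrantLock plays the ZooKeeper lock, and a map plays the HBase table.
final class LockedInsertDemo {
    private final Map<String, String> table = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final AtomicInteger puts = new AtomicInteger(); // counts real writes

    /** Inserts value under rowKey unless the row exists; returns the stored value. */
    String insertOnce(String rowKey, String value) {
        String existing = table.get(rowKey); // cheap first check, no lock taken
        if (existing != null) {
            return existing;
        }
        ReentrantLock lock = locks.computeIfAbsent(rowKey, k -> new ReentrantLock());
        lock.lock();
        try {
            existing = table.get(rowKey); // re-check: another instance may have just inserted
            if (existing != null) {
                return existing;
            }
            table.put(rowKey, value);
            puts.incrementAndGet();
            return value;
        } finally {
            lock.unlock(); // release even if the put throws
        }
    }

    int putCount() {
        return puts.get();
    }

    /** Runs several "instances" concurrently against one row key; returns the number of writes. */
    static int concurrentInserts(int instances) {
        LockedInsertDemo demo = new LockedInsertDemo();
        ExecutorService pool = Executors.newFixedThreadPool(instances);
        try {
            List<CompletableFuture<String>> calls = new ArrayList<>();
            for (int i = 0; i < instances; i++) {
                String value = "instance-" + i;
                calls.add(CompletableFuture.supplyAsync(() -> demo.insertOnce("row-1", value), pool));
            }
            calls.forEach(CompletableFuture::join);
            return demo.putCount();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("writes: " + concurrentInserts(8)); // prints writes: 1
    }
}
```

Releasing the lock in a finally block matters in the distributed version too: if an instance fails between lock and unlock, the other instances must not wait forever. Compared with checkAndPut and a null expected value, this approach adds lock round trips on every insert, which is probably acceptable at around 100 requests/sec.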

Any suggestions?

Thanks.

Ming
