OK, I found that HTable#checkAndPut() works perfectly for me. Here is my final code (in Scala):
Thanks, Bruno, for writing the blog article on this topic. It was very informative.

Outerthought :: HBase row locks
http://outerthought.org/blog/blog/380-OTC.html

===============================================
lazy val UniqueIndexQualifier = "unq".getBytes
lazy val AbsenceMarker   = Array[Byte]()      // empty byte array
lazy val ExistenceMarker = Array[Byte](0x01)

def insert(table: HTable, put: Put): Unit = {
  put.add(Family, UniqueIndexQualifier, ExistenceMarker)
  val succeeded = table.checkAndPut(put.getRow, Family,
                                    UniqueIndexQualifier, AbsenceMarker, put)
  if (!succeeded) {
    throw new DuplicateRowException("Tried to insert a duplicate row: "
      + Bytes.toString(put.getRow))
  }
}

def update(table: HTable, put: Put): Unit = {
  val succeeded = table.checkAndPut(put.getRow, Family,
                                    UniqueIndexQualifier, ExistenceMarker, put)
  if (!succeeded) {
    throw new RowNotFoundException("Tried to update a non-existing row: "
      + Bytes.toString(put.getRow))
  }
}
===============================================

Thanks,

--
河野 達也
Tatsuya Kawano (Mr.)
Tokyo, Japan

twitter: http://twitter.com/tatsuya6502


2010/5/1 Tatsuya Kawano <tatsuy...@snowcocoa.info>:
> Thanks all for your responses; they are very helpful.
>
> 4/30/2010 Todd Lipcon <t...@cloudera.com>:
>> Note that your solution is not correct in the case of failure, since the
>> check and put are not atomic with each other.
>>
>> If your client or server fails between the ICV and the put, no other clients
>> will be able to put the row, but there will be no data.
>
> I agree; my solution is a bit fragile. If I stick with this plan, I
> could try to delete the counter after the put fails. However, even the
> delete might not work, because the put failure could be caused by a
> network disruption, a region server problem, etc. So, I'm going to have
> to leave some kind of failure log, so I can remove the reserved key
> later by hand.
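[Editor's note: the check-and-put pattern above can be demonstrated without a running HBase cluster. The sketch below simulates the semantics in plain Scala: a `checkAndPut` that atomically compares the unique-index cell against an expected value before writing, which is what makes the insert safe against concurrent duplicates. `InMemoryTable` and the `String`-keyed rows are hypothetical stand-ins, not part of the HBase API.]

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical in-memory stand-in for an HBase table, holding only the
// unique-index marker cell for each row.
class InMemoryTable {
  private val cells = new ConcurrentHashMap[String, Array[Byte]]()

  // Mimics HTable#checkAndPut: write `update` only if the current cell
  // value equals `expected`. An empty array stands for "cell absent",
  // matching the AbsenceMarker convention in the thread.
  def checkAndPut(row: String,
                  expected: Array[Byte],
                  update: Array[Byte]): Boolean =
    cells.synchronized {
      val current = Option(cells.get(row)).getOrElse(Array.empty[Byte])
      if (current.sameElements(expected)) { cells.put(row, update); true }
      else false
    }
}

val AbsenceMarker   = Array.empty[Byte]
val ExistenceMarker = Array[Byte](0x01)

// insert succeeds only if the marker cell was absent (row did not exist).
def insert(table: InMemoryTable, row: String): Boolean =
  table.checkAndPut(row, AbsenceMarker, ExistenceMarker)

// update succeeds only if the marker cell was present (row exists).
def update(table: InMemoryTable, row: String): Boolean =
  table.checkAndPut(row, ExistenceMarker, ExistenceMarker)
```

Because the compare and the write happen under one lock (one region-server operation, in real HBase), there is no window in which two clients can both see the row as absent and both insert it, which is exactly the gap in the earlier ICV-then-put version.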
>
>
> 4/30/2010 Guilherme Germoglio <germog...@gmail.com>:
>> Can the keys be randomly generated, or must they be incremental? Remember
>> that you can achieve higher throughput if they are randomly generated, since
>> the insertions will possibly load all machines more evenly.
>>
>> Using UUIDs may ensure key uniqueness (I don't expect a UUID clash soon :-)
>> and load balance over the cluster,
>
> 4/30/2010 Michael Segel <michael_se...@hotmail.com>:
>> UUIDs won't clash. Especially if you're using version 5, which is a truncated
>> SHA-1 hash of the UUID.
>
> Thanks for the info. Well, for my case, I'd like to use a combination
> of the business data as the row key, so I can scan them. But I'll
> keep the UUID option for other cases.
>
>
> 4/30/2010 Guilherme Germoglio <germog...@gmail.com>:
>> but if you are paranoid enough, you can
>> also check whether a row already exists by using
>> checkAndPut <http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/HTable.html#checkAndPut(byte[], byte[], byte[], byte[], org.apache.hadoop.hbase.client.Put)>
>> (just check for an empty byte array value in a column that you can ensure
>> always has some value).
>
> So, checkAndPut() seems ideal for my case. I didn't realize I could use
> it to check whether a row already exists. I'll give it a try!
>
>
> Thanks,
> Tatsuya
>
> --
> 河野 達也
> Tatsuya Kawano (Mr.)
> Tokyo, Japan
>
> twitter: http://twitter.com/tatsuya6502
>
>
> 2010/4/30 5:09 Michael Segel <michael_se...@hotmail.com>:
>>
>> UUIDs won't clash. Especially if you're using version 5, which is a truncated
>> SHA-1 hash of the UUID.
>>
>>
>>> From: germog...@gmail.com
>>> Date: Thu, 29 Apr 2010 13:58:42 -0300
>>> Subject: Re: Unique row ID constraint
>>> To: hbase-user@hadoop.apache.org
>>>
>>> Hello Tatsuya,
>>>
>>> Can the keys be randomly generated, or must they be incremental?
>>> Remember
>>> that you can achieve higher throughput if they are randomly generated, since
>>> the insertions will possibly load all machines more evenly.
>>>
>>> Using UUIDs may ensure key uniqueness (I don't expect a UUID clash soon :-)
>>> and load balance over the cluster, but if you are paranoid enough, you can
>>> also check whether a row already exists by using
>>> checkAndPut <http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/HTable.html#checkAndPut(byte[], byte[], byte[], byte[], org.apache.hadoop.hbase.client.Put)>
>>> (just check for an empty byte array value in a column that you can ensure
>>> always has some value).
>>>
>>> On Thu, Apr 29, 2010 at 1:36 PM, Todd Lipcon <t...@cloudera.com> wrote:
>>>
>>> > Hi Tatsuya,
>>> >
>>> > Note that your solution is not correct in the case of failure, since the
>>> > check and put are not atomic with each other.
>>> >
>>> > If your client or server fails between the ICV and the put, no other
>>> > clients will be able to put the row, but there will be no data.
>>> >
>>> > -Todd
>>> >
>>> >
>>> > On Thu, Apr 29, 2010 at 1:33 AM, Tatsuya Kawano <tatsuy...@snowcocoa.info>
>>> > wrote:
>>> >
>>> > > Hi Stack and Ryan,
>>> > >
>>> > > Thanks for your advice. I knew using a row lock wasn't ideal, but I
>>> > > couldn't find an appropriate atomic operation to do compare-and-swap.
>>> > >
>>> > > So, thanks Stack for helping me find it. I found that the
>>> > > incrementColumnValue() atomic operation works for me, since it
>>> > > automatically initializes the column value to 0 when the column
>>> > > doesn't exist. I can try to increment the column value by 1, and if it
>>> > > returns 1, I can be sure that I'm the first one who has created the
>>> > > column and row.
>>> > >
>>> > > So, my updated code is much simpler and now lock-free.
>>> > >
>>> > > ===============================================
>>> > > def insert(table: HTable, put: Put): Unit = {
>>> > >   val count = table.incrementColumnValue(put.getRow, family,
>>> > >                                          uniqueQual, 1)
>>> > >
>>> > >   if (count == 1) {
>>> > >     table.put(put)
>>> > >
>>> > >   } else {
>>> > >     throw new DuplicateRowException("Tried to insert a duplicate row: "
>>> > >       + Bytes.toString(put.getRow))
>>> > >   }
>>> > > }
>>> > > ===============================================
>>> > >
>>> > > Thanks,
>>> > > Tatsuya
>>> > >
>>> > >
>>> > > 2010/4/29 Ryan Rawson <ryano...@gmail.com>:
>>> > > > I would strongly discourage people from building on top of
>>> > > > lockRow/unlockRow. The problem is, if a row is not available, lockRow
>>> > > > will hold a responder thread, and you can end up with a deadlock
>>> > > > because the lock holder won't be able to unlock. Sure, the expiry
>>> > > > system kicks in, but 60 seconds is kind of infinity in database terms
>>> > > > :-)
>>> > > >
>>> > > > I would probably go with either ICV or CAS to build the tools you
>>> > > > want. With CAS you can accomplish a lot of the things locking
>>> > > > accomplishes, but more efficiently.
>>> > > >
>>> > > > On Wed, Apr 28, 2010 at 9:42 AM, Stack <st...@duboce.net> wrote:
>>> > > >> Would the incrementValue [1] work for this?
>>> > > >> St.Ack
>>> > > >>
>>> > > >> 1. http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue%28byte[],%20byte[],%20byte[],%20long%29
>>> > > >>
>>> > > >> On Wed, Apr 28, 2010 at 7:40 AM, Tatsuya Kawano
>>> > > >> <tatsuy...@snowcocoa.info> wrote:
>>> > > >>> Hi,
>>> > > >>>
>>> > > >>> I'd like to implement a unique row ID constraint (like the primary key
>>> > > >>> constraint in an RDBMS) in my application framework.
>>> > > >>>
>>> > > >>> Here is a code fragment from my current implementation (HBase
>>> > > >>> 0.20.4rc) written in Scala.
>>> > > >>> It works as expected, but is there any
>>> > > >>> better (shorter) way to do this, like checkAndPut()? I'd like to pass
>>> > > >>> a single Put object to my function (method) rather than passing rowId,
>>> > > >>> family, qualifier, and value separately. I can't do this now because I
>>> > > >>> have to give the rowLock object when I instantiate the Put.
>>> > > >>>
>>> > > >>> ===============================================
>>> > > >>> def insert(table: HTable, rowId: Array[Byte], family: Array[Byte],
>>> > > >>>            qualifier: Array[Byte], value: Array[Byte]): Unit = {
>>> > > >>>
>>> > > >>>   val get = new Get(rowId)
>>> > > >>>
>>> > > >>>   val lock = table.lockRow(rowId)  // will expire in one minute
>>> > > >>>   try {
>>> > > >>>     if (table.exists(get)) {
>>> > > >>>       throw new DuplicateRowException("Tried to insert a duplicate row: "
>>> > > >>>         + Bytes.toString(rowId))
>>> > > >>>
>>> > > >>>     } else {
>>> > > >>>       val put = new Put(rowId, lock)
>>> > > >>>       put.add(family, qualifier, value)
>>> > > >>>
>>> > > >>>       table.put(put)
>>> > > >>>     }
>>> > > >>>
>>> > > >>>   } finally {
>>> > > >>>     table.unlockRow(lock)
>>> > > >>>   }
>>> > > >>>
>>> > > >>> }
>>> > > >>> ===============================================
>>> > > >>>
>>> > > >>> Thanks,
>>> > > >>>
>>> > > >>> --
>>> > > >>> 河野 達也
>>> > > >>> Tatsuya Kawano (Mr.)
>>> > > >>> Tokyo, Japan
>>> > > >>>
>>> > > >>> twitter: http://twitter.com/tatsuya6502
>>> >
>>> >
>>> > --
>>> > Todd Lipcon
>>> > Software Engineer, Cloudera
>>>
>>>
>>> --
>>> Guilherme
>>>
>>> msn: guigermog...@hotmail.com
>>> homepage: http://sites.google.com/site/germoglio/