Jim Kellerman <[EMAIL PROTECTED]> writes:

> I'm not sure what you mean by server,
> but any particular row is only served
> by one HBase server. Multiple clients can
> submit batch updates for the
> same row and they will all be handled
> by a single HBase server.

When I say server, I actually mean machine. There can be multiple
clients running on different machines, so two clients may submit batch
updates for the same row to the same HBase server at the same time. In
that case, the batch updates from one client execute first on the HBase
server, and the other client's updates wait for all of them to finish
rather than getting an exception back. Is that right?

> > Each of them has one BatchUpdate class of their own. I doubt
> > it would still cause the "update in progress" exception.
>
> In 0.16 (and also in the hbase-0.1.x releases) the client API
> supports only one batch update operation at a time. So if a single
> thread did two startUpdate calls or if multiple threads did a
> single startUpdate call, you will get the "update in progress"
> exception.
>
> This has changed in HBase trunk. A single thread or multiple
> threads can create a separate BatchUpdate object for each row
> they want to update. When all the changes have been added to
> the BatchUpdate, it is sent to the server by calling
> HTable.commit(BatchUpdate)

I misunderstood the reason for the "update in progress" exception
before. I thought it did not allow two simultaneous startUpdate calls
on the same row. In fact, as you have explained, it does not allow two
simultaneous startUpdate calls on any rows.
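To make sure I have the old behavior straight, this is how I picture
the hbase-0.1.x client. The startUpdate/put/commit names are from your
mail; the packages, exact signatures, and the table and column names
are just my guesses, so please correct me if they are off:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTable;
    import org.apache.hadoop.io.Text;

    public class OldApiSketch {
        public static void main(String[] args) throws Exception {
            // "webapp_data" and "info:status" are hypothetical names.
            HTable table =
                new HTable(new HBaseConfiguration(), new Text("webapp_data"));

            // The client tracks a single open batch, so calling
            // startUpdate again on this HTable -- from this thread or
            // any other -- before commit would raise the
            // "update in progress" exception, whatever row it names.
            long lockid = table.startUpdate(new Text("row1"));
            table.put(lockid, new Text("info:status"), "active".getBytes());
            table.commit(lockid);
        }
    }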
> Not sure I understand the problem. The updates collected in
> a BatchUpdate are sent via a single RPC call. The row gets
> locked on the server and each update is written to the redo
> log before it is cached. When the cache fills it is flushed
> to disk. If the server crashes before the cache is flushed,
> the data can be recovered from the redo log.

So on the client side, the commit operation returns after the RPC call
to the server has returned, and by the time commit returns the redo log
has already been written to disk. Am I right? If that is true, there is
no durability problem any more.

> > BatchUpdate would not work at least for massive sizes of data
> > or high load.
>
> Actually it works pretty well. We have several applications that
> have tens of millions of rows on 10 to 20 servers that are storing
> tens of gigabytes of data currently.
>
> One user loaded 1.3 billion rows into HBase as a test.

My misunderstanding of how the BatchUpdate class works led me to that
argument. Glad that I'm wrong.

> > I hope HBase could fix the problem in the near future.
>
> It is fixed in hbase trunk which has not yet been released.
>
> > Is there any version of HBase that allows concurrent updates, where
> > all we need to do is call table.commit(id)?
>
> There is no released version that supports this. It is only
> in hbase trunk which will be released as hbase-0.2.0 in a
> few weeks.
>
> By the way, you know that HBase is now a subproject of
> Hadoop and now has a separate svn repository? All development
> of hbase-0.1.x and hbase-trunk happens there and not in
> the hadoop svn. You can find the hbase source at:
>
> http://svn.apache.org/repos/asf/hadoop/hbase

I am currently doing research on hosting web application data in
non-relational DBMSs. For web applications, concurrent access to data
happens a lot! I really need the concurrent update feature to host a
scalable web application on HBase, so I am looking forward to the
hbase-0.2.0 release. For now, I will try the current trunk version
first. Thanks for the explanation. It helps me a lot!
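P.S. Here is the minimal sketch I plan to try against trunk first. Only
HTable.commit(BatchUpdate) is confirmed by your mail; the package names,
constructors, and the put() signature are my assumptions from skimming
the trunk source, and "webapp_data"/"info:*" are made-up names:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.BatchUpdate;

    public class TrunkApiSketch {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(new HBaseConfiguration(), "webapp_data");

            // One self-contained BatchUpdate per row; several threads
            // can each build their own instance concurrently.
            BatchUpdate update = new BatchUpdate("row1");
            update.put("info:status", "active".getBytes());
            update.put("info:visits", "42".getBytes());

            // All edits for the row travel in a single RPC; per your
            // explanation, the row is locked on the server and the edits
            // reach the redo log before commit() returns.
            table.commit(update);
        }
    }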
