On Tue, Jun 2, 2009 at 4:51 PM, Guilherme Germoglio <[email protected]> wrote:

> Hello!
>
> On Tue, Jun 2, 2009 at 3:58 PM, Erik Holstad <[email protected]> wrote:
>
> > Hi!
> >
> > On Tue, Jun 2, 2009 at 11:17 AM, Guilherme Germoglio
> > <[email protected]> wrote:
> >
> > > Hi Erik,
> > >
> > > For now, I'm using checkAndSave in order to make sure that a row is
> > > only created but not overwritten by multiple threads. So,
> > > checkAndSave is mostly invoked with a new structure created on the
> > > client. Actually, I'm checking if a specific "deleted" column is
> > > empty. If the "deleted" column is not empty, then the row creation
> > > cannot be performed. There are a few other tricky cases where I'm
> > > using it, but I'm sure that making that Result object more difficult
> > > to create than putting values on a map would be bad for me. :-)
> >
> > So you have a row with family and qualifier that you check to see if
> > it is empty, and if it is you insert a new row? So basically you use
> > it as an atomic rowExist checker? Are you usually batching these
> > checks, or would it be ok with something like:
> >
> > public boolean checkAndPut(byte[] row, byte[] family, byte[] qualifier,
> >     byte[] value, Put put){}
> > or
> > public boolean checkAndPut(KeyValue checkKv, Put put){}
> > for now?
> >
>
> Yes. It is ok for me to use the methods above for now.

Sweet, will make a version today, so you can test it out and maybe after
that we can work on it together to make things work for you.

>
> Just in case you are curious on how I'll be using them, there are two
> cases where I'm using checkAndSave:
>
> The first is like the atomic rowExist checker and it represents 90% of
> the use of checkAndSave. Exactly as you said, I've got a column
> attributes:deleted for every row. When creating a new row, the creation
> only happens if this column is empty. When the row creation happens,
> this column is assigned a 'false' value.
> When this column receives a 'true' value, that is, the row is to be
> deleted, the 'hard' removal (a HTable's Delete) of the row will be
> performed asynchronously. Until the 'hard' removal happens, a software
> layer that uses HTable will prevent the use of any 'soft' deleted row
> by checking the attributes:deleted column.
>
> The second case of using checkAndSave is to trigger some actions when a
> specific column is updated. So, I don't check for emptiness, but
> whether a previous value is still the same when I'm updating the row.
> For example, let's say I have a users table where I will serialize a
> User object and put it into a row. Among other things, the User object
> contains an e-mail attribute and its change must trigger verification
> actions, changes on other tables, whatever. I realized that performing
> a get for every User update just to check whether their e-mail changed
> or not might not be the best approach, since changing e-mail is not a
> very common operation. So, I thought it is better to checkAndSave a
> user expecting their current e-mail value to be the same as the one
> already in the table, since this will occur many many times more than
> the opposite. However, if it is the case that the current e-mail value
> is different from the one in the table, triggers are fired and then a
> new update is performed.
>
> >
> > >
> > > However, here's an idea. What if Put and Delete objects have a field
> > > "condition" (maybe, "onlyIf" would be a better name) which is
> > > exactly the map with columns and expected values. So, a given Put or
> > > Delete of an updates list will only happen if those expected values
> > > match.
> > >
> >
> > Puts and deletes are pretty much just List<KeyValue> which is
> > basically a List<byte[]>.
> > I don't think that we want to add complexity for puts and deletes now
> > that we have worked so hard to make it faster and more bare bone.
> >
>
> no problem. (sorry!)
>

You don't have to be sorry, just happy that we are going to have a faster
HBase soon :)

>
> >
> > > Also, maybe it should be possible to indicate common expected values
> > > for all updates of a list too, so a client won't have to put in all
> > > updates the same values if needed. But we must remember to solve the
> > > conflicts of expected values.
> > >
> > Not really sure if you mean that we would check the value of a key
> > before inserting the new value? That would mean that you would have to
> > do a get for every put/delete, which is not something we want in the
> > general case.
> >
> > > (By the way, I haven't seen the guts of new Puts and Deletes, so I
> > > don't know how difficult would it be to implement it -- but I can
> > > help, if necessary)
> > >
> > > Thanks,
> > >
> > > On Tue, Jun 2, 2009 at 2:34 PM, Erik Holstad <[email protected]>
> > > wrote:
> > >
> > > > Hi!
> > > > I'm working on putting checkAndSave back into 0.20 and just want to
> > > > check with the people that are using it how they are using it
> > > > so that I can make it as good as possible for these users.
> > > >
> > > > Since the API has changed from earlier versions there are some
> > > > things that one needs to think about.
> > > > For now in the new API there are no Updates, just Put and Delete,
> > > > so for now I need to know if users used to delete in the old
> > > > batchUpdate or just put?
> > > >
> > > > The new return format Result might seem like a good way to send in
> > > > the data to be used as "actual", but there is no super easy way to
> > > > build that on the client side for now, so it would be good to know
> > > > how you are doing this.
> > > > If you do a get, save the result and then use it for the check or
> > > > if you just create new structures on the client?
> > > >
> > > > Regards Erik
> > > >
> > >
> > > --
> > > Guilherme
> > >
> > > msn: [email protected]
> > > homepage: http://germoglio.googlepages.com
> > >
> >
> > Regards Erik
> >
>
> --
> Guilherme
>
> msn: [email protected]
> homepage: http://germoglio.googlepages.com
>
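
For anyone following the thread, the two usage patterns described above can be sketched against a small in-memory stand-in. This is not HBase code: the class `CheckAndPutSketch`, its methods, the `user:data` / `user:email` column names, and the trigger counter are all hypothetical, and the sketch assumes that a null expected value means "the column is empty". Only the shape of checkAndPut (check one column, then apply the put atomically) follows the signatures proposed in the thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a table; NOT the HBase client API.
class CheckAndPutSketch {
    // row key -> ("family:qualifier" -> value)
    private final Map<String, Map<String, byte[]>> rows = new HashMap<>();
    // Counts how often the (hypothetical) e-mail-change trigger fired.
    int emailTriggersFired = 0;

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Applies `update` only if `column` currently holds `expected`.
    // Assumption: expected == null means "the column must be empty".
    // synchronized makes the check and the write one atomic step here.
    synchronized boolean checkAndPut(String row, String column,
                                     byte[] expected, Map<String, byte[]> update) {
        Map<String, byte[]> cells = rows.get(row);
        byte[] actual = (cells == null) ? null : cells.get(column);
        if (!Arrays.equals(actual, expected)) {
            return false;
        }
        rows.computeIfAbsent(row, k -> new HashMap<>()).putAll(update);
        return true;
    }

    // Unconditional write, used once the triggers have run.
    synchronized void put(String row, Map<String, byte[]> update) {
        rows.computeIfAbsent(row, k -> new HashMap<>()).putAll(update);
    }

    // Case 1: create the row only if attributes:deleted is still empty,
    // and mark it 'false' (not deleted) as part of the same write.
    boolean createRow(String row, Map<String, byte[]> data) {
        Map<String, byte[]> update = new HashMap<>(data);
        update.put("attributes:deleted", bytes("false"));
        return checkAndPut(row, "attributes:deleted", null, update);
    }

    // Case 2: write the user expecting the stored e-mail to be unchanged.
    // Returns true if the e-mail differed and the trigger had to run.
    boolean updateUser(String row, byte[] serializedUser, String email) {
        Map<String, byte[]> update = new HashMap<>();
        update.put("user:data", serializedUser);
        update.put("user:email", bytes(email));
        if (checkAndPut(row, "user:email", bytes(email), update)) {
            return false; // common case: e-mail unchanged, no extra get
        }
        emailTriggersFired++; // placeholder for verification actions etc.
        put(row, update);     // then perform the update again
        return true;
    }

    public static void main(String[] args) {
        CheckAndPutSketch table = new CheckAndPutSketch();
        Map<String, byte[]> data = new HashMap<>();
        data.put("data:name", bytes("example"));
        System.out.println(table.createRow("row1", data)); // true
        System.out.println(table.createRow("row1", data)); // false
    }
}
```

The sketch ignores what happens if another writer changes the row between the failed check and the second write in case 2; a real client would have to decide how to handle that.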
