Another question: I assume this will not work out of the box with deletes?

Deletes always cover all key values in the past (from their timestamps on 
backwards), so once a delete marker is placed there is no way to get back any 
of the puts it affects.

HBase trunk has HBASE-4536 to allow time-range scans to work with deleted rows 
(but needs to be enabled for a column family - I still think it should be the 
default, but anyway).
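
To make the delete behaviour above concrete, here is a minimal toy model in 
plain Python. It is not HBase code: the function name, the `keep_deleted` flag 
(a stand-in for the per-column-family option mentioned above) and the 
half-open time range are assumptions made for illustration only.

```python
def visible_puts(puts, delete_markers, max_ts, keep_deleted=False):
    """Toy model of a time-range read over one column.

    puts           -- dict mapping put timestamp -> value
    delete_markers -- list of delete-marker timestamps
    max_ts         -- exclusive upper bound of the time range
    keep_deleted   -- hypothetical flag standing in for the column-family
                      option that lets time-range scans see deleted cells
    """
    # Only puts inside the requested time range are candidates.
    in_range = {ts: v for ts, v in puts.items() if ts < max_ts}

    if keep_deleted:
        # Markers newer than the time range are ignored, so older puts
        # become readable again for "as of" queries.
        markers = [d for d in delete_markers if d < max_ts]
    else:
        # Default behaviour described above: every marker applies,
        # regardless of the time range of the read.
        markers = list(delete_markers)

    # A marker at timestamp d masks every put with timestamp <= d.
    return {ts: v for ts, v in in_range.items()
            if not any(ts <= d for d in markers)}


puts = {10: "a", 20: "b"}
default_view = visible_puts(puts, [30], max_ts=25)
retained_view = visible_puts(puts, [30], max_ts=25, keep_deleted=True)
```

In this model `default_view` is empty even though the read ends before the 
delete marker, while `retained_view` still contains both puts.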

-- Lars

________________________________
From: Flavio Junqueira <[email protected]>
To: Daniel Gómez Ferro <[email protected]>
Cc: "[email protected]" <[email protected]>; lars hofhansl 
<[email protected]>; "[email protected]" <[email protected]>; Maysam 
Yabandeh <[email protected]>; Benjamin Reed <[email protected]>; Ivan 
Kelly <[email protected]>
Sent: Sunday, November 6, 2011 7:14 AM
Subject: Re: Omid: Transactional Support for HBase


A quick note on Omid for the ones following on github: the repository we will 
be working with is the fork under the Yahoo! account:


https://github.com/yahoo/omid/

-Flavio


On Nov 5, 2011, at 9:36 PM, Daniel Gómez Ferro wrote:


>
>On Nov 5, 2011, at 05:37 , lars hofhansl wrote:
>
>>Cool stuff Daniel,
>>
>
>Hi Lars,
>
>Thanks for the good points.
>
>
>
>>Was looking through the code a bit. Seems like you make a best effort to push 
>>as much of
>>the filtering of KVs of uncommitted transactions to HBase and then do some 
>>filtering on the client;
>>not a bad approach. (I hope I didn't misunderstand the approach, only looked 
>>through the code for
>>1/2 hour or so).
>>
>
>Putting it more accurately, the uncommitted KVs are stored in HBase, but it is 
>the client's job to filter them using the commit information that it has 
>received from the status oracle. According to the snapshot isolation guarantee, 
>all versions inserted with a timestamp larger than the 
>transaction's start timestamp must be ignored, which is done by setting the time 
>range on the client's get request sent to HBase. Since the uncommitted changes 
>of aborted transactions are eventually removed from HBase, the client 
>rarely needs to fetch more than one version to reach a KV that was committed 
>before the transaction started (the first property of snapshot isolation).
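
The read path described above can be sketched as a small in-memory model. 
This is not Omid's code: the function name, the `commit_table` dict (standing 
in for the commit information received from the status oracle) and the 
convention that a cell's timestamp is its writer's start timestamp are all 
assumptions made for illustration.

```python
def snapshot_read(versions, commit_table, start_ts):
    """Return (value, versions_fetched) for one cell under snapshot isolation.

    versions     -- dict mapping cell timestamp -> value; the timestamp is
                    assumed to be the writer's start timestamp
    commit_table -- dict mapping writer start timestamp -> commit timestamp;
                    writers absent from it are uncommitted or aborted
    start_ts     -- start timestamp of the reading transaction
    """
    # Time range on the get request: ignore anything written at or after
    # the reader's start timestamp.
    candidates = sorted((ts for ts in versions if ts < start_ts), reverse=True)

    # Client-side filtering: walk from newest to oldest until a version is
    # found whose writer committed before the reader started.
    for fetched, ts in enumerate(candidates, start=1):
        commit_ts = commit_table.get(ts)
        if commit_ts is not None and commit_ts < start_ts:
            return versions[ts], fetched

    return None, len(candidates)


cell = {5: "old", 8: "pending", 12: "future"}
result = snapshot_read(cell, commit_table={5: 6}, start_ts=10)
```

Here the version at timestamp 12 is cut off by the time range, the version at 
8 is skipped because its writer has no commit record, and the read falls back 
to the version at 5 after fetching two versions. Once aborted writes are 
cleaned up, the first candidate is usually the answer.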
>
>
>>
>>One thing I was wondering: Why bookkeeper? Why not store the WAL itself in 
>>HBase? That way
>>you might not even need a separate server.
>>
>>Did you see: HBaseSI (http://www.cs.uwaterloo.ca/~c15zhang/HBaseSI.pdf), they 
>>also do MVCC
>>on top of unaltered HBase/schema, although from reading that paper I get the 
>>impression that it
>>would not scale to scans touching many rows (which is where your client side 
>>filtering comes in).
>>
>
>
>Thanks for the link. We had seen the other paper by the same authors 
>(Grid2010), which shares the same bottlenecks as the recent work.
>As you correctly pointed out, the question is about performance. You can see 
>the scalability bottleneck of 400 TPS in the evaluation section of this paper. 
>Our approach, however, provides snapshot isolation with negligible overhead 
>on region servers, and could scale up to tens of thousands of write transactions 
>per second. If you are interested, a summary of the techniques that we used to 
>achieve this performance was published in the SOSP'11 poster section:
>http://sigops.org/sosp/sosp11/posters/summaries/sosp11-final12.pdf
>
>
>>-- Lars
>>
>>
>>----- Original Message -----
>>From: Daniel Gómez Ferro <[email protected]>
>>To: "[email protected]" <[email protected]>; "[email protected]" 
>><[email protected]>
>>Cc: Maysam Yabandeh <[email protected]>; Flavio Junqueira 
>><[email protected]>; Benjamin Reed <[email protected]>; Ivan Kelly 
>><[email protected]>
>>Sent: Friday, November 4, 2011 4:24 AM
>>Subject: Omid: Transactional Support for HBase
>>
>>(I apologize for resending but I forgot to add the user list.)
>>
>>Hi all,
>>
>>It is my pleasure to announce the open source release of Omid, a project 
>>whose goal is to add lock-free transactional support on top of HBase. The 
>>current release includes CrSO, a client-replicated status oracle that detects 
>>the write-write conflicts to provide Snapshot Isolation. CrSO has the 
>>following appealing properties:
>>
>>1) It does not need any modification to the HBase code or the table schema.
>>2) The overhead on HBase DataNodes is negligible (only after an abort).
>>3) It scales up to 50,000 write transactions per second (TPS) and a thousand 
>>client connections.
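
For readers unfamiliar with write-write conflict detection, the following is 
a generic first-committer-wins sketch of how a central oracle can provide 
snapshot isolation. It is not CrSO's implementation: the class, its methods 
and the single shared counter are hypothetical and chosen only to keep the 
example short.

```python
class ToyStatusOracle:
    """Generic first-committer-wins conflict check (illustrative only)."""

    def __init__(self):
        self._clock = 0          # single counter for start and commit timestamps
        self._last_commit = {}   # row key -> commit timestamp of the last writer

    def begin(self):
        """Hand out a start timestamp for a new transaction."""
        self._clock += 1
        return self._clock

    def try_commit(self, start_ts, write_set):
        """Return a commit timestamp, or None if the transaction must abort."""
        # Conflict: some row in the write set was committed by another
        # transaction after this one took its snapshot.
        if any(self._last_commit.get(row, 0) > start_ts for row in write_set):
            return None
        self._clock += 1
        commit_ts = self._clock
        for row in write_set:
            self._last_commit[row] = commit_ts
        return commit_ts


oracle = ToyStatusOracle()
t1 = oracle.begin()
t2 = oracle.begin()
first = oracle.try_commit(t1, {"row-a"})    # commits
second = oracle.try_commit(t2, {"row-a"})   # overlaps with t1 on row-a, aborts
```

No locks are taken at any point: the check happens once, at commit time, and 
the losing transaction is simply told to abort.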
>>
>>We have set up a GitHub project: https://github.com/dgomezferro/omid
>>
>>More information is available at the wiki: 
>>https://github.com/dgomezferro/omid/wiki
>>
>>If you are interested, installation and running instructions are available on 
>>the README: https://github.com/dgomezferro/omid/blob/master/README.md
>>
>>Please do not hesitate to contact us if you have any questions.
>>
>>Best Regards,
>>Daniel Gómez Ferro
>>
>>
>

flavio
junqueira

research scientist

[email protected]
direct +34 93-183-8828

avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301    
