I find all of these ideas interesting but a little bit scope-creepy. It used to be that regions were an implementation detail, but with these new APIs they'd be very much an application-level construct. We should think carefully before adding new APIs to do this - perhaps we can start playing with the idea on a branch and see if there are some really compelling applications?
-Todd

On Wed, Jan 18, 2012 at 7:03 PM, lars hofhansl <[email protected]> wrote:

> Was thinking about that as well. That would be doable.
>
> Would still need to be some sort of distributed transaction (in the sense there would be a prepare/vote and commit phase between the participating regions), but it would all be local to a single server.
>
> ________________________________
> From: Ted Yu <[email protected]>
> To: [email protected]; lars hofhansl <[email protected]>
> Sent: Wednesday, January 18, 2012 6:51 PM
> Subject: Re: Limited cross row transactions
>
> Still need to go over the patch, Lars.
>
> I wonder how difficult supporting cross-region transactions in the same region server would be.
>
> Cheers
>
> On Wed, Jan 18, 2012 at 5:02 PM, lars hofhansl <[email protected]> wrote:
>
>> Filed https://issues.apache.org/jira/browse/HBASE-5229 for further discussion, attached a patch that does this.
>>
>> As for your point...
>> The below is one way to define limited groups of rows that can participate in transactions (I should not have named it parent/child, that just confuses my point).
>> Your scenario calls for a global transaction (unless you have some other approach to limit the scope of rows that could participate in your FK transactions to something less than the entire database).
>>
>> If every transaction is a global transaction the database will not scale.
>>
>> See http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
>> and
>> http://www.cloudera.com/blog/2010/04/cap-confusion-problems-with-partition-tolerance/
>>
>> Also check out two phase commit failure and blocking scenarios, and Paxos' conditions for termination.
>>
>> -- Lars
>>
>> ----- Original Message -----
>> From: Mikael Sitruk <[email protected]>
>> To: [email protected]; lars hofhansl <[email protected]>
>> Cc:
>> Sent: Wednesday, January 18, 2012 12:01 AM
>> Subject: Re: Limited cross row transactions
>>
>> This is for parent child relationship, but what if there is no parent child relationship, but more a foreign key like relationship?
>> Using this model you do a full scan to get all the index (since you don't know the parent, you just know the "secondary index").
>> Or will you use a group ID as a prefix of parent key and "child" key? In this case splitting according to group may be more difficult, (due to different growth of groups).
>> Doing this aren't we back in the headache of sharding in rdbms?
>>
>> Mikael.S
>>
>> On Wed, Jan 18, 2012 at 7:45 AM, lars hofhansl <[email protected]> wrote:
>>
>> > This thread is probably getting too long...
>> >
>> > In HBase we have to let go of "global stuff". I submit that global transactions across 1000's of nodes that can fail will never work adequately.
>> > For that kind of consistency you will be hit in availability.
>> >
>> > Like Megastore the trick is in creating a local grouping of entities that can participate in local transactions.
>> > If you limit the (consistent) index to child entities of parent entity you can form your index like this:
>> > parentKey1...
>> > parentKey1.childTableName1.indexedField1
>> > parentKey1.childTableName1.indexedField2
>> > ...
>> > parentKey1.childTableName2.indexedField1
>> > parentKey1.childTableName2.indexedField2
>> > ...
>> > (assuming . cannot be in any parent key or child table name here, but you get the idea).
>> >
>> > When scanning the parent you'd have to skip the index rows with a filter.
>> > Within a parentKey you can find childKeys efficiently by scanning the index rows.
>> >
>> > Since the parent and the index entries would sort together the table can be pre-split (or one could have a simple prefix based balancer).
>> >
>> > -- Lars
>> >
>> > ----- Original Message -----
>> > From: Mikael Sitruk <[email protected]>
>> > To: [email protected]
>> > Cc:
>> > Sent: Tuesday, January 17, 2012 3:07 PM
>> > Subject: Re: Limited cross row transactions
>> >
>> > Well i understand the limitation now, asking to be in the same region is really hard constraint.
>> > Even if this is on the same RS this is not enough, because after a restart, regions may be allocated differently and now part of the data may be in one region under server A and the other part under server B.
>> >
>> > Well perhaps we need use case for better understanding, and perhaps finding alternative.
>> >
>> > The first use case i was thinking of is as follow -
>> > I need to insert data with different access criteria, but the data inserted should be inserted in atomic way.
>> > In RDBMS i would have two table, insert data in the first one with key#1 and then in the second one with key #2 then commit.
>> > In HBase i need to use different column family with key #1 (for atomicity) then to manage a kind of secondary index to map key#2 to key #1 (perhaps via co-processor) to have quick access to the data of key#2.
>> > Having cross row trx, i would think of using different keys under the same table (and probably different cf too), without the need to have secondary index, but again with the limitation it does not seem to be easily feasible.
>> >
>> > Mik.
>> >
>> > On Wed, Jan 18, 2012 at 12:22 AM, Ted Yu <[email protected]> wrote:
>> >
>> > > People rely on RDBMS for the transaction support.
>> > >
>> > > Consider the following example:
>> > > A highly de-normalized schema puts related users in the same region where this 'limited cross row transactions' works.
>> > > After some time, the region has to be split (maybe due to good business condition).
>> > > What should the HBase user do now ?
>> > >
>> > > Cheers
>> > >
>> > > On Tue, Jan 17, 2012 at 2:13 PM, Mikael Sitruk <[email protected]> wrote:
>> > >
>> > > > Ted - My 2 cents as a user.
>> > > > The user should know what he is doing, this is like a 'delete' operation, this is less intuitive than the original delete in RDBMS, so the same will be for this light transaction.
>> > > > If the transaction fails because of cross region server then the design of the user was wrong
>> > > > if the transaction fails because of concurrent access, then he should be able to re-read and reprocess its request.
>> > > > The only problem is how to make sure in advance that the different rows will be in the same RS?
>> > > >
>> > > > Lars - is the limitation at the region or at the region server? It was not so clear.
>> > > >
>> > > > Mikael.S
>> > > >
>> > > > On Tue, Jan 17, 2012 at 11:52 PM, Ted Yu <[email protected]> wrote:
>> > > >
>> > > > > Back to original proposal:
>> > > > > If client side grouping reveals that the batch of operations cannot be supported by 'limited cross row transactions', what should the user do ?
>> > > > >
>> > > > > Cheers
>> > > > >
>> > > > > On Tue, Jan 17, 2012 at 1:49 PM, Ted Yu <[email protected]> wrote:
>> > > > >
>> > > > > > Whether Omid fits the bill is open to discussion.
>> > > > > >
>> > > > > > We should revisit HBASE-2315 and provide the support Flavio, et al need.
>> > > > > >
>> > > > > > Cheers
>> > > > > >
>> > > > > > On Tue, Jan 17, 2012 at 1:41 PM, Lars George <[email protected]> wrote:
>> > > > > >
>> > > > > >> Hi Ted,
>> > > > > >>
>> > > > > >> Wouldn't Omid (https://github.com/yahoo/omid) help there?
>> > > > > >> Or is that too broad? Just curious.
>> > > > > >>
>> > > > > >> Lars
>> > > > > >>
>> > > > > >> On Jan 17, 2012, at 4:36 PM, Ted Yu wrote:
>> > > > > >>
>> > > > > >> > Can we collect use case for 'limited cross row transactions' first ?
>> > > > > >> >
>> > > > > >> > I have been thinking about (unlimited) multi-row transaction support in HBase. It may not be a one-man task. But we should definitely implement it someday.
>> > > > > >> >
>> > > > > >> > Cheers
>> > > > > >> >
>> > > > > >> > On Tue, Jan 17, 2012 at 1:27 PM, lars hofhansl <[email protected]> wrote:
>> > > > > >> >
>> > > > > >> >> I just committed HBASE-5203 (together with HBASE-3584 this implements atomic row operations).
>> > > > > >> >> Although a relatively small patch it lays the groundwork for heterogeneous operations in a single WALEdit.
>> > > > > >> >>
>> > > > > >> >> The interesting part is that even though the code enforced the atomic operation to be for a single row, this is not required.
>> > > > > >> >> It is enough if all involved KVs reside in the same region.
>> > > > > >> >>
>> > > > > >> >> I am not saying that we should add any high level concept to HBase (such as the EntityGroups of Megastore).
>> > > > > >> >>
>> > > > > >> >> But, with a slight addition to the API (allowing a grouping of multiple row operations) client applications have all the building blocks to do limited cross row atomic operations.
>> > > > > >> >> The client application would be responsible for either correctly pre-splitting the table, or a custom balancer has to be provided.
>> > > > > >> >>
>> > > > > >> >> The operation would fail if the regionserver determines that it would need data from multiple region servers.
>> > > > > >> >>
>> > > > > >> >> I think this needs at least minimal support from HBase and cannot be done (efficiently or without adding more moving parts) by a client API only.
>> > > > > >> >>
>> > > > > >> >> Comments? Is this worth pursuing? If so, I'll file a jira and provide a patch.
>> > > > > >> >>
>> > > > > >> >> Thanks.
>> > > > > >> >>
>> > > > > >> >> -- Lars
>> > > >
>> > > > --
>> > > > Mikael.S
>> >
>> > --
>> > Mikael.S
>>
>> --
>> Mikael.S

--
Todd Lipcon
Software Engineer, Cloudera
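
For anyone skimming the quoted thread, here is a rough sketch of the key layout lars describes (parent row followed by its `parentKey.childTable.indexedField` index rows) and of the client-side "do all these rows land in one region?" check that Ted asks about. This is plain Python for illustration only, not HBase client code. The helper names (`index_row`, `is_index_row`, `region_of`, `same_region`) and the sample split point are made up for this example, and it assumes keys are ASCII strings with no characters that sort below `.`, so that string order matches HBase's byte order.

```python
from bisect import bisect_right

SEP = "."  # assumed separator; must not occur in parent keys or child table names


def index_row(parent_key, child_table, indexed_value):
    """Build an index row key that sorts directly after its parent row."""
    for part in (parent_key, child_table):
        if SEP in part:
            raise ValueError(f"{part!r} must not contain {SEP!r}")
    return SEP.join((parent_key, child_table, indexed_value))


def is_index_row(row_key):
    """True for index rows; a scan over parents would filter these out."""
    return SEP in row_key


def region_of(row_key, split_points):
    """Position of the region holding row_key, given sorted region start keys."""
    return bisect_right(split_points, row_key)


def same_region(row_keys, split_points):
    """True if all keys fall into one region, so a region-local transaction could apply."""
    return len({region_of(k, split_points) for k in row_keys}) <= 1


# Parent rows and their index rows sort together under the parent prefix.
rows = sorted([
    index_row("parentKey1", "childTableName1", "indexedField2"),
    "parentKey2",
    "parentKey1",
    index_row("parentKey1", "childTableName1", "indexedField1"),
])

# Hypothetical pre-split: one region boundary at the start of parentKey2.
splits = ["parentKey2"]
group_ok = same_region(rows[:3], splits)   # parentKey1 and its index rows
mixed_ok = same_region(rows, splits)       # also includes parentKey2
```

With a pre-split on parent-key boundaries, everything under `parentKey1` stays in one region while a batch that also touches `parentKey2` does not, which is the case where the proposed operation would have to fail.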
