Hi Kevin,

> Just to clarify Andrew do you have a prototype patch up that could potentially be worked on to either move postSplit() or add new hooks into the framework/are planning on submitting it sometime in the near future?

No, I meant one of the patches I put up on HBASE-2000. Basically CP design is multiversioned in my head and I skipped over the current version due to a bug. :-) Sorry for any confusion.

Like Ram says in a subsequent email, we could add a new upcall for the PONR in the split transaction, preSplitPONR and postSplitPONR.... though the naming is not ideal perhaps.

I opened https://issues.apache.org/jira/browse/HBASE-6696

On Tue, Aug 28, 2012 at 11:59 PM, Kevin Shin <[email protected]> wrote:

> Hello again everyone,
>
> Thanks for responding! I really appreciate all of the advice that's been given so far. :)
>
> Just to clarify Andrew do you have a prototype patch up that could potentially be worked on to either move postSplit() or add new hooks into the framework/are planning on submitting it sometime in the near future?
>
> I'd also love to get any feedback from the community about where to add the hook(s) but my thought was that we should have different levels of hooks within a split as Ramkrishna suggested. Perhaps two preSplits to accommodate for grabbing as well as a postSplit and a completeSplit? Giving a better abstraction would definitely help developers figure out how to deal with asynchronous calls to split, Put, and Delete. Thanks as always!
>
> Best,
> Kevin
>
> On Tue, Aug 28, 2012 at 11:12 AM, lars hofhansl <[email protected]> wrote:
>
>> That approach sounds good to me.
>>
>> ----- Original Message -----
>> From: Andrew Purtell <[email protected]>
>> To: [email protected]
>> Cc:
>> Sent: Tuesday, August 28, 2012 3:05 AM
>> Subject: Re: Improving Coprocessor postSplit/postOpen synchronization
>>
>> Never mind, I went to look at the code. Should have done that first.
>>
>> Looking at 0.94 sources, in SplitTransaction, first we notify the master that the split has happened, and wait for the master to process it (which opens daughters), and then call up to the CP with the daughter regions as arguments.
>>
>> I seem to remember that in my prototype patch for the CP framework, postSplit notification let the CP know the split took place and allow it to take actions before the master opened the daughters. In any event that's not the code now, so it seems what you need here is for us to move the postSplit upcall up prior to master notification or add another hook at that location.
>>
>> On Tue, Aug 28, 2012 at 12:53 PM, Andrew Purtell <[email protected]> wrote:
>>
>> > (from postSplit)
>> >
>> > On Tue, Aug 28, 2012 at 12:53 PM, Andrew Purtell <[email protected]> wrote:
>> >
>> >> What about writing a marker (a file) into the region at split (from preSplit) which is then existence checked and read at open (postOpen)? This file would contain whatever indexing metadata is required.
>> >>
>> >> Also, splits are nearly instant because the daughters are created with reference files to the parent, until a later compaction brings the data from the parent over. Can you do the same with your indexes? Reason I ask is this notion of "ignoring" new data until indexes are available seems undesirable.
>> >>
>> >> On Mon, Aug 27, 2012 at 11:29 PM, Kevin Shin <[email protected]> wrote:
>> >>
>> >>> Hi everyone,
>> >>>
>> >>> A colleague and I were working with HBase coprocessors for secondary indexes and ran into an interesting problem regarding splits and synchronizing the corresponding parent/daughter regions.
>> >>>
>> >>> The goal with splits is to create two new daughter regions with the corresponding splits of the secondary indexes and lock these regions such that Puts/Deletes that occur while postSplit is in progress will be queued up so we don't run into consistency issues.
>> >>> IE, if a delete gets called before a daughter region receives the split index, that delete would essentially have been ignored, so we would want to wait until postSplit is finished before running any new Puts/Deletes on the split regions.
>> >>>
>> >>> As of right now, the HBase coprocessors do not easily support a way to achieve this level of consistency in that there is no way to distinguish a region being opened from a split or a regular open. If we could distinguish, we could open up the correct index from the start and stall until postSplit is finished in the background in the event of a split. I would thus like to propose a way to "lock" the daughter regions when postSplit is called. That is, when we open a daughter region from a split, we can pass in the parent region name alongside it (or Null if there is no parent) to distinguish a region being opened from a split or open. I am thinking about submitting a patch into JIRA but would greatly appreciate any thoughts or suggestions for another solution to the problem or perhaps a better patch. I am using HBase 0.92 for development at this moment.
>> >>>
>> >>> Best,
>> >>> Kevin
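
For readers following the thread, the marker-file idea quoted above (write a marker at split time, check for it at open time) can be illustrated with a minimal sketch. It uses only plain Java file I/O and no HBase classes. The class name SplitMarkerSketch, the file name .split-marker, and both method names are hypothetical, and local directories merely stand in for region directories. Where the write would actually be hooked into the split transaction is exactly the open question in this thread, so this is not a statement of how the coprocessor framework behaves.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical illustration of the marker-file idea; no HBase APIs are involved. */
public class SplitMarkerSketch {
    // Assumed marker file name, chosen for this sketch only.
    static final String MARKER = ".split-marker";

    /** Where a split-time hook would run: record the parent name in the daughter's directory. */
    static void writeMarker(Path daughterDir, String parentRegionName) throws IOException {
        Files.createDirectories(daughterDir);
        Files.write(daughterDir.resolve(MARKER),
                parentRegionName.getBytes(StandardCharsets.UTF_8));
    }

    /** Where an open-time hook would run: a marker means this open follows a split. */
    static Optional<String> readMarker(Path regionDir) throws IOException {
        Path marker = regionDir.resolve(MARKER);
        if (!Files.exists(marker)) {
            return Optional.empty(); // regular open, no parent recorded
        }
        return Optional.of(new String(Files.readAllBytes(marker), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("regions");
        Path daughter = root.resolve("daughterA");
        Path plain = Files.createDirectories(root.resolve("plainRegion"));

        writeMarker(daughter, "parentRegion");

        // A daughter reports its parent; an ordinary region reports a regular open.
        System.out.println(readMarker(daughter).orElse("regular open"));
        System.out.println(readMarker(plain).orElse("regular open"));
    }
}
```

The sketch only works if the marker is written before the daughter is opened, which is why the thread turns to moving the postSplit upcall or adding a hook ahead of master notification.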
