Fine. Let me change the title of HBASE-18770 and prepare a patch there. May still a week or two before alpha4 I think. The scan injection, and flush/compaction trigger/track API is still unstable...
2017-10-18 6:12 GMT+08:00 Josh Elser <[email protected]>: > (catching up here) > > I'm glad to see you fine folks came to a conclusion around a reduced-scope > solution (correct me if I'm wrong). "Some" bypass mechanism would stay for > preXXX methods, and we'd remove it for the other methods? What exactly the > "bypass API" would be is up in the air, correct? > > Duo -- maybe you could put the "current plan" on HBASE-18770 since > discussion appears to have died down? > > I was originally lamenting yet another big, sweeping change to CPs when I > had expected alpha-4 to have already landed. But, let me play devil's > advocate: is this something we still think is critical to do in alpha-4? I > can respect wanting to get address all of these smells, but I'd be worry it > delays us further. > > > On 10/11/17 9:53 PM, 张铎(Duo Zhang) wrote: > >> Creating an exception is expensive so if it is not suggested to do it in a >> normal case. A common trick is to create a global exception instance, and >> always throw it to avoid creating every time but I think it is more >> friendly to just use a return value? >> >> And for me, the bypass after preXXX for normal region operations just >> equals to a 'cancel', which is very clear and easy to understand, so I >> think it is OK to add bypass support for them. And also for compaction and >> flush, it is OK to give CP users the ability to cancel the operation as >> the >> semantic is clear, although I'm not sure how CP users would use this >> feature. >> >> In general, I think we can provide bypass/cancel support in preXXX methods >> where it is the very beginning of an operation. >> >> Thanks. >> >> 2017-10-12 3:10 GMT+08:00 Andrew Purtell <[email protected]>: >> >> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to use >>>> >>> its long encoding writing Increments. Not sure how we'd do that, >>> selectively. >>> >>> If we can handle the rest of the trouble that you observed: >>> >>> 1) Lack of recognition and identification of when the key value to >>> increment doesn't exist >>> 2) Lack of the ability to set the timestamp of the updated key value. >>> >>> then they might be able to make it work. Perhaps a conversion from HBase >>> native to Phoenix LONG encoding when processing results, in the wrapping >>> scanner, informed by schema metadata. >>> >>> Or if we are keeping the bypass semantic in select places but >>> implementing >>> it with something other than today's bypass() API (please) this would be >>> another candidate for where to keep it. Duo suggests keeping the semantic >>> in all of the basic RPC preXXX hooks for query and mutation. We could >>> redo >>> those APIs to skip normal processing based on a return value or exception >>> but otherwise drop bypass from all the others. It will clean up areas of >>> confusion, e.g. can I bypass splits or flushes or not? Or what about this >>> arcane hook in compaction? Or [insert some deep hook here]? The answer >>> would be: only RPC hooks will early out, and only if you return this >>> value, >>> or throw that exception. >>> >>> >>> On Wed, Oct 11, 2017 at 11:56 AM, Stack <[email protected]> wrote: >>> >>> The YARN Timeline Server has the FlowRunCoprocessor. It does bypass when >>>> user does a Get returning instead the result of its own (Flow) Scan >>>> >>> result. >>> >>>> Not sure how we'd do alternative here; Timeline Server is keeping Tags >>>> internally. >>>> >>>> >>>> On Wed, Oct 11, 2017 at 10:59 AM, Andrew Purtell <[email protected]> >>>> wrote: >>>> >>>> Rather than continue to support a weird bypass() which works in some >>>>> >>>> places >>>> >>>>> and not in others, perhaps we can substitute it with an exception? So >>>>> >>>> if >>> >>>> the coprocessor throws this exception in the pre hook then where it is >>>>> allowed we catch it and do the right thing, and where it is not allowed >>>>> >>>> we >>>> >>>>> don't catch it and the server aborts. This will at least improve the >>>>> >>>> silent >>>> >>>>> bypass() failure problem. I also don't like, in retrospect, that >>>>> >>>> calling >>> >>>> this environment method has magic side effects. Everyone understands >>>>> >>>> how >>> >>>> exceptions work, so it will be clearer. >>>>> >>>>> >>>>> We could do that though throw and catch of exceptions would be costly. >>>> >>>> What about the Duo suggestion? Purge bypass flag and replace it w/ >>>> preXXX >>>> in a few select methods returning a boolean on whether bypass? Would >>>> that >>>> work? (Would have to figure metrics still). >>>> >>>> >>>> >>>> In any case we should try to address the Tephra and Phoenix cases >>>>> >>>> brought >>> >>>> up in this discussion. They look like we can find alternatives. Shall I >>>>> file JIRAs to follow up? >>>>> >>>>> >>>>> >>>>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to >>>> use >>>> its long encoding writing Increments. Not sure how we'd do that, >>>> selectively. >>>> >>>> St.Ack >>>> >>>> >>>> >>>> On Wed, Oct 11, 2017 at 6:00 AM, 张铎(Duo Zhang) <[email protected]> >>>>> wrote: >>>>> >>>>> These examples are great. >>>>>> >>>>>> And I think for normal region operations such as get, put, delete, >>>>>> checkAndXXX, increment, it is OK to bypass the real operation after >>>>>> >>>>> preXXX >>>>> >>>>>> as the semantic is clear enough. Instead of calling env.bypass, maybe >>>>>> >>>>> just >>>>> >>>>>> let these preXXX methods return a boolean is enough to tell the HBase >>>>>> framework that we have already done the real operation so just give >>>>>> >>>>> up >>> >>>> and >>>>> >>>>>> return? >>>>>> >>>>>> Thanks. >>>>>> >>>>>> 2017-10-11 3:19 GMT+08:00 Gary Helmling <[email protected]>: >>>>>> >>>>>> The Tephra TransactionProcessor CP makes use of bypass() in >>>>>>> >>>>>> preDelete() >>>> >>>>> to >>>>>> >>>>>>> override handling of delete tombstones in a transactional way: >>>>>>> https://github.com/apache/incubator-tephra/blob/master/ >>>>>>> tephra-hbase-compat-1.3/src/main/java/org/apache/tephra/ >>>>>>> >>>>>> hbase/coprocessor/ >>>>>> >>>>>>> TransactionProcessor.java#L244 >>>>>>> >>>>>>> The CDAP IncrementHandler CP also makes use of bypass() in >>>>>>> >>>>>> preGetOp() >>> >>>> and >>>>> >>>>>> preIncrementAfterRRowLock() to provide a transaction implementation >>>>>>> >>>>>> of >>>> >>>>> readless increments: >>>>>>> https://github.com/caskdata/cdap/blob/develop/cdap-hbase- >>>>>>> compat-1.1/src/main/java/co/cask/cdap/data2/increment/ >>>>>>> hbase11/IncrementHandler.java#L121 >>>>>>> >>>>>>> What would be the alternate approach for these applications? In >>>>>>> >>>>>> both >>> >>>> cases >>>>>> >>>>>>> they need to impose their own semantics on the underlying KeyValue >>>>>>> storage. Is there a different way this can be done? >>>>>>> >>>>>>> >>>>>>> On Tue, Oct 10, 2017 at 11:58 AM Anoop John <[email protected] >>>>>>> >>>>>> >>>> wrote: >>>>>> >>>>>>> >>>>>>> Wrap core scanners is different right? That can be done in post >>>>>>>> hooks. I have seen many use cases for this.. Its the question >>>>>>>> >>>>>>> abt >>> >>>> the pre hooks where we have not yet created the core object (like >>>>>>>> scanner). The CP pre code itself doing the work of object >>>>>>>> >>>>>>> creation >>> >>>> and so the core code is been bypassed. Well the wrapping thing >>>>>>>> >>>>>>> can >>>> >>>>> be done in pre hook also. First create the core object by CP code >>>>>>>> itself and then do the wrapped object and return.. I have seen in >>>>>>>> >>>>>>> one >>>> >>>>> jira issue where the usage was this way.. The wrapping can be >>>>>>>> >>>>>>> done >>>> >>>>> in post also in such cases I believe. >>>>>>>> >>>>>>>> -Anoop- >>>>>>>> >>>>>>>> On Wed, Oct 11, 2017 at 12:23 AM, Andrew Purtell < >>>>>>>> >>>>>>> [email protected]> >>>>> >>>>>> wrote: >>>>>>>> >>>>>>>>> I think we should continue to support overriding function by >>>>>>>>> >>>>>>>> object >>>> >>>>> inheritance. I didn't mention this and am not proposing more >>>>>>>>> >>>>>>>> than >>> >>>> removing >>>>>>>> >>>>>>>>> the bypass() sematic. No more no less. Phoenix absolutely >>>>>>>>> >>>>>>>> depends >>> >>>> on >>>>> >>>>>> being >>>>>>>> >>>>>>>>> able to wrap core scanners and return the wrappers. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Oct 10, 2017 at 11:50 AM, Anoop John < >>>>>>>>> >>>>>>>> [email protected]> >>>>> >>>>>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> When we say bypass the core code, it can be done today not >>>>>>>>>> >>>>>>>>> only >>> >>>> by >>>> >>>>> calling bypass but by returning a not null object for some of >>>>>>>>>> >>>>>>>>> the >>>> >>>>> pre >>>>>> >>>>>>> hooks. Like preScannerOpen() if it return a scanner object, >>>>>>>>>> >>>>>>>>> we >>> >>>> will >>>>> >>>>>> avoid the remaining core code execution for creation of the >>>>>>>>>> scanner(s). So this proposal include this aspect also and >>>>>>>>>> >>>>>>>>> remove >>>> >>>>> any >>>>>> >>>>>>> possible way of bypassing the core code by the CP hook code >>>>>>>>>> >>>>>>>>> execution >>>>>> >>>>>>> ? Am +1. >>>>>>>>>> >>>>>>>>>> -Anoop- >>>>>>>>>> >>>>>>>>>> On Tue, Oct 10, 2017 at 11:40 PM, Andrew Purtell < >>>>>>>>>> >>>>>>>>> [email protected] >>>>>> >>>>>>> >>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> The coprocessor API provides an environment method, >>>>>>>>>>> >>>>>>>>>> bypass(), >>> >>>> that >>>>> >>>>>> when >>>>>>>> >>>>>>>>> called from a preXXX hook will cause the core code to skip >>>>>>>>>>> >>>>>>>>>> all >>> >>>> remaining >>>>>>>> >>>>>>>>> processing. This capability was introduced on HBASE-3348. >>>>>>>>>>> >>>>>>>>>> Since >>>> >>>>> this >>>>>> >>>>>>> time I >>>>>>>>>> >>>>>>>>>>> think we are more enlightened about the complications of >>>>>>>>>>> >>>>>>>>>> this >>> >>>> feature. >>>>>>> >>>>>>>> (Or, >>>>>>>>>> >>>>>>>>>>> anyway, speaking for myself:) >>>>>>>>>>> >>>>>>>>>>> Not all hooks provide the bypass semantic. Where this is the >>>>>>>>>>> >>>>>>>>>> case >>>>> >>>>>> the >>>>>>> >>>>>>>> javadoc for the hook says so, but it can be missed. If you >>>>>>>>>>> >>>>>>>>>> call >>>> >>>>> bypass() >>>>>>>> >>>>>>>>> in >>>>>>>>>> >>>>>>>>>>> a hook where it is not supported it is a no-op. This can >>>>>>>>>>> >>>>>>>>>> lead >>> >>>> to a >>>>> >>>>>> poor >>>>>>>> >>>>>>>>> developer experience. >>>>>>>>>>> >>>>>>>>>>> Where bypass is supported what is being bypassed is all of >>>>>>>>>>> >>>>>>>>>> the >>> >>>> core >>>>>> >>>>>>> code >>>>>>>> >>>>>>>>> implementing the remainder of the operation. In order to >>>>>>>>>>> >>>>>>>>>> understand >>>>>> >>>>>>> what >>>>>>>> >>>>>>>>> calling bypass() will skip, a coprocessor implementer should >>>>>>>>>>> >>>>>>>>>> read >>>>> >>>>>> and >>>>>>> >>>>>>>> understand all of the remaining code and its nuances. >>>>>>>>>>> >>>>>>>>>> Although I >>>> >>>>> think >>>>>>> >>>>>>>> this >>>>>>>>>> >>>>>>>>>>> is good practice for coprocessor developers in general, it >>>>>>>>>>> >>>>>>>>>> demands a >>>>>> >>>>>>> lot. I >>>>>>>>>> >>>>>>>>>>> think it would provide a much better developer experience if >>>>>>>>>>> >>>>>>>>>> we >>>> >>>>> didn't >>>>>>> >>>>>>>> allow bypass, even though it means - in theory - a >>>>>>>>>>> >>>>>>>>>> coprocessor >>> >>>> would >>>>>> >>>>>>> be a >>>>>>>> >>>>>>>>> lot more limited in some ways than before. What is skipped >>>>>>>>>>> >>>>>>>>>> is >>> >>>> extremely >>>>>>>> >>>>>>>>> version dependent. That core code will vary, perhaps >>>>>>>>>>> >>>>>>>>>> significantly, >>>>>> >>>>>>> even >>>>>>>> >>>>>>>>> between point releases. We do not provide the promise of >>>>>>>>>>> >>>>>>>>>> consistent >>>>>> >>>>>>> behavior even between point releases for the bypass >>>>>>>>>>> >>>>>>>>>> semantic. >>> >>>> To >>>> >>>>> achieve >>>>>>>> >>>>>>>>> that we could not change any code between hook points. >>>>>>>>>>> >>>>>>>>>> Therefore >>>> >>>>> the >>>>>> >>>>>>> coprocessor implementer becomes an HBase core developer in >>>>>>>>>>> >>>>>>>>>> practice >>>>>> >>>>>>> as >>>>>>> >>>>>>>> soon >>>>>>>>>> >>>>>>>>>>> as they rely on bypass(). Every release of HBase may break >>>>>>>>>>> >>>>>>>>>> the >>> >>>> assumption >>>>>>>> >>>>>>>>> that the replacement for the bypassed code takes care of all >>>>>>>>>>> >>>>>>>>>> necessary >>>>>>> >>>>>>>> skipped concerns. Because those concerns can change at any >>>>>>>>>>> >>>>>>>>>> point, >>>>> >>>>>> such an >>>>>>>> >>>>>>>>> assumption is never safe. >>>>>>>>>>> >>>>>>>>>>> I say "in theory" because I would be surprised if anyone is >>>>>>>>>>> >>>>>>>>>> relying >>>>>> >>>>>>> on >>>>>>> >>>>>>>> the >>>>>>>>>> >>>>>>>>>>> bypass for the above reason. I seem to recall that Phoenix >>>>>>>>>>> >>>>>>>>>> might >>>> >>>>> use >>>>>> >>>>>>> it >>>>>>>> >>>>>>>>> in >>>>>>>>>> >>>>>>>>>>> one place to promote a normal mutation into an atomic >>>>>>>>>>> >>>>>>>>>> operation, >>>> >>>>> by >>>>>> >>>>>>> substituting one for the other, but if so that objective >>>>>>>>>>> >>>>>>>>>> could >>> >>>> be >>>>> >>>>>> reimplemented using their new locking manager. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>> >>> >>
