Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Anoop John Tue, 24 Oct 2017 21:56:43 -0700

+1 for not allowing flush/compaction CPs to do bypass.  So which all
pre hooks to support bypass now?  Some mutations related only?  If u
have a list pls post.  Better to add in HBASE-18770 jira


-Anoop-

On Wed, Oct 25, 2017 at 10:23 AM, Stack <st...@duboce.net> wrote:
> I made a start on HBASE-18770. It has edit of RegionObserver which denotes
> methods that support bypass (Unfortunately, because of the varied
> signatures, how bypass is signaled varies too). Would appreciate a
> once-over.
>
> of note, a CP cannot bypass flush. Speak up if you think otherwise (or you
> can think of a case where this needed). My rationale is CPs won't have
> enough insider knowledge to do memory accounting in a world of in-memory
> compactions, and on/offheap memory in our hosting process. What ye reckon?
>
> Coprocessors have always been able to adjust what gets compacted in any run
> and even skirt compaction altogether by returning an empty set of files to
> compact. This works as it ever did.
>
> Thanks,
> S
>
>
>
> On Tue, Oct 17, 2017 at 9:46 PM, Stack <st...@duboce.net> wrote:
>
>> I was going to pick up on the bypass after HBASE-19007 lands, cleaning up
>> our exposure of Master/RegionServerServices to Coprocessors (HBASE-19007
>> was going bad for a good while but lots of contributors and good discussion
>> and now I think we have it). Shouldn't be too much longer.
>>
>> Its CP API so I was figuring it an alpha-4 item.
>>
>> St.Ack
>>
>> On Tue, Oct 17, 2017 at 6:56 PM, 张铎(Duo Zhang) <palomino...@gmail.com>
>> wrote:
>>
>>> Fine. Let me change the title of HBASE-18770 and prepare a patch there.
>>>
>>> May still a week or two before alpha4 I think. The scan injection, and
>>> flush/compaction trigger/track API is still unstable...
>>>
>>> 2017-10-18 6:12 GMT+08:00 Josh Elser <els...@apache.org>:
>>>
>>> > (catching up here)
>>> >
>>> > I'm glad to see you fine folks came to a conclusion around a
>>> reduced-scope
>>> > solution (correct me if I'm wrong). "Some" bypass mechanism would stay
>>> for
>>> > preXXX methods, and we'd remove it for the other methods? What exactly
>>> the
>>> > "bypass API" would be is up in the air, correct?
>>> >
>>> > Duo -- maybe you could put the "current plan" on HBASE-18770 since
>>> > discussion appears to have died down?
>>> >
>>> > I was originally lamenting yet another big, sweeping change to CPs when
>>> I
>>> > had expected alpha-4 to have already landed. But, let me play devil's
>>> > advocate: is this something we still think is critical to do in
>>> alpha-4? I
>>> > can respect wanting to get address all of these smells, but I'd be
>>> worry it
>>> > delays us further.
>>> >
>>> >
>>> > On 10/11/17 9:53 PM, 张铎(Duo Zhang) wrote:
>>> >
>>> >> Creating an exception is expensive so if it is not suggested to do it
>>> in a
>>> >> normal case. A common trick is to create a global exception instance,
>>> and
>>> >> always throw it to avoid creating every time but I think it is more
>>> >> friendly to just use a return value?
>>> >>
>>> >> And for me, the bypass after preXXX for normal region operations just
>>> >> equals to a 'cancel', which is very clear and easy to understand, so I
>>> >> think it is OK to add bypass support for them. And also for compaction
>>> and
>>> >> flush, it is OK to give CP users the ability to cancel the operation as
>>> >> the
>>> >> semantic is clear, although I'm not sure how CP users would use this
>>> >> feature.
>>> >>
>>> >> In general, I think we can provide bypass/cancel support in preXXX
>>> methods
>>> >> where it is the very beginning of an operation.
>>> >>
>>> >> Thanks.
>>> >>
>>> >> 2017-10-12 3:10 GMT+08:00 Andrew Purtell <apurt...@apache.org>:
>>> >>
>>> >> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to
>>> use
>>> >>>>
>>> >>> its long encoding writing Increments. Not sure how we'd do that,
>>> >>> selectively.
>>> >>>
>>> >>> If we can handle the rest of the trouble that you observed:
>>> >>>
>>> >>> 1) Lack of recognition and identification of when the key value to
>>> >>> increment doesn't exist
>>> >>> 2) Lack of the ability to set the timestamp of the updated key value.
>>> >>>
>>> >>> then they might be able to make it work. Perhaps a conversion from
>>> HBase
>>> >>> native to Phoenix LONG encoding when processing results, in the
>>> wrapping
>>> >>> scanner, informed by schema metadata.
>>> >>>
>>> >>> Or if we are keeping the bypass semantic in select places but
>>> >>> implementing
>>> >>> it with something other than today's bypass() API (please) this would
>>> be
>>> >>> another candidate for where to keep it. Duo suggests keeping the
>>> semantic
>>> >>> in all of the basic RPC preXXX hooks for query and mutation. We could
>>> >>> redo
>>> >>> those APIs to skip normal processing based on a return value or
>>> exception
>>> >>> but otherwise drop bypass from all the others. It will clean up areas
>>> of
>>> >>> confusion, e.g. can I bypass splits or flushes or not? Or what about
>>> this
>>> >>> arcane hook in compaction? Or [insert some deep hook here]? The answer
>>> >>> would be: only RPC hooks will early out, and only if you return this
>>> >>> value,
>>> >>> or throw that exception.
>>> >>>
>>> >>>
>>> >>> On Wed, Oct 11, 2017 at 11:56 AM, Stack <st...@duboce.net> wrote:
>>> >>>
>>> >>> The YARN Timeline Server has the FlowRunCoprocessor. It does bypass
>>> when
>>> >>>> user does a Get returning instead the result of its own (Flow) Scan
>>> >>>>
>>> >>> result.
>>> >>>
>>> >>>> Not sure how we'd do alternative here; Timeline Server is keeping
>>> Tags
>>> >>>> internally.
>>> >>>>
>>> >>>>
>>> >>>> On Wed, Oct 11, 2017 at 10:59 AM, Andrew Purtell <
>>> apurt...@apache.org>
>>> >>>> wrote:
>>> >>>>
>>> >>>> Rather than continue to support a weird bypass() which works in some
>>> >>>>>
>>> >>>> places
>>> >>>>
>>> >>>>> and not in others, perhaps we can substitute it with an exception?
>>> So
>>> >>>>>
>>> >>>> if
>>> >>>
>>> >>>> the coprocessor throws this exception in the pre hook then where it
>>> is
>>> >>>>> allowed we catch it and do the right thing, and where it is not
>>> allowed
>>> >>>>>
>>> >>>> we
>>> >>>>
>>> >>>>> don't catch it and the server aborts. This will at least improve the
>>> >>>>>
>>> >>>> silent
>>> >>>>
>>> >>>>> bypass() failure problem. I also don't like, in retrospect, that
>>> >>>>>
>>> >>>> calling
>>> >>>
>>> >>>> this environment method has magic side effects. Everyone understands
>>> >>>>>
>>> >>>> how
>>> >>>
>>> >>>> exceptions work, so it will be clearer.
>>> >>>>>
>>> >>>>>
>>> >>>>> We could do that though throw and catch of exceptions would be
>>> costly.
>>> >>>>
>>> >>>> What about the Duo suggestion? Purge bypass flag and replace it w/
>>> >>>> preXXX
>>> >>>> in a few select methods returning a boolean on whether bypass? Would
>>> >>>> that
>>> >>>> work? (Would have to figure metrics still).
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> In any case we should try to address the Tephra and Phoenix cases
>>> >>>>>
>>> >>>> brought
>>> >>>
>>> >>>> up in this discussion. They look like we can find alternatives.
>>> Shall I
>>> >>>>> file JIRAs to follow up?
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants
>>> to
>>> >>>> use
>>> >>>> its long encoding writing Increments. Not sure how we'd do that,
>>> >>>> selectively.
>>> >>>>
>>> >>>> St.Ack
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Wed, Oct 11, 2017 at 6:00 AM, 张铎(Duo Zhang) <
>>> palomino...@gmail.com>
>>> >>>>> wrote:
>>> >>>>>
>>> >>>>> These examples are great.
>>> >>>>>>
>>> >>>>>> And I think for normal region operations such as get, put, delete,
>>> >>>>>> checkAndXXX, increment, it is OK to bypass the real operation after
>>> >>>>>>
>>> >>>>> preXXX
>>> >>>>>
>>> >>>>>> as the semantic is clear enough. Instead of calling env.bypass,
>>> maybe
>>> >>>>>>
>>> >>>>> just
>>> >>>>>
>>> >>>>>> let these preXXX methods return a boolean is enough to tell the
>>> HBase
>>> >>>>>> framework that we have already done the real operation so just give
>>> >>>>>>
>>> >>>>> up
>>> >>>
>>> >>>> and
>>> >>>>>
>>> >>>>>> return?
>>> >>>>>>
>>> >>>>>> Thanks.
>>> >>>>>>
>>> >>>>>> 2017-10-11 3:19 GMT+08:00 Gary Helmling <ghelml...@gmail.com>:
>>> >>>>>>
>>> >>>>>> The Tephra TransactionProcessor CP makes use of bypass() in
>>> >>>>>>>
>>> >>>>>> preDelete()
>>> >>>>
>>> >>>>> to
>>> >>>>>>
>>> >>>>>>> override handling of delete tombstones in a transactional way:
>>> >>>>>>> https://github.com/apache/incubator-tephra/blob/master/
>>> >>>>>>> tephra-hbase-compat-1.3/src/main/java/org/apache/tephra/
>>> >>>>>>>
>>> >>>>>> hbase/coprocessor/
>>> >>>>>>
>>> >>>>>>> TransactionProcessor.java#L244
>>> >>>>>>>
>>> >>>>>>> The CDAP IncrementHandler CP also makes use of bypass() in
>>> >>>>>>>
>>> >>>>>> preGetOp()
>>> >>>
>>> >>>> and
>>> >>>>>
>>> >>>>>> preIncrementAfterRRowLock() to provide a transaction implementation
>>> >>>>>>>
>>> >>>>>> of
>>> >>>>
>>> >>>>> readless increments:
>>> >>>>>>> https://github.com/caskdata/cdap/blob/develop/cdap-hbase-
>>> >>>>>>> compat-1.1/src/main/java/co/cask/cdap/data2/increment/
>>> >>>>>>> hbase11/IncrementHandler.java#L121
>>> >>>>>>>
>>> >>>>>>> What would be the alternate approach for these applications?  In
>>> >>>>>>>
>>> >>>>>> both
>>> >>>
>>> >>>> cases
>>> >>>>>>
>>> >>>>>>> they need to impose their own semantics on the underlying KeyValue
>>> >>>>>>> storage.  Is there a different way this can be done?
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> On Tue, Oct 10, 2017 at 11:58 AM Anoop John <
>>> anoop.hb...@gmail.com
>>> >>>>>>>
>>> >>>>>>
>>> >>>> wrote:
>>> >>>>>>
>>> >>>>>>>
>>> >>>>>>> Wrap core scanners is different right?  That can be done in post
>>> >>>>>>>> hooks.  I have seen many use cases for this..  Its the question
>>> >>>>>>>>
>>> >>>>>>> abt
>>> >>>
>>> >>>> the pre hooks where we have not yet created the core object (like
>>> >>>>>>>> scanner).  The CP pre code itself doing the work of object
>>> >>>>>>>>
>>> >>>>>>> creation
>>> >>>
>>> >>>> and so the core code is been bypassed.    Well the wrapping thing
>>> >>>>>>>>
>>> >>>>>>> can
>>> >>>>
>>> >>>>> be done in pre hook also. First create the core object by CP code
>>> >>>>>>>> itself and then do the wrapped object and return.. I have seen in
>>> >>>>>>>>
>>> >>>>>>> one
>>> >>>>
>>> >>>>> jira issue where the usage was this way..   The wrapping can be
>>> >>>>>>>>
>>> >>>>>>> done
>>> >>>>
>>> >>>>> in post also in such cases I believe.
>>> >>>>>>>>
>>> >>>>>>>> -Anoop-
>>> >>>>>>>>
>>> >>>>>>>> On Wed, Oct 11, 2017 at 12:23 AM, Andrew Purtell <
>>> >>>>>>>>
>>> >>>>>>> apurt...@apache.org>
>>> >>>>>
>>> >>>>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>>> I think we should continue to support overriding function by
>>> >>>>>>>>>
>>> >>>>>>>> object
>>> >>>>
>>> >>>>> inheritance. I didn't mention this and am not proposing more
>>> >>>>>>>>>
>>> >>>>>>>> than
>>> >>>
>>> >>>> removing
>>> >>>>>>>>
>>> >>>>>>>>> the bypass() sematic. No more no less. Phoenix absolutely
>>> >>>>>>>>>
>>> >>>>>>>> depends
>>> >>>
>>> >>>> on
>>> >>>>>
>>> >>>>>> being
>>> >>>>>>>>
>>> >>>>>>>>> able to wrap core scanners and return the wrappers.
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> On Tue, Oct 10, 2017 at 11:50 AM, Anoop John <
>>> >>>>>>>>>
>>> >>>>>>>> anoop.hb...@gmail.com>
>>> >>>>>
>>> >>>>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> When we say bypass the core code, it can be done today not
>>> >>>>>>>>>>
>>> >>>>>>>>> only
>>> >>>
>>> >>>> by
>>> >>>>
>>> >>>>> calling bypass but by returning a not null object for some of
>>> >>>>>>>>>>
>>> >>>>>>>>> the
>>> >>>>
>>> >>>>> pre
>>> >>>>>>
>>> >>>>>>> hooks.  Like preScannerOpen() if it return a scanner object,
>>> >>>>>>>>>>
>>> >>>>>>>>> we
>>> >>>
>>> >>>> will
>>> >>>>>
>>> >>>>>> avoid the remaining core code execution for creation of the
>>> >>>>>>>>>> scanner(s).  So this proposal include this aspect also and
>>> >>>>>>>>>>
>>> >>>>>>>>> remove
>>> >>>>
>>> >>>>> any
>>> >>>>>>
>>> >>>>>>> possible way of bypassing the core code by the CP hook code
>>> >>>>>>>>>>
>>> >>>>>>>>> execution
>>> >>>>>>
>>> >>>>>>> ?   Am +1.
>>> >>>>>>>>>>
>>> >>>>>>>>>> -Anoop-
>>> >>>>>>>>>>
>>> >>>>>>>>>> On Tue, Oct 10, 2017 at 11:40 PM, Andrew Purtell <
>>> >>>>>>>>>>
>>> >>>>>>>>> apurt...@apache.org
>>> >>>>>>
>>> >>>>>>>
>>> >>>>>>>> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>>> The coprocessor API provides an environment method,
>>> >>>>>>>>>>>
>>> >>>>>>>>>> bypass(),
>>> >>>
>>> >>>> that
>>> >>>>>
>>> >>>>>> when
>>> >>>>>>>>
>>> >>>>>>>>> called from a preXXX hook will cause the core code to skip
>>> >>>>>>>>>>>
>>> >>>>>>>>>> all
>>> >>>
>>> >>>> remaining
>>> >>>>>>>>
>>> >>>>>>>>> processing. This capability was introduced on HBASE-3348.
>>> >>>>>>>>>>>
>>> >>>>>>>>>> Since
>>> >>>>
>>> >>>>> this
>>> >>>>>>
>>> >>>>>>> time I
>>> >>>>>>>>>>
>>> >>>>>>>>>>> think we are more enlightened about the complications of
>>> >>>>>>>>>>>
>>> >>>>>>>>>> this
>>> >>>
>>> >>>> feature.
>>> >>>>>>>
>>> >>>>>>>> (Or,
>>> >>>>>>>>>>
>>> >>>>>>>>>>> anyway, speaking for myself:)
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Not all hooks provide the bypass semantic. Where this is the
>>> >>>>>>>>>>>
>>> >>>>>>>>>> case
>>> >>>>>
>>> >>>>>> the
>>> >>>>>>>
>>> >>>>>>>> javadoc for the hook says so, but it can be missed. If you
>>> >>>>>>>>>>>
>>> >>>>>>>>>> call
>>> >>>>
>>> >>>>> bypass()
>>> >>>>>>>>
>>> >>>>>>>>> in
>>> >>>>>>>>>>
>>> >>>>>>>>>>> a hook where it is not supported it is a no-op. This can
>>> >>>>>>>>>>>
>>> >>>>>>>>>> lead
>>> >>>
>>> >>>> to a
>>> >>>>>
>>> >>>>>> poor
>>> >>>>>>>>
>>> >>>>>>>>> developer experience.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Where bypass is supported what is being bypassed is all of
>>> >>>>>>>>>>>
>>> >>>>>>>>>> the
>>> >>>
>>> >>>> core
>>> >>>>>>
>>> >>>>>>> code
>>> >>>>>>>>
>>> >>>>>>>>> implementing the remainder of the operation. In order to
>>> >>>>>>>>>>>
>>> >>>>>>>>>> understand
>>> >>>>>>
>>> >>>>>>> what
>>> >>>>>>>>
>>> >>>>>>>>> calling bypass() will skip, a coprocessor implementer should
>>> >>>>>>>>>>>
>>> >>>>>>>>>> read
>>> >>>>>
>>> >>>>>> and
>>> >>>>>>>
>>> >>>>>>>> understand all of the remaining code and its nuances.
>>> >>>>>>>>>>>
>>> >>>>>>>>>> Although I
>>> >>>>
>>> >>>>> think
>>> >>>>>>>
>>> >>>>>>>> this
>>> >>>>>>>>>>
>>> >>>>>>>>>>> is good practice for coprocessor developers in general, it
>>> >>>>>>>>>>>
>>> >>>>>>>>>> demands a
>>> >>>>>>
>>> >>>>>>> lot. I
>>> >>>>>>>>>>
>>> >>>>>>>>>>> think it would provide a much better developer experience if
>>> >>>>>>>>>>>
>>> >>>>>>>>>> we
>>> >>>>
>>> >>>>> didn't
>>> >>>>>>>
>>> >>>>>>>> allow bypass, even though it means - in theory - a
>>> >>>>>>>>>>>
>>> >>>>>>>>>> coprocessor
>>> >>>
>>> >>>> would
>>> >>>>>>
>>> >>>>>>> be a
>>> >>>>>>>>
>>> >>>>>>>>> lot more limited in some ways than before. What is skipped
>>> >>>>>>>>>>>
>>> >>>>>>>>>> is
>>> >>>
>>> >>>> extremely
>>> >>>>>>>>
>>> >>>>>>>>> version dependent. That core code will vary, perhaps
>>> >>>>>>>>>>>
>>> >>>>>>>>>> significantly,
>>> >>>>>>
>>> >>>>>>> even
>>> >>>>>>>>
>>> >>>>>>>>> between point releases. We do not provide the promise of
>>> >>>>>>>>>>>
>>> >>>>>>>>>> consistent
>>> >>>>>>
>>> >>>>>>> behavior even between point releases for the bypass
>>> >>>>>>>>>>>
>>> >>>>>>>>>> semantic.
>>> >>>
>>> >>>> To
>>> >>>>
>>> >>>>> achieve
>>> >>>>>>>>
>>> >>>>>>>>> that we could not change any code between hook points.
>>> >>>>>>>>>>>
>>> >>>>>>>>>> Therefore
>>> >>>>
>>> >>>>> the
>>> >>>>>>
>>> >>>>>>> coprocessor implementer becomes an HBase core developer in
>>> >>>>>>>>>>>
>>> >>>>>>>>>> practice
>>> >>>>>>
>>> >>>>>>> as
>>> >>>>>>>
>>> >>>>>>>> soon
>>> >>>>>>>>>>
>>> >>>>>>>>>>> as they rely on bypass(). Every release of HBase may break
>>> >>>>>>>>>>>
>>> >>>>>>>>>> the
>>> >>>
>>> >>>> assumption
>>> >>>>>>>>
>>> >>>>>>>>> that the replacement for the bypassed code takes care of all
>>> >>>>>>>>>>>
>>> >>>>>>>>>> necessary
>>> >>>>>>>
>>> >>>>>>>> skipped concerns. Because those concerns can change at any
>>> >>>>>>>>>>>
>>> >>>>>>>>>> point,
>>> >>>>>
>>> >>>>>> such an
>>> >>>>>>>>
>>> >>>>>>>>> assumption is never safe.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> I say "in theory" because I would be surprised if anyone is
>>> >>>>>>>>>>>
>>> >>>>>>>>>> relying
>>> >>>>>>
>>> >>>>>>> on
>>> >>>>>>>
>>> >>>>>>>> the
>>> >>>>>>>>>>
>>> >>>>>>>>>>> bypass for the above reason. I seem to recall that Phoenix
>>> >>>>>>>>>>>
>>> >>>>>>>>>> might
>>> >>>>
>>> >>>>> use
>>> >>>>>>
>>> >>>>>>> it
>>> >>>>>>>>
>>> >>>>>>>>> in
>>> >>>>>>>>>>
>>> >>>>>>>>>>> one place to promote a normal mutation into an atomic
>>> >>>>>>>>>>>
>>> >>>>>>>>>> operation,
>>> >>>>
>>> >>>>> by
>>> >>>>>>
>>> >>>>>>> substituting one for the other, but if so that objective
>>> >>>>>>>>>>>
>>> >>>>>>>>>> could
>>> >>>
>>> >>>> be
>>> >>>>>
>>> >>>>>> reimplemented using their new locking manager.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>
>>> >>>
>>> >>
>>>
>>
>>

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Reply via email to