Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Duo Zhang Tue, 17 Oct 2017 18:57:20 -0700

Fine. Let me change the title of HBASE-18770 and prepare a patch there.

May still a week or two before alpha4 I think. The scan injection, and
flush/compaction trigger/track API is still unstable...


2017-10-18 6:12 GMT+08:00 Josh Elser <[email protected]>:

> (catching up here)
>
> I'm glad to see you fine folks came to a conclusion around a reduced-scope
> solution (correct me if I'm wrong). "Some" bypass mechanism would stay for
> preXXX methods, and we'd remove it for the other methods? What exactly the
> "bypass API" would be is up in the air, correct?
>
> Duo -- maybe you could put the "current plan" on HBASE-18770 since
> discussion appears to have died down?
>
> I was originally lamenting yet another big, sweeping change to CPs when I
> had expected alpha-4 to have already landed. But, let me play devil's
> advocate: is this something we still think is critical to do in alpha-4? I
> can respect wanting to get address all of these smells, but I'd be worry it
> delays us further.
>
>
> On 10/11/17 9:53 PM, 张铎(Duo Zhang) wrote:
>
>> Creating an exception is expensive so if it is not suggested to do it in a
>> normal case. A common trick is to create a global exception instance, and
>> always throw it to avoid creating every time but I think it is more
>> friendly to just use a return value?
>>
>> And for me, the bypass after preXXX for normal region operations just
>> equals to a 'cancel', which is very clear and easy to understand, so I
>> think it is OK to add bypass support for them. And also for compaction and
>> flush, it is OK to give CP users the ability to cancel the operation as
>> the
>> semantic is clear, although I'm not sure how CP users would use this
>> feature.
>>
>> In general, I think we can provide bypass/cancel support in preXXX methods
>> where it is the very beginning of an operation.
>>
>> Thanks.
>>
>> 2017-10-12 3:10 GMT+08:00 Andrew Purtell <[email protected]>:
>>
>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to use
>>>>
>>> its long encoding writing Increments. Not sure how we'd do that,
>>> selectively.
>>>
>>> If we can handle the rest of the trouble that you observed:
>>>
>>> 1) Lack of recognition and identification of when the key value to
>>> increment doesn't exist
>>> 2) Lack of the ability to set the timestamp of the updated key value.
>>>
>>> then they might be able to make it work. Perhaps a conversion from HBase
>>> native to Phoenix LONG encoding when processing results, in the wrapping
>>> scanner, informed by schema metadata.
>>>
>>> Or if we are keeping the bypass semantic in select places but
>>> implementing
>>> it with something other than today's bypass() API (please) this would be
>>> another candidate for where to keep it. Duo suggests keeping the semantic
>>> in all of the basic RPC preXXX hooks for query and mutation. We could
>>> redo
>>> those APIs to skip normal processing based on a return value or exception
>>> but otherwise drop bypass from all the others. It will clean up areas of
>>> confusion, e.g. can I bypass splits or flushes or not? Or what about this
>>> arcane hook in compaction? Or [insert some deep hook here]? The answer
>>> would be: only RPC hooks will early out, and only if you return this
>>> value,
>>> or throw that exception.
>>>
>>>
>>> On Wed, Oct 11, 2017 at 11:56 AM, Stack <[email protected]> wrote:
>>>
>>> The YARN Timeline Server has the FlowRunCoprocessor. It does bypass when
>>>> user does a Get returning instead the result of its own (Flow) Scan
>>>>
>>> result.
>>>
>>>> Not sure how we'd do alternative here; Timeline Server is keeping Tags
>>>> internally.
>>>>
>>>>
>>>> On Wed, Oct 11, 2017 at 10:59 AM, Andrew Purtell <[email protected]>
>>>> wrote:
>>>>
>>>> Rather than continue to support a weird bypass() which works in some
>>>>>
>>>> places
>>>>
>>>>> and not in others, perhaps we can substitute it with an exception? So
>>>>>
>>>> if
>>>
>>>> the coprocessor throws this exception in the pre hook then where it is
>>>>> allowed we catch it and do the right thing, and where it is not allowed
>>>>>
>>>> we
>>>>
>>>>> don't catch it and the server aborts. This will at least improve the
>>>>>
>>>> silent
>>>>
>>>>> bypass() failure problem. I also don't like, in retrospect, that
>>>>>
>>>> calling
>>>
>>>> this environment method has magic side effects. Everyone understands
>>>>>
>>>> how
>>>
>>>> exceptions work, so it will be clearer.
>>>>>
>>>>>
>>>>> We could do that though throw and catch of exceptions would be costly.
>>>>
>>>> What about the Duo suggestion? Purge bypass flag and replace it w/
>>>> preXXX
>>>> in a few select methods returning a boolean on whether bypass? Would
>>>> that
>>>> work? (Would have to figure metrics still).
>>>>
>>>>
>>>>
>>>> In any case we should try to address the Tephra and Phoenix cases
>>>>>
>>>> brought
>>>
>>>> up in this discussion. They look like we can find alternatives. Shall I
>>>>> file JIRAs to follow up?
>>>>>
>>>>>
>>>>>
>>>>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to
>>>> use
>>>> its long encoding writing Increments. Not sure how we'd do that,
>>>> selectively.
>>>>
>>>> St.Ack
>>>>
>>>>
>>>>
>>>> On Wed, Oct 11, 2017 at 6:00 AM, 张铎(Duo Zhang) <[email protected]>
>>>>> wrote:
>>>>>
>>>>> These examples are great.
>>>>>>
>>>>>> And I think for normal region operations such as get, put, delete,
>>>>>> checkAndXXX, increment, it is OK to bypass the real operation after
>>>>>>
>>>>> preXXX
>>>>>
>>>>>> as the semantic is clear enough. Instead of calling env.bypass, maybe
>>>>>>
>>>>> just
>>>>>
>>>>>> let these preXXX methods return a boolean is enough to tell the HBase
>>>>>> framework that we have already done the real operation so just give
>>>>>>
>>>>> up
>>>
>>>> and
>>>>>
>>>>>> return?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> 2017-10-11 3:19 GMT+08:00 Gary Helmling <[email protected]>:
>>>>>>
>>>>>> The Tephra TransactionProcessor CP makes use of bypass() in
>>>>>>>
>>>>>> preDelete()
>>>>
>>>>> to
>>>>>>
>>>>>>> override handling of delete tombstones in a transactional way:
>>>>>>> https://github.com/apache/incubator-tephra/blob/master/
>>>>>>> tephra-hbase-compat-1.3/src/main/java/org/apache/tephra/
>>>>>>>
>>>>>> hbase/coprocessor/
>>>>>>
>>>>>>> TransactionProcessor.java#L244
>>>>>>>
>>>>>>> The CDAP IncrementHandler CP also makes use of bypass() in
>>>>>>>
>>>>>> preGetOp()
>>>
>>>> and
>>>>>
>>>>>> preIncrementAfterRRowLock() to provide a transaction implementation
>>>>>>>
>>>>>> of
>>>>
>>>>> readless increments:
>>>>>>> https://github.com/caskdata/cdap/blob/develop/cdap-hbase-
>>>>>>> compat-1.1/src/main/java/co/cask/cdap/data2/increment/
>>>>>>> hbase11/IncrementHandler.java#L121
>>>>>>>
>>>>>>> What would be the alternate approach for these applications?  In
>>>>>>>
>>>>>> both
>>>
>>>> cases
>>>>>>
>>>>>>> they need to impose their own semantics on the underlying KeyValue
>>>>>>> storage.  Is there a different way this can be done?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 10, 2017 at 11:58 AM Anoop John <[email protected]
>>>>>>>
>>>>>>
>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> Wrap core scanners is different right?  That can be done in post
>>>>>>>> hooks.  I have seen many use cases for this..  Its the question
>>>>>>>>
>>>>>>> abt
>>>
>>>> the pre hooks where we have not yet created the core object (like
>>>>>>>> scanner).  The CP pre code itself doing the work of object
>>>>>>>>
>>>>>>> creation
>>>
>>>> and so the core code is been bypassed.    Well the wrapping thing
>>>>>>>>
>>>>>>> can
>>>>
>>>>> be done in pre hook also. First create the core object by CP code
>>>>>>>> itself and then do the wrapped object and return.. I have seen in
>>>>>>>>
>>>>>>> one
>>>>
>>>>> jira issue where the usage was this way..   The wrapping can be
>>>>>>>>
>>>>>>> done
>>>>
>>>>> in post also in such cases I believe.
>>>>>>>>
>>>>>>>> -Anoop-
>>>>>>>>
>>>>>>>> On Wed, Oct 11, 2017 at 12:23 AM, Andrew Purtell <
>>>>>>>>
>>>>>>> [email protected]>
>>>>>
>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I think we should continue to support overriding function by
>>>>>>>>>
>>>>>>>> object
>>>>
>>>>> inheritance. I didn't mention this and am not proposing more
>>>>>>>>>
>>>>>>>> than
>>>
>>>> removing
>>>>>>>>
>>>>>>>>> the bypass() sematic. No more no less. Phoenix absolutely
>>>>>>>>>
>>>>>>>> depends
>>>
>>>> on
>>>>>
>>>>>> being
>>>>>>>>
>>>>>>>>> able to wrap core scanners and return the wrappers.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 10, 2017 at 11:50 AM, Anoop John <
>>>>>>>>>
>>>>>>>> [email protected]>
>>>>>
>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> When we say bypass the core code, it can be done today not
>>>>>>>>>>
>>>>>>>>> only
>>>
>>>> by
>>>>
>>>>> calling bypass but by returning a not null object for some of
>>>>>>>>>>
>>>>>>>>> the
>>>>
>>>>> pre
>>>>>>
>>>>>>> hooks.  Like preScannerOpen() if it return a scanner object,
>>>>>>>>>>
>>>>>>>>> we
>>>
>>>> will
>>>>>
>>>>>> avoid the remaining core code execution for creation of the
>>>>>>>>>> scanner(s).  So this proposal include this aspect also and
>>>>>>>>>>
>>>>>>>>> remove
>>>>
>>>>> any
>>>>>>
>>>>>>> possible way of bypassing the core code by the CP hook code
>>>>>>>>>>
>>>>>>>>> execution
>>>>>>
>>>>>>> ?   Am +1.
>>>>>>>>>>
>>>>>>>>>> -Anoop-
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 10, 2017 at 11:40 PM, Andrew Purtell <
>>>>>>>>>>
>>>>>>>>> [email protected]
>>>>>>
>>>>>>>
>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> The coprocessor API provides an environment method,
>>>>>>>>>>>
>>>>>>>>>> bypass(),
>>>
>>>> that
>>>>>
>>>>>> when
>>>>>>>>
>>>>>>>>> called from a preXXX hook will cause the core code to skip
>>>>>>>>>>>
>>>>>>>>>> all
>>>
>>>> remaining
>>>>>>>>
>>>>>>>>> processing. This capability was introduced on HBASE-3348.
>>>>>>>>>>>
>>>>>>>>>> Since
>>>>
>>>>> this
>>>>>>
>>>>>>> time I
>>>>>>>>>>
>>>>>>>>>>> think we are more enlightened about the complications of
>>>>>>>>>>>
>>>>>>>>>> this
>>>
>>>> feature.
>>>>>>>
>>>>>>>> (Or,
>>>>>>>>>>
>>>>>>>>>>> anyway, speaking for myself:)
>>>>>>>>>>>
>>>>>>>>>>> Not all hooks provide the bypass semantic. Where this is the
>>>>>>>>>>>
>>>>>>>>>> case
>>>>>
>>>>>> the
>>>>>>>
>>>>>>>> javadoc for the hook says so, but it can be missed. If you
>>>>>>>>>>>
>>>>>>>>>> call
>>>>
>>>>> bypass()
>>>>>>>>
>>>>>>>>> in
>>>>>>>>>>
>>>>>>>>>>> a hook where it is not supported it is a no-op. This can
>>>>>>>>>>>
>>>>>>>>>> lead
>>>
>>>> to a
>>>>>
>>>>>> poor
>>>>>>>>
>>>>>>>>> developer experience.
>>>>>>>>>>>
>>>>>>>>>>> Where bypass is supported what is being bypassed is all of
>>>>>>>>>>>
>>>>>>>>>> the
>>>
>>>> core
>>>>>>
>>>>>>> code
>>>>>>>>
>>>>>>>>> implementing the remainder of the operation. In order to
>>>>>>>>>>>
>>>>>>>>>> understand
>>>>>>
>>>>>>> what
>>>>>>>>
>>>>>>>>> calling bypass() will skip, a coprocessor implementer should
>>>>>>>>>>>
>>>>>>>>>> read
>>>>>
>>>>>> and
>>>>>>>
>>>>>>>> understand all of the remaining code and its nuances.
>>>>>>>>>>>
>>>>>>>>>> Although I
>>>>
>>>>> think
>>>>>>>
>>>>>>>> this
>>>>>>>>>>
>>>>>>>>>>> is good practice for coprocessor developers in general, it
>>>>>>>>>>>
>>>>>>>>>> demands a
>>>>>>
>>>>>>> lot. I
>>>>>>>>>>
>>>>>>>>>>> think it would provide a much better developer experience if
>>>>>>>>>>>
>>>>>>>>>> we
>>>>
>>>>> didn't
>>>>>>>
>>>>>>>> allow bypass, even though it means - in theory - a
>>>>>>>>>>>
>>>>>>>>>> coprocessor
>>>
>>>> would
>>>>>>
>>>>>>> be a
>>>>>>>>
>>>>>>>>> lot more limited in some ways than before. What is skipped
>>>>>>>>>>>
>>>>>>>>>> is
>>>
>>>> extremely
>>>>>>>>
>>>>>>>>> version dependent. That core code will vary, perhaps
>>>>>>>>>>>
>>>>>>>>>> significantly,
>>>>>>
>>>>>>> even
>>>>>>>>
>>>>>>>>> between point releases. We do not provide the promise of
>>>>>>>>>>>
>>>>>>>>>> consistent
>>>>>>
>>>>>>> behavior even between point releases for the bypass
>>>>>>>>>>>
>>>>>>>>>> semantic.
>>>
>>>> To
>>>>
>>>>> achieve
>>>>>>>>
>>>>>>>>> that we could not change any code between hook points.
>>>>>>>>>>>
>>>>>>>>>> Therefore
>>>>
>>>>> the
>>>>>>
>>>>>>> coprocessor implementer becomes an HBase core developer in
>>>>>>>>>>>
>>>>>>>>>> practice
>>>>>>
>>>>>>> as
>>>>>>>
>>>>>>>> soon
>>>>>>>>>>
>>>>>>>>>>> as they rely on bypass(). Every release of HBase may break
>>>>>>>>>>>
>>>>>>>>>> the
>>>
>>>> assumption
>>>>>>>>
>>>>>>>>> that the replacement for the bypassed code takes care of all
>>>>>>>>>>>
>>>>>>>>>> necessary
>>>>>>>
>>>>>>>> skipped concerns. Because those concerns can change at any
>>>>>>>>>>>
>>>>>>>>>> point,
>>>>>
>>>>>> such an
>>>>>>>>
>>>>>>>>> assumption is never safe.
>>>>>>>>>>>
>>>>>>>>>>> I say "in theory" because I would be surprised if anyone is
>>>>>>>>>>>
>>>>>>>>>> relying
>>>>>>
>>>>>>> on
>>>>>>>
>>>>>>>> the
>>>>>>>>>>
>>>>>>>>>>> bypass for the above reason. I seem to recall that Phoenix
>>>>>>>>>>>
>>>>>>>>>> might
>>>>
>>>>> use
>>>>>>
>>>>>>> it
>>>>>>>>
>>>>>>>>> in
>>>>>>>>>>
>>>>>>>>>>> one place to promote a normal mutation into an atomic
>>>>>>>>>>>
>>>>>>>>>> operation,
>>>>
>>>>> by
>>>>>>
>>>>>>> substituting one for the other, but if so that objective
>>>>>>>>>>>
>>>>>>>>>> could
>>>
>>>> be
>>>>>
>>>>>> reimplemented using their new locking manager.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>
>>>
>>

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Reply via email to