Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Josh Elser Tue, 17 Oct 2017 15:13:17 -0700

(catching up here)

I'm glad to see you fine folks came to a conclusion around areduced-scope solution (correct me if I'm wrong). "Some" bypassmechanism would stay for preXXX methods, and we'd remove it for theother methods? What exactly the "bypass API" would be is up in the air,correct?

Duo -- maybe you could put the "current plan" on HBASE-18770 sincediscussion appears to have died down?

I was originally lamenting yet another big, sweeping change to CPs whenI had expected alpha-4 to have already landed. But, let me play devil'sadvocate: is this something we still think is critical to do in alpha-4?I can respect wanting to get address all of these smells, but I'd beworry it delays us further.


On 10/11/17 9:53 PM, 张铎(Duo Zhang) wrote:

Creating an exception is expensive so if it is not suggested to do it in a
normal case. A common trick is to create a global exception instance, and
always throw it to avoid creating every time but I think it is more
friendly to just use a return value?

And for me, the bypass after preXXX for normal region operations just
equals to a 'cancel', which is very clear and easy to understand, so I
think it is OK to add bypass support for them. And also for compaction and
flush, it is OK to give CP users the ability to cancel the operation as the
semantic is clear, although I'm not sure how CP users would use this
feature.

In general, I think we can provide bypass/cancel support in preXXX methods
where it is the very beginning of an operation.

Thanks.

2017-10-12 3:10 GMT+08:00 Andrew Purtell <[email protected]>:

On Phoenix Increment by-pass, an ornery item is that Phoenix wants to use

its long encoding writing Increments. Not sure how we'd do that,
selectively.

If we can handle the rest of the trouble that you observed:

1) Lack of recognition and identification of when the key value to
increment doesn't exist
2) Lack of the ability to set the timestamp of the updated key value.

then they might be able to make it work. Perhaps a conversion from HBase
native to Phoenix LONG encoding when processing results, in the wrapping
scanner, informed by schema metadata.

Or if we are keeping the bypass semantic in select places but implementing
it with something other than today's bypass() API (please) this would be
another candidate for where to keep it. Duo suggests keeping the semantic
in all of the basic RPC preXXX hooks for query and mutation. We could redo
those APIs to skip normal processing based on a return value or exception
but otherwise drop bypass from all the others. It will clean up areas of
confusion, e.g. can I bypass splits or flushes or not? Or what about this
arcane hook in compaction? Or [insert some deep hook here]? The answer
would be: only RPC hooks will early out, and only if you return this value,
or throw that exception.


On Wed, Oct 11, 2017 at 11:56 AM, Stack <[email protected]> wrote:

The YARN Timeline Server has the FlowRunCoprocessor. It does bypass when
user does a Get returning instead the result of its own (Flow) Scan

result.

Not sure how we'd do alternative here; Timeline Server is keeping Tags
internally.


On Wed, Oct 11, 2017 at 10:59 AM, Andrew Purtell <[email protected]>
wrote:

Rather than continue to support a weird bypass() which works in some

places

and not in others, perhaps we can substitute it with an exception? So

if

the coprocessor throws this exception in the pre hook then where it is
allowed we catch it and do the right thing, and where it is not allowed

we

don't catch it and the server aborts. This will at least improve the

silent

bypass() failure problem. I also don't like, in retrospect, that

calling

this environment method has magic side effects. Everyone understands

how

exceptions work, so it will be clearer.

We could do that though throw and catch of exceptions would be costly.

What about the Duo suggestion? Purge bypass flag and replace it w/ preXXX
in a few select methods returning a boolean on whether bypass? Would that
work? (Would have to figure metrics still).

In any case we should try to address the Tephra and Phoenix cases

brought

up in this discussion. They look like we can find alternatives. Shall I
file JIRAs to follow up?

On Phoenix Increment by-pass, an ornery item is that Phoenix wants to use
its long encoding writing Increments. Not sure how we'd do that,
selectively.

St.Ack

On Wed, Oct 11, 2017 at 6:00 AM, 张铎(Duo Zhang) <[email protected]>
wrote:

These examples are great.

And I think for normal region operations such as get, put, delete,
checkAndXXX, increment, it is OK to bypass the real operation after

preXXX

as the semantic is clear enough. Instead of calling env.bypass, maybe

just

let these preXXX methods return a boolean is enough to tell the HBase
framework that we have already done the real operation so just give

up

and

return?

Thanks.

2017-10-11 3:19 GMT+08:00 Gary Helmling <[email protected]>:

The Tephra TransactionProcessor CP makes use of bypass() in

preDelete()

to

override handling of delete tombstones in a transactional way:
https://github.com/apache/incubator-tephra/blob/master/
tephra-hbase-compat-1.3/src/main/java/org/apache/tephra/

hbase/coprocessor/

TransactionProcessor.java#L244

The CDAP IncrementHandler CP also makes use of bypass() in

preGetOp()

and

preIncrementAfterRRowLock() to provide a transaction implementation

of

readless increments:
https://github.com/caskdata/cdap/blob/develop/cdap-hbase-
compat-1.1/src/main/java/co/cask/cdap/data2/increment/
hbase11/IncrementHandler.java#L121

What would be the alternate approach for these applications?  In

both

cases

they need to impose their own semantics on the underlying KeyValue
storage.  Is there a different way this can be done?


On Tue, Oct 10, 2017 at 11:58 AM Anoop John <[email protected]

wrote:

Wrap core scanners is different right?  That can be done in post
hooks.  I have seen many use cases for this..  Its the question

abt

the pre hooks where we have not yet created the core object (like
scanner).  The CP pre code itself doing the work of object

creation

and so the core code is been bypassed.    Well the wrapping thing

can

be done in pre hook also. First create the core object by CP code
itself and then do the wrapped object and return.. I have seen in

one

jira issue where the usage was this way..   The wrapping can be

done

in post also in such cases I believe.

-Anoop-

On Wed, Oct 11, 2017 at 12:23 AM, Andrew Purtell <

[email protected]>

wrote:

I think we should continue to support overriding function by

object

inheritance. I didn't mention this and am not proposing more

than

removing

the bypass() sematic. No more no less. Phoenix absolutely

depends

on

being

able to wrap core scanners and return the wrappers.


On Tue, Oct 10, 2017 at 11:50 AM, Anoop John <

[email protected]>

wrote:

When we say bypass the core code, it can be done today not

only

by

calling bypass but by returning a not null object for some of

the

pre

hooks.  Like preScannerOpen() if it return a scanner object,

we

will

avoid the remaining core code execution for creation of the
scanner(s).  So this proposal include this aspect also and

remove

any

possible way of bypassing the core code by the CP hook code

execution

?   Am +1.

-Anoop-

On Tue, Oct 10, 2017 at 11:40 PM, Andrew Purtell <

[email protected]

wrote:

The coprocessor API provides an environment method,

bypass(),

that

when

called from a preXXX hook will cause the core code to skip

all

remaining

processing. This capability was introduced on HBASE-3348.

Since

this

time I

think we are more enlightened about the complications of

this

feature.

(Or,

anyway, speaking for myself:)

Not all hooks provide the bypass semantic. Where this is the

case

the

javadoc for the hook says so, but it can be missed. If you

call

bypass()

in

a hook where it is not supported it is a no-op. This can

lead

to a

poor

developer experience.

Where bypass is supported what is being bypassed is all of

the

core

code

implementing the remainder of the operation. In order to

understand

what

calling bypass() will skip, a coprocessor implementer should

read

and

understand all of the remaining code and its nuances.

Although I

think

this

is good practice for coprocessor developers in general, it

demands a

lot. I

think it would provide a much better developer experience if

we

didn't

allow bypass, even though it means - in theory - a

coprocessor

would

be a

lot more limited in some ways than before. What is skipped

is

extremely

version dependent. That core code will vary, perhaps

significantly,

even

between point releases. We do not provide the promise of

consistent

behavior even between point releases for the bypass

semantic.

To

achieve

that we could not change any code between hook points.

Therefore

the

coprocessor implementer becomes an HBase core developer in

practice

as

soon

as they rely on bypass(). Every release of HBase may break

the

assumption

that the replacement for the bypassed code takes care of all

necessary

skipped concerns. Because those concerns can change at any

point,

such an

assumption is never safe.

I say "in theory" because I would be surprised if anyone is

relying

on

the

bypass for the above reason. I seem to recall that Phoenix

might

use

it

in

one place to promote a normal mutation into an atomic

operation,

by

substituting one for the other, but if so that objective

could

be

reimplemented using their new locking manager.

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Reply via email to