Re: On coprocessor API evolution

Michael Segel Sun, 18 May 2014 11:05:28 -0700

And you should consult a lawyer before you make a statement like that… 

These are exposed APIs, and Cloudera, Hortonworks, MapR, Pivotal, even Intel (if 
they still have licensed customers) all have to support their releases.


BTW, I think I’m the only person who’s given a talk trying to explain the 
dangers of coprocessors and why they shouldn’t be used. ;-)


On May 18, 2014, at 3:09 AM, Andrew Purtell <apurt...@apache.org> wrote:

> You should be telling those customers that use of coprocessors "voids the
> warranty". They are a convenience for HBase project developers and advanced
> users, not a license for random devs to upload code into the server and
> then expect vendor support. It should be obvious on the face of it that is
> not a good idea, and so therefore not why coprocessors are in the HBase
> code in the first place.
> 
> 
> On Sat, May 17, 2014 at 8:02 AM, Kevin O'dell <kevin.od...@cloudera.com> wrote:
> 
>> Andrew,
>> 
>>   HBASE-4047 is a great idea (even if it is three years old).  I have had
>> numerous customers implement Co-Procs and take down every RS in
>> spectacular fashion, from JVM crashes to performance so slow that
>> jobs fail.  I will raise this internally and see if we can get some
>> extra traction.
>> 
>> 
>> On Sat, May 17, 2014 at 9:33 AM, Andrew Purtell <apurt...@apache.org> wrote:
>> 
>>> Great, see HBASE-4047. In the best of the open source tradition, there
>>> hasn't been anyone sufficiently motivated to do the work necessary
>>> (current use cases are "good enough"), but that someone can always come
>>> along. Perhaps that is yourself.
>>> 
>>> 
>>> On Sat, May 17, 2014 at 5:39 AM, Michael Segel <michael_se...@hotmail.com> wrote:
>>> 
>>>> You have to understand…
>>>> 
>>>> I do see the importance of the hook to allow for a trigger to implement
>>>> 3rd party code on the server side.
>>>> No argument there.
>>>> 
>>>> It’s just that the current implementation doesn’t sandbox the code to
>>>> limit the potential for harm to the RS.
>>>> 
>>>> In simple terms, you can isolate the code into a separate JVM and use IPC
>>>> to connect the sandbox to the RS when a trigger occurs.
>>>> 
>>>> In C/C++ you’d have shared memory segments, something you don’t really
>>>> have in Java.  (You could use C and then put a JNI wrapper around this…)
>>>> 
>>>> Which goes to my point… this is something that is solvable. You just need
>>>> to think about it…
>>>> 
>>>> You talk about RDBMSs. Triggers themselves are not an equivalent analogy.
>>>> You can have a trigger that then calls some code written in an SPL and
>>>> you’re ok. You can control the SPL environment so that you limit the risk
>>>> of the server crashing.
>>>> (SPL == Stored Procedure Language)
>>>> 
>>>> If you’re running third party code from your trigger that is written in
>>>> C/C++ or Java, then you have other issues.
>>>> 
>>>> Sybase’s Adaptive Server had some serious issues, and poorly written
>>>> C/C++ code could cause serious performance problems… Informix IDS took a
>>>> different approach and didn’t have those issues.  And I’m dating myself
>>>> because most here probably never worked with either Sybase or Informix… ;-)
>>>> 
>>>> So using your RDBMS analogy… you have two different approaches. One worked…
>>>> well enough, but was problematic.  The other worked better, had fewer
>>>> issues, and was more secure.
>>>> 
>>>> One of the reasons why this is important… the longer the current
>>>> implementation is in the wild, the longer it will take and the harder it
>>>> will be to fix.
>>>> 
>>>> 
>>>> On May 17, 2014, at 11:44 AM, qiang tian <tian...@gmail.com> wrote:
>>>> 
>>>>> My small 2 cents...:-)
>>>>> 
>>>>> Hooks/coprocessors are a useful mechanism for interacting with a system
>>>>> for things that cannot be done via the API.  For end users, tradeoff
>>>>> factors like performance, security, and reliability can be controlled by
>>>>> an upper layer's policy.
>>>>> e.g. in an RDBMS, the end user has limited use cases for triggers, which
>>>>> eliminates the security factor entirely, and the performance tradeoff is
>>>>> left to the end user to decide. So from an evolution perspective,
>>>>> hooks/coprocessors for end users could be controlled by a query engine
>>>>> layer like Phoenix.
>>>>> 
>>>>> For internal users, hooks are better not used widely unless it is a MUST
>>>>> or strong flexibility/pluggability is required.  e.g. things that can be
>>>>> part of the core had better not use them.
>>>>> 
>>>>> thanks.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sat, May 17, 2014 at 4:04 PM, Michael Segel <michael_se...@hotmail.com> wrote:
>>>>> 
>>>>>> Andrew,
>>>>>> 
>>>>>> Is ‘magical fairy dust’ a reference to some new synthetic drug you take at
>>>>>> raves?
>>>>>> But let’s get back to reality.
>>>>>> 
>>>>>> 
>>>>>> Let’s try this again; simply put… the coprocessor runs in the same JVM as
>>>>>> the RS, therefore you have an unacceptable level of risk.
>>>>>> That inherent risk means that you cannot run HBase with end-user
>>>>>> coprocessors enabled when you want to have a stable and somewhat secure
>>>>>> environment.
>>>>>> 
>>>>>> The simple truth is that you need to decouple the end-user code
>>>>>> (coprocessor) from the RS.
>>>>>> It’s not a difficult concept to understand, and while reasonable, it would
>>>>>> mean a major rewrite of coprocessors.
>>>>>> 
>>>>>> Will de-coupling the user-space from the RS remove all risk? No. And no,
>>>>>> I’m not suggesting that.
>>>>>> But it’s a critical piece of the puzzle.
>>>>>> 
>>>>>> It’s not just security, but also reliability.
>>>>>> 
>>>>>> 
>>>>>> On May 17, 2014, at 4:43 AM, Andrew Purtell <apurt...@apache.org> wrote:
>>>>>> 
>>>>>>> Michael,
>>>>>>> 
>>>>>>> As you know, we have implemented security features with coprocessors
>>>>>>> precisely because they can be interposed on internal actions to make
>>>>>>> authoritative decisions in-process. Coprocessors are a way to have
>>>>>>> composable internal extensions. They don't have and probably never will
>>>>>>> have magic fairy security dust. We do trust the security coprocessor code
>>>>>>> because it was developed by the project. That is not the same thing as
>>>>>>> saying you can have 'security' and execute arbitrary user code in-process
>>>>>>> as a coprocessor. Just want to clear that up for you.
>>>>>>> 
>>>>>>>> will want to allow system coprocessors but then write a coprocessor that
>>>>>>>> rejects user coprocessors.
>>>>>>> 
>>>>>>> That's a reasonable point.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Sat, May 17, 2014 at 12:13 AM, Michael Segel <michael_se...@hotmail.com> wrote:
>>>>>>> 
>>>>>>>> Until you move the coprocessor out of the RS space and into its own
>>>>>>>> sandbox… saying security and coprocessor in the same sentence is a joke.
>>>>>>>> Oh wait… you were serious… :-(
>>>>>>>> 
>>>>>>>> I’d say there’s a significant rethink on coprocessors that’s required.
>>>>>>>> 
>>>>>>>> Anyone running a secure (Kerberos) cluster will want to allow system
>>>>>>>> coprocessors but then write a coprocessor that rejects user coprocessors.
>>>>>>>> 
>>>>>>>> Just putting it out there…
>>>>>>>> 
>>>>>>>> On May 15, 2014, at 2:13 AM, Andrew Purtell <apurt...@apache.org> wrote:
>>>>>>>> 
>>>>>>>>> Because coprocessor APIs are so tightly bound with internals, if we apply
>>>>>>>>> suggested rules like the one mentioned on HBASE-11054:
>>>>>>>>> 
>>>>>>>>>   I'd say policy should be no changes to method apis across minor versions
>>>>>>>>> 
>>>>>>>>> This will lock coprocessor based components to the limitations of the API
>>>>>>>>> as we encounter them. Core code does not suffer this limitation, we are
>>>>>>>>> otherwise free to refactor and change internal methods. For example, if we
>>>>>>>>> apply this policy to the 0.98 branch, then we will have to abandon further
>>>>>>>>> security feature development there and move to trunk only. This is because
>>>>>>>>> we already are aware that coprocessor APIs as they stand are insufficient
>>>>>>>>> still.
>>>>>>>>> 
>>>>>>>>> Coprocessor APIs are a special class of internal method. We have had a
>>>>>>>>> tension between allowing freedom of movement for developing them out and
>>>>>>>>> providing some measure of stability for implementors for a while.
>>>>>>>>> 
>>>>>>>>> It is my belief that the way forward is something like HBASE-11125. Perhaps
>>>>>>>>> we can take this discussion to that JIRA and have this long overdue
>>>>>>>>> conversation.
>>>>>>>>> 
>>>>>>>>> Regarding security features specifically, I would also like to call your
>>>>>>>>> attention to HBASE-11127. I think security has been an optional feature
>>>>>>>>> long enough, it is becoming a core requirement for the project, so should
>>>>>>>>> be moved into core. Sure, we can therefore sidestep any issues with
>>>>>>>>> coprocessor API sufficiency for hosting security features. However, in my
>>>>>>>>> opinion we should pursue both HBASE-11125 and HBASE-11127; the first to
>>>>>>>>> provide the relative stability long asked for by coprocessor API users, the
>>>>>>>>> latter to cleanly solve emerging issues with concurrency and versioning.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Best regards,
>>>>>>>>> 
>>>>>>>>> - Andy
>>>>>>>>> 
>>>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>>>>>>> (via Tom White)
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Best regards,
>>>>>>> 
>>>>>>> - Andy
>>>>>>> 
>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>>>>> (via Tom White)
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Best regards,
>>> 
>>>   - Andy
>>> 
>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>> (via Tom White)
>>> 
>> 
>> 
>> 
>> --
>> Kevin O'Dell
>> Systems Engineer, Cloudera
>> 
> 
> 
> 
> -- 
> Best regards,
> 
>   - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
