Sure, I'll add this option. I'll send a link to the final document once it's done.
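
For reference while reviewing, here is a rough sketch of the registry rules from the quoted design notes, together with the requested switch to disable dynamic UDFs. It is only a toy in-memory model, not Drill code: the class FunctionRegistry, the flag dynamic_enabled and the helper udf_key are names made up for illustration, and the real registry would live in ZK.

```python
from dataclasses import dataclass, field


def udf_key(name, param_types):
    """Registry key: function name combined with its input parameter types."""
    return f"{name}({','.join(param_types)})"


@dataclass
class FunctionRegistry:
    """In-memory stand-in for the ZK function registry (hypothetical model)."""
    dynamic_enabled: bool = True               # hypothetical switch, not a real Drill option
    jars: dict = field(default_factory=dict)   # jar name -> set of UDF keys

    def _check_enabled(self):
        if not self.dynamic_enabled:
            raise PermissionError("dynamic UDF support is disabled")

    def create(self, jar_name, udfs):
        """Register every UDF in a jar; reject duplicates by jar name or signature."""
        self._check_enabled()
        if jar_name in self.jars:
            raise ValueError(f"jar already registered: {jar_name}")
        keys = {udf_key(name, params) for name, params in udfs}
        clash = keys & set().union(*self.jars.values())
        if clash:
            raise ValueError(f"duplicate UDFs: {sorted(clash)}")
        self.jars[jar_name] = keys
        return sorted(keys)

    def find_jar(self, name, param_types):
        """Lazy-init lookup: which jar must a drillbit load for this UDF?"""
        key = udf_key(name, param_types)
        for jar_name, keys in self.jars.items():
            if key in keys:
                return jar_name
        raise LookupError(f"unknown UDF: {key}")

    def drop(self, jar_name):
        """Removal works at jar level and returns the UDFs that went away."""
        self._check_enabled()
        if jar_name not in self.jars:
            raise LookupError(f"unknown jar: {jar_name}")
        return sorted(self.jars.pop(jar_name))
```

With the flag off, both create and drop are refused, while lookups of already registered UDFs still work. Whether that is the right behaviour for the real option is an open question for the design doc.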
On Tue, Jul 26, 2016 at 8:06 PM Keys Botzum <kbot...@maprtech.com> wrote:

> +1
>
> Keys
> _______________________________
> Keys Botzum
> Senior Principal Technologist
> kbot...@maprtech.com
> 443-718-0098
> MapR Technologies
> http://www.mapr.com
>
> On Jul 26, 2016, at 1:05 PM, yuliya Feldman <yufeld...@yahoo.com.INVALID>
> wrote:
> >
> > I want to make sure (and will also make a note in the design doc) that
> > we have an option to disable dynamic loading/unloading of UDFs until we
> > are able to do proper authentication AND authorization of the user(s).
> >
> > From: Arina Yelchiyeva <arina.yelchiy...@gmail.com>
> > To: dev@drill.apache.org
> > Sent: Monday, July 25, 2016 9:09 AM
> > Subject: Re: Dynamic UDFs support
> >
> > My fault, agree, DROP is more appropriate.
> > Thanks Julian!
> >
> > On Mon, Jul 25, 2016 at 7:07 PM Julian Hyde <jhyde.apa...@gmail.com>
> > wrote:
> >
> >> But don't call it DELETE. In SQL the opposite of CREATE is DROP.
> >>
> >> Julian
> >>
> >>> On Jul 25, 2016, at 8:48 AM, Keys Botzum <kbot...@maprtech.com> wrote:
> >>>
> >>> I like the approach to handling DELETE. This is very useful. I think
> >>> an implementation that does not guarantee consistent behavior is
> >>> perfectly fine for use that is targeted at developers who are working
> >>> on UDFs. As long as the docs make the intent clear, this makes me very
> >>> happy.
> >>>
> >>> I'll defer to others more expert than I on the remainder of the design.
> >>>
> >>> Keys
> >>>
> >>>> On Jul 25, 2016, at 9:55 AM, Arina Yelchiyeva
> >>>> <arina.yelchiy...@gmail.com> wrote:
> >>>>
> >>>> Taking into account all previous comments and the discussion we had
> >>>> with Parth and Paul, please find below my design notes (I am going to
> >>>> prepare a proper design document, I just want to see if all agree with
> >>>> the raw version).
> >>>> I propose we use lazy-init for dynamically loaded UDFs: when a user
> >>>> issues the CREATE UDF command, the foreman will only validate the jar
> >>>> and update the ZK function registry, and only if a function is needed
> >>>> will it be loaded to the appropriate drillbit (during planning stage
> >>>> or fragment execution). We might add listeners (as Paul proposed) to
> >>>> pre-load UDFs, but I didn't include this in the current release to
> >>>> keep the solution simple. We might reconsider this.
> >>>> I have looked at the issue with class loading and unloading, and if we
> >>>> ship each jar with its own classloader, DELETE functionality can be
> >>>> introduced in the current release, at least marked as experimental or
> >>>> for developers' use only, to ease the UDF development process.
> >>>>
> >>>> Any comments are welcome.
> >>>>
> >>>> *Invariants*
> >>>>
> >>>> 1. DFS staging area where the user copies the jar to be loaded
> >>>>
> >>>> 2. DFS udf area (former registration area) where all validated jars
> >>>> are present
> >>>>
> >>>> 3. ZK function registry - contains the list of all dynamically loaded
> >>>> UDFs and their jars. A UDF name will be represented as a combination
> >>>> of name and input parameters.
> >>>>
> >>>> 4. Lazy-init - all dynamically loaded UDFs will be loaded to a
> >>>> drillbit upon request, i.e. if the drillbit receives a query or
> >>>> fragment that contains such a UDF
> >>>>
> >>>> 5. Currently only CREATE and DELETE statements are supported
> >>>>
> >>>>
> >>>> *Adding UDFs*
> >>>>
> >>>> 1. User copies source and binary (hereinafter jar) to DFS staging area
> >>>> 2. User issues CREATE UDF command
> >>>> 3. Foreman receives request to create UDF:
> >>>> a) checks if jar is present in staging area
> >>>> b) copies jar to temporary DFS location
> >>>> c) validates UDFs present in jar locally:
> >>>> 1) copies jar to temporary local fs
> >>>> 2) scans jar using temporary classloader
> >>>> 3) checks if there are any duplicates in local function registry
> >>>> 4) returns list of UDFs to be registered
> >>>> d) validates UDFs present in jar in ZK:
> >>>> 1) takes list of dynamically loaded UDFs from ZK
> >>>> 2) checks that there are no duplicates either by jar name or among UDFs
> >>>> 3) moves jar from DFS temporary area to DFS udf area
> >>>> 4) updates ZK with list of new dynamic UDFs
> >>>> 5) removes jar from staging area
> >>>> 6) returns confirmation to user that UDFs were registered
> >>>>
> >>>>
> >>>> *Lazy-init*
> >>>>
> >>>> 1. User issues query with dynamically loaded UDF.
> >>>>
> >>>> 2. During planning stage or fragment execution, if UDF is not present
> >>>> in local function registry, drillbit:
> >>>>
> >>>> a) checks if such UDF is present in ZK function registry
> >>>>
> >>>> b) if present, loads UDF using jar name, otherwise returns an error
> >>>>
> >>>> c) proceeds with planning stage or fragment execution
> >>>>
> >>>>
> >>>> *New drillbit registration / Drillbit re-start*
> >>>>
> >>>> Local udf directory is re-created, to clean up previously loaded jars
> >>>> if any
> >>>>
> >>>>
> >>>> *Delete UDF*
> >>>>
> >>>> Each jar that is going to be loaded dynamically will have its own
> >>>> classloader, which will solve the problem with loading and unloading
> >>>> classes with the same name.
> >>>>
> >>>>
> >>>> 1. User issues DELETE command (delete will operate on jar name level)
> >>>>
> >>>> 2. Foreman receives DELETE request:
> >>>>
> >>>> a) checks if such jar is present in ZK function registry
> >>>>
> >>>> b) creates ephemeral znode /udf/delete/jar_name
> >>>>
> >>>> c) removes record in ZK function registry
> >>>>
> >>>> d) removes jar from DFS udf area
> >>>>
> >>>> e) removes ephemeral znode from /udf/delete/jar_name
> >>>>
> >>>> f) returns confirmation to user that UDFs were deleted
> >>>>
> >>>> 3. Drillbits are subscribed to the /udf/delete znode; when a new znode
> >>>> with a jar name appears, drillbit:
> >>>>
> >>>> a) removes all UDFs associated with jar name from local function
> >>>> registry
> >>>>
> >>>> b) removes jar from local udf directory
> >>>>
> >>>>
> >>>> *Limitations*
> >>>>
> >>>> 1. When a user runs the DELETE command, some queries that are using
> >>>> deleted UDFs may fail during fragment execution if by that time the
> >>>> UDF has been deleted from the local registry. Ideally, before
> >>>> submitting the DELETE command, the user needs to make sure no one is
> >>>> running queries using UDFs from that particular jar.
> >>>>
> >>>>
> >>>> 2. We encourage users not to delete any jars from the DFS udf area
> >>>> manually, as it may lead to inconsistency between the ZK function
> >>>> registry and the DFS udf area.
> >>>>
> >>>>
> >>>> 3. The CREATE statement is not atomic between copying the validated
> >>>> jar to the DFS udf area and updating the ZK function registry with the
> >>>> list of new UDFs. In case of failure between these two steps, some
> >>>> unused jars may be left in the DFS udf area, but they won’t harm the
> >>>> current process. A LIST JARS command can be introduced to show used
> >>>> jars.
> >>>>
> >>>>
> >>>> Kind regards
> >>>> Arina
> >>>>
> >>>>> On Fri, Jul 22, 2016 at 7:15 PM Keys Botzum <kbot...@maprtech.com>
> >>>>> wrote:
> >>>>>
> >>>>> No disagreement on deferral, but I raised my initial concern
> >>>>> precisely because I'm concerned about the practicality of the
> >>>>> "restart the cluster" option. I cited my concerns about laptops and
> >>>>> development clusters. I was wondering if there might be some small
> >>>>> things Drill could do to help. If there is nothing that can be done
> >>>>> to make this easier, so be it, but I think that's going to be a big
> >>>>> impediment.
> >>>>>
> >>>>> Keys
> >>>>>
> >>>>>> On Jul 22, 2016, at 1:37 AM, Neeraja Rentachintala
> >>>>>> <nrentachint...@maprtech.com> wrote:
> >>>>>>
> >>>>>> It seems like we are reaching a conclusion here in terms of starting
> >>>>>> with a simpler implementation, i.e. being able to deploy UDFs
> >>>>>> dynamically without Drillbit restarts, based off jars in a DFS
> >>>>>> location. Dropping functions dynamically is out of scope for version
> >>>>>> 1 of this feature (we assume development of UDFs is happening on a
> >>>>>> user laptop or a dev cluster where it's OK to have a restart).
> >>>>>>
> >>>>>> -Neeraja
> >>>>>>
> >>>>>>> On Thu, Jul 21, 2016 at 11:56 AM, Keys Botzum
> >>>>>>> <kbot...@maprtech.com> wrote:
> >>>>>>>
> >>>>>>> Recognize the difficulty. Not suggesting this be addressed in the
> >>>>>>> first version. Just suggesting some thought about how a real user
> >>>>>>> will work around it. Maybe some doc and/or small changes can make
> >>>>>>> this easier.
> >>>>>>>
> >>>>>>> Keys
> >>>>>>>
> >>>>>>>> On Jul 21, 2016 1:45 PM, "Paul Rogers" <prog...@maprtech.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Hi All,
> >>>>>>>>
> >>>>>>>> Adding a dynamic DROP would, of course, be a great addition! The
> >>>>>>>> reason for suggesting we skip that was to control project scope.
> >>>>>>>>
> >>>>>>>> Dynamic DROP requires a synchronization step. Here’s the scenario:
> >>>>>>>>
> >>>>>>>> * Foreman A starts a query using UDF U.
> >>>>>>>> * Foreman B receives a request to drop UDF U, followed by a
> >>>>>>>> request to add a new version of U, U’.
> >>>>>>>>
> >>>>>>>> How do we drop a function that may be in use? There are some
> >>>>>>>> tricky bits to work out, which seemed too overwhelming to consider
> >>>>>>>> all in one go.
> >>>>>>>>
> >>>>>>>> Clearly just dropping U and adding a new version of U with the
> >>>>>>>> same name leads to issues if not synchronized. If a Drillbit D is
> >>>>>>>> running a query with U when it receives notice to drop U, should D
> >>>>>>>> complete the query or fail it? If the query completes, then how
> >>>>>>>> does D deal with the request to register U’, which has the same
> >>>>>>>> name?
> >>>>>>>>
> >>>>>>>> Do we globally synchronize function deletion? (The foreman B that
> >>>>>>>> receives the drop request waits for all queries using U to
> >>>>>>>> finish.) But, how do we know which queries use U?
> >>>>>>>>
> >>>>>>>> An eventually consistent approach is to track the age of the
> >>>>>>>> oldest running query. Suppose B drops U at time T. Any query
> >>>>>>>> received after T that uses U will fail in planning. A new U’ can’t
> >>>>>>>> be registered until all queries that started before T complete.
> >>>>>>>>
> >>>>>>>> The primary challenge we face in both the CREATE and DROP cases is
> >>>>>>>> that Drill is distributed with little central coordination. That’s
> >>>>>>>> great for scale, but makes it hard to design features that require
> >>>>>>>> coordination. Some other tools solve this problem with a data
> >>>>>>>> dictionary (or "metastore"). Alas, Drill does not have such a
> >>>>>>>> concept. So a seemingly simple feature like dynamic UDF becomes a
> >>>>>>>> major design challenge to get right.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> - Paul
> >>>>>>>>
> >>>>>>>>> On Jul 21, 2016, at 7:21 AM, Neeraja Rentachintala
> >>>>>>>>> <nrentachint...@maprtech.com> wrote:
> >>>>>>>>>
> >>>>>>>>> The whole point of this feature is to avoid Drill cluster
> >>>>>>>>> restarts, as the name 'Dynamic' UDFs indicates.
> >>>>>>>>> So any design that requires restarts would, I think, defeat the
> >>>>>>>>> purpose.
> >>>>>>>>>
> >>>>>>>>> I also think this is an example of a feature where we start with
> >>>>>>>>> a simple design to serve the purpose, take feedback on how it is
> >>>>>>>>> being deployed/used in real user situations, and improve it in
> >>>>>>>>> subsequent releases.
> >>>>>>>>>
> >>>>>>>>> -thanks
> >>>>>>>>> Neeraja
> >>>>>>>>>
> >>>>>>>>>> On Thu, Jul 21, 2016 at 6:32 AM, Keys Botzum
> >>>>>>>>>> <kbot...@maprtech.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> I think there are a lot of great ideas here. My one concern is
> >>>>>>>>>> the lack of unload and thus presumably replace functionality.
> >>>>>>>>>> I'm just thinking about typical actual usage.
> >>>>>>>>>>
> >>>>>>>>>> In a typical development cycle someone writes something, tries
> >>>>>>>>>> it, learns, changes it, and tries again. Assuming I understand
> >>>>>>>>>> the design, that change step requires a full Drill cluster
> >>>>>>>>>> restart. That is going to be very disruptive and will make UDF
> >>>>>>>>>> work nearly impossible without a dedicated "private" cluster for
> >>>>>>>>>> Drill. I realize that people should have access to the data they
> >>>>>>>>>> need and Drill in a development cluster, but even then restarts
> >>>>>>>>>> can be hard since development clusters are often shared - and
> >>>>>>>>>> that's assuming such a cluster exists. I realize of course Drill
> >>>>>>>>>> can be run as a standalone Drillbit, but I'm not convinced that
> >>>>>>>>>> desktops will have adequate access to the needed data.
> >>>>>>>>>>
> >>>>>>>>>> Having dealt with Java classloading over the years, I'm not
> >>>>>>>>>> claiming class replacement is an easy thing, so I'll defer to
> >>>>>>>>>> others on the priority of that, but I'm wondering if there isn't
> >>>>>>>>>> some way to make UDF experimentation a bit easier/more practical.
> >>>>>>>>>>
> >>>>>>>>>> Given the above, let me toss out some possibly naive ideas that
> >>>>>>>>>> maybe are workable:
> >>>>>>>>>> * can I easily run a standalone Drillbit on a Hadoop cluster
> >>>>>>>>>> node that is already running Drill servers? I'm sure this can be
> >>>>>>>>>> done, but is it easy? Could we perhaps make this clearer as an
> >>>>>>>>>> explicit kind of thing?
> >>>>>>>>>> * is there a way that when I deploy a UDF I can constrain the #
> >>>>>>>>>> of bits it is loaded into and perhaps even specify the bits?
> >>>>>>>>>> * Obvious corollary is I'd want my query to run on those bits
> >>>>>>>>>> and a not too disruptive way to restart just those bits
> >>>>>>>>>>
> >>>>>>>>>> The above may be obvious to Drill experts. If it is, then
> >>>>>>>>>> perhaps the UDF docs could just point out how to easily develop
> >>>>>>>>>> UDFs in an iterative fashion.
> >>>>>>>>>>
> >>>>>>>>>> Keys
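
For anyone skimming the thread, the eventually consistent rule Paul describes (drop U at time T, fail later queries that use U at planning, allow U’ only once every query started before T has finished) can be modelled in a few lines. This is just a toy sketch with logical timestamps passed in by the caller. The class DropTracker and its method names are invented for illustration and are not part of Drill.

```python
class DropTracker:
    """Toy model of the 'age of the oldest running query' rule (not Drill code)."""

    def __init__(self):
        self.running = {}   # query id -> start time
        self.dropped = {}   # UDF name -> drop time T

    def start_query(self, query_id, udf, now):
        """Planning fails for a query that arrives after its UDF was dropped."""
        if udf in self.dropped:
            raise LookupError(f"{udf} was dropped at t={self.dropped[udf]}")
        self.running[query_id] = now

    def finish_query(self, query_id):
        self.running.pop(query_id, None)

    def drop(self, udf, now):
        """Record the drop time T; queries already running are left alone."""
        self.dropped[udf] = now

    def can_reregister(self, udf):
        """True once no query that started before T is still running.

        Only the oldest start time is compared with T, so there is no need
        to know which queries actually use the UDF.
        """
        t = self.dropped.get(udf)
        if t is None:
            return False    # not dropped yet, nothing to replace
        oldest = min(self.running.values(), default=None)
        return oldest is None or oldest >= t

    def reregister(self, udf):
        if not self.can_reregister(udf):
            raise RuntimeError(f"queries older than the drop of {udf} still run")
        del self.dropped[udf]
```

The check is deliberately conservative: one long-running query that never touches U still delays U’. That is the price of not tracking which queries use which function.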