It seems like we are reaching a conclusion here: start with a simpler implementation, i.e., being able to deploy UDFs dynamically, without Drillbit restarts, from jars in a DFS location. Dropping functions dynamically is out of scope for version 1 of this feature (we assume UDF development happens on a user's laptop or on a dev cluster where it's OK to restart).
-Neeraja

On Thu, Jul 21, 2016 at 11:56 AM, Keys Botzum <kbot...@maprtech.com> wrote:

Recognize the difficulty. Not suggesting this be addressed in the first version. Just suggesting some thought about how a real user will work around it. Maybe some doc and/or small changes can make this easier.

Keys
_______________________________
Keys Botzum
Senior Principal Technologist
kbot...@maprtech.com
443-718-0098
MapR Technologies
http://www.mapr.com

On Jul 21, 2016 1:45 PM, "Paul Rogers" <prog...@maprtech.com> wrote:

Hi All,

Adding a dynamic DROP would, of course, be a great addition! The reason for suggesting we skip it was to control project scope.

Dynamic DROP requires a synchronization step. Here’s the scenario:

* Foreman A starts a query using UDF U.
* Foreman B receives a request to drop UDF U, followed by a request to add a new version of U, U’.

How do we drop a function that may be in use? There are some tricky bits to work out, which seemed too overwhelming to consider all in one go.

Clearly, just dropping U and adding a new version of U with the same name leads to issues if not synchronized. If a Drillbit D is running a query with U when it receives notice to drop U, should D complete the query or fail it? If the query completes, then how does D deal with the request to register U’, which has the same name?

Do we globally synchronize function deletion? (The Foreman B that receives the drop request waits for all queries using U to finish.) But how do we know which queries use U?

An eventually consistent approach is to track the age of the oldest running query. Suppose B drops U at time T. Any query received after T that uses U will fail in planning. A new U’ can’t be registered until all queries that started before T complete.
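The eventually consistent DROP rule can be made concrete with a small tracker. This is a minimal sketch, not Drill code: DropBarrier and its methods are hypothetical names, query start times are assumed to come from a single monotonic clock, and the in-memory maps stand in for whatever shared state (ZK, say) a real design would use.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker for the eventually consistent DROP rule: a function
// dropped at time T may get a new version only after every query that
// started at or before T has completed. Times come from an assumed monotonic clock.
class DropBarrier {
    private final Map<String, Long> runningQueries = new ConcurrentHashMap<>(); // queryId -> start time
    private final Map<String, Long> dropTimes = new ConcurrentHashMap<>();      // function -> drop time

    void queryStarted(String queryId, long startTime) { runningQueries.put(queryId, startTime); }

    void queryFinished(String queryId) { runningQueries.remove(queryId); }

    // From now on, planning rejects queries that use the function.
    void drop(String function, long dropTime) { dropTimes.put(function, dropTime); }

    boolean plannable(String function) { return !dropTimes.containsKey(function); }

    // Start time of the oldest running query, or Long.MAX_VALUE if none are running.
    long oldestRunningStart() {
        return runningQueries.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(Long.MAX_VALUE);
    }

    // U' may be registered once no query that could have seen the old U
    // (i.e. one that started at or before the drop) is still running.
    boolean canRegisterNewVersion(String function) {
        Long dropTime = dropTimes.get(function);
        return dropTime == null || oldestRunningStart() > dropTime;
    }
}
```

The design choice is that no per-query tracking of "which queries use U" is needed; the oldest start time is a conservative stand-in, at the cost of sometimes waiting for queries that never touched U.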
The primary challenge we face in both the CREATE and DROP cases is that Drill is distributed with little central coordination. That’s great for scale, but makes it hard to design features that require coordination. Some other tools solve this problem with a data dictionary (or “metastore”). Alas, Drill does not have such a concept. So a seemingly simple feature like dynamic UDFs becomes a major design challenge to get right.

Thanks,

- Paul

On Jul 21, 2016, at 7:21 AM, Neeraja Rentachintala <nrentachint...@maprtech.com> wrote:

The whole point of this feature, as the name 'Dynamic' UDFs indicates, is to avoid Drill cluster restarts. So any design that requires restarts would, I think, defeat the purpose.

I also think this is an example of a feature where we start with a simple design that serves the purpose, take feedback on how it is deployed and used in real user situations, and improve it in subsequent releases.

-thanks
Neeraja

On Thu, Jul 21, 2016 at 6:32 AM, Keys Botzum <kbot...@maprtech.com> wrote:

I think there are a lot of great ideas here. My one concern is the lack of unload, and thus presumably replace, functionality. I'm just thinking about typical actual usage.

In a typical development cycle someone writes something, tries it, learns, changes it, and tries again. Assuming I understand the design, that change step requires a full Drill cluster restart. That is going to be very disruptive and will make UDF work nearly impossible without a dedicated "private" cluster for Drill. I realize that people should have access to the data they need and to Drill in a development cluster, but even then restarts can be hard, since development clusters are often shared, and that's assuming such a cluster exists.
I realize, of course, that Drill can be run as a standalone Drillbit, but I'm not convinced that desktops will have adequate access to the needed data.

Having dealt with Java classloading over the years, I'm not claiming class replacement is an easy thing, so I'll defer to others on its priority, but I'm wondering if there isn't some way to make UDF experimentation a bit easier and more practical.

Given the above, let me toss out some possibly naive ideas that may be workable:
* Can I easily run a standalone Drillbit on a Hadoop cluster node that is already running Drill servers? I'm sure this can be done, but is it easy? Could we perhaps make this clearer as an explicit kind of thing?
* Is there a way, when I deploy a UDF, to constrain the number of Drillbits it is loaded into, and perhaps even specify those Drillbits?
* The obvious corollary is that I'd want my query to run on those Drillbits, and a not-too-disruptive way to restart just those Drillbits.

The above may be obvious to Drill experts. If it is, then perhaps the UDF docs could just point out how to easily develop UDFs in an iterative fashion.

Keys

On Jul 21, 2016, at 3:13 AM, Paul Rogers <prog...@maprtech.com> wrote:

Always good to have options… Another is to try an eventual consistency model.

The invariant here is the one that was mentioned earlier. Whenever a query is submitted with UDF U, that query either fails in planning (because U is unknown) or succeeds on all nodes (at least with respect to U).
For this to work, we need a consistent view of the world. We can try to enforce consistency at function registration time (the original design), or via the Foreman (Parth’s design). We can probably also use an eventual consistency model.

Suppose we have a global name space of functions. With the global name space, we can establish this invariant: if a function is in that name space, then the Foreman accepts the query. If a Drillbit receives a fragment but does not yet know of U, then the Drillbit A) knows that some Foreman must have registered U (or the query would have failed in planning), and B) can download the function if it is not already in place.

Folks pointed out that always checking a global name space is expensive, which it is. As it turns out, we can first check the local function registry. If the Drillbit already knows about the function, we’re done checking; no global check is needed. Only on the first use of a new function, when it is not yet loaded locally, must the global check be done.

For this to work, the Foreman that registers UDF U must:

1. From Arina’s proposed staging area, check the jar contents to see if a name conflict exists with the global registry. (Requires some class loader code.)
2. If a conflict exists, refuse to register the function and return an error.
3. If no conflict exists, register the function in the global name space and move the jar to the registered area in DFS.

In this model, it is entirely optional whether the Foreman that registers U alerts other Drillbits. Instead, Drillbits could poll from time to time, or just wait until they see a query with U and do the download at that time.
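The registration steps and the local-first lookup described above might look roughly like this. It is a minimal in-memory sketch under stated assumptions: GlobalFunctionRegistry stands in for the ZK-backed global name space (a real version would rely on ZK's atomic operations rather than Java's synchronized), DrillbitFunctionResolver is a hypothetical name for a Drillbit's local registry, and the jar download is only a placeholder comment.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Stand-in for the global (ZK-backed) function name space. In this sketch,
// "synchronized" plays the role of ZK's atomic check-and-insert.
class GlobalFunctionRegistry {
    private final Map<String, String> functionToJar = new HashMap<>();

    // Steps 1-3: register every function from the jar, or none if any name clashes.
    synchronized boolean register(String jar, List<String> functions) {
        for (String f : functions) {
            if (functionToJar.containsKey(f)) {
                return false; // step 2: conflict, so refuse and report an error
            }
        }
        for (String f : functions) {
            functionToJar.put(f, jar); // step 3: record in the global name space
        }
        return true; // caller would now move the jar to the registered DFS area
    }

    synchronized Optional<String> jarFor(String function) {
        return Optional.ofNullable(functionToJar.get(function));
    }
}

// Hypothetical per-Drillbit resolver: local registry first, global check only on a miss.
class DrillbitFunctionResolver {
    private final GlobalFunctionRegistry global;
    private final Set<String> localFunctions = new HashSet<>();
    private int globalLookups = 0; // counts the expensive checks, for illustration only

    DrillbitFunctionResolver(GlobalFunctionRegistry global) { this.global = global; }

    synchronized boolean resolve(String function) {
        if (localFunctions.contains(function)) {
            return true; // common case: already loaded, no global check
        }
        globalLookups++;
        Optional<String> jar = global.jarFor(function);
        // Placeholder: a real Drillbit would copy the jar from DFS and load it here.
        jar.ifPresent(j -> localFunctions.add(function));
        return jar.isPresent();
    }

    synchronized int globalLookups() { return globalLookups; }
}
```

Registration is all-or-nothing per jar, so a jar with one clashing name leaves no partial entries behind, and a Drillbit pays for the global check only once per function.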
When a new Drillbit starts, it can load all functions in the registry area, because these have all passed the name collision test and can all be used in queries. Any new registrations will be found and loaded as above. (Preloading functions is not required, but it might help performance.)

ZK is the only place we have at present for the global name space, so it seems the logical tool. ZK allows atomic operations, which we need here: operations 1, 2, and 3 above should be atomic.

Unfortunately, we can’t do the DFS move atomically with a ZK name space insertion. So the global name check and insert should be atomic; if that succeeds, copy the jar into the registered folder. There are a few details to work out to handle special cases, but we can cover those another time. (Hint: what happens if the Foreman crashes after inserting the ZK entry but before moving the jar?)

None of the proposed designs permits graceful unloading of functions. So deleting functions will require a cluster restart to establish a new stable checkpoint.

We can recommend that on each cluster restart, any functions in the DFS registry be copied to each Drillbit (much easier with the coming YARN integration) as a way of keeping the DFS registry a reasonable size.

More details to work out, but that’s the gist of the concept.
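One possible answer to the crash question, offered only as an assumption and not as the thread's design, is a two-state entry: claim the name as PENDING, copy the jar, then flip the entry to REGISTERED, with planning honoring only REGISTERED entries. The sketch uses hypothetical names, with in-memory collections standing in for ZK and the DFS registered folder. As written, recover() would also clear a live Foreman's in-progress claim, so a real version would need a liveness signal, for example making the PENDING marker a ZooKeeper ephemeral node, which ZK deletes when the creating session ends.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical two-state registry entry for surviving a Foreman crash
// between the ZK insert and the DFS jar move.
class TwoStepRegistry {
    enum State { PENDING, REGISTERED }

    private final Map<String, State> entries = new HashMap<>();  // stands in for ZK
    private final Set<String> registeredArea = new HashSet<>();  // stands in for the DFS folder

    // Atomic name check and insert: fails if anyone already holds the name.
    synchronized boolean claim(String jar) {
        return entries.putIfAbsent(jar, State.PENDING) == null;
    }

    // After the claim succeeds: copy the jar, then make it visible.
    synchronized void complete(String jar) {
        registeredArea.add(jar);
        entries.put(jar, State.REGISTERED);
    }

    // Planning only accepts fully registered jars.
    synchronized boolean visible(String jar) {
        return entries.get(jar) == State.REGISTERED;
    }

    // A PENDING claim with no jar means a Foreman died between the two steps:
    // release the name so registration can be retried.
    synchronized void recover() {
        entries.entrySet().removeIf(
                e -> e.getValue() == State.PENDING && !registeredArea.contains(e.getKey()));
    }
}
```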
Thanks,

- Paul

On Jul 20, 2016, at 2:37 PM, Parth Chandra <pchan...@maprtech.com> wrote:

My notes from the hangout with Arina and Paul:

Notes

There are two invariants for the registration process:
1) There is a registration/validated directory in the DFS that contains UDFs that have been validated by the registering Foreman. All Drillbits will have access to this directory, and on startup and/or UDF registration, the jars in this directory are synced with a local UDF directory.
2) During the process of registration, the registering Foreman creates a ZooKeeper node that indicates that one or more Drillbits have not yet registered the UDF.

The basic workflow is that UDF jars are copied from the staging directory to the registration directory and validated. Once they are validated, the available Drillbits are told to register the UDF. Registering the UDF consists of copying the jar to a local UDF directory and updating the local (in-memory) UDF registry. A sentinel node in ZooKeeper is used to track when all the Drillbits have registered the UDF.

There were two main suggestions: immediate registration and lazy registration.

Immediate registration:
Foreman tells all Drillbits to register, and creates a ZooKeeper node to track progress.
Every Drillbit makes a local copy and updates the ZooKeeper node to show it is done.
Foreman checks the ZooKeeper node and, when all available Drillbits have acknowledged, sends a message to all Drillbits to complete registration.
Foreman removes the ZK node.
All Drillbits update their local UDF registry.
Drillbit startup will block if there is a ZK node indicating registration is in progress.
This approach needs to be validated to see if any race conditions exist.

Lazy registration:
Once a UDF is copied to the registration folder, the UDF is essentially registered. On first use, a Drillbit may hit a ClassNotFoundException, in which case it will look for the UDF in the registration directory. If found, it will copy the jar to the local directory and add the UDF to its local registry.
This approach should be investigated to see if it fits in with the current UDF execution code.

On Mon, Jul 18, 2016 at 3:36 PM, Parth Chandra <pchan...@maprtech.com> wrote:

+1 on simplifying the design and postponing the items Paul has suggested.

Arina, Paul, I think we need to work out some of the design related to registering the UDF. Are you guys open to a quick hangout at 10 a.m. PDT tomorrow?

On Thu, Jul 14, 2016 at 1:46 PM, Paul Rogers <prog...@maprtech.com> wrote:

Hi All,

We’ve had quite a lively debate in the “comments” section of Arina’s wonderful design doc. Zelaine made a great suggestion: summarize the user experience as a way of making sense of the wealth of detailed comments.

IMHO, the most important user experience goals are:

1. When a user submits a CREATE FUNCTION command, the command returns quickly (within a few seconds at most).
2. If that user then issues a query using the function (to the same Foreman), the query is guaranteed to use the new function successfully on all nodes.
3. Other users, connecting to any Foreman, will see very clean behavior when submitting a query with the new function. Before some point in time (which can be different for each Foreman), a query with the function fails in planning. After that point, queries are guaranteed to use the new function successfully on all nodes.

Basically, this says that CREATE FUNCTION can’t take a (potentially) long time, and that use of a function can’t result in random failures while the function is being propagated across Drillbits.

The goals we can perhaps postpone are:

1. Class name space isolation. (Allows two data scientists to define the same class without collisions.)
2. Function name spaces. (Allows me to define “paul.foo” and you to define “bob.foo” without collisions. Needed if many people develop functions independently; otherwise, we need a global name space.)
3. Dynamic DROP FUNCTION operation. (The issues here are messy, and it requires unloading classes and name space cleanup. Just let the cleanup happen offline.)
4. Dependency jars (e.g., third-party libraries). (We require those to be statically added to the class path before Drill starts.)

We are not creating per-user name spaces, or allowing people to use production clusters to try/revise functions. We’re just simplifying deployment of simple functions.

That’s my suggestion; what do others suggest?
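The immediate-registration workflow in Parth's notes is one way to deliver goals 2 and 3 in Paul's list: a function becomes usable only after every available Drillbit holds a local copy. This is a minimal sketch in which an in-memory set stands in for the ZK sentinel node; RegistrationSentinel and its methods are hypothetical, and RPCs, timeouts, and Drillbits that die mid-registration are deliberately not modeled.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for the ZK sentinel node used by
// immediate registration.
class RegistrationSentinel {
    private final Set<String> pending; // Drillbits that have not yet made a local copy
    private boolean complete = false;  // true once the sentinel node would be removed

    // Foreman creates the sentinel, listing every available Drillbit.
    RegistrationSentinel(Set<String> availableDrillbits) {
        this.pending = new HashSet<>(availableDrillbits);
    }

    // A Drillbit copied the jar locally and records that it is done.
    synchronized void acknowledge(String drillbit) {
        pending.remove(drillbit);
    }

    // Foreman polls; once all have acknowledged, it would tell every Drillbit
    // to update its local registry and then delete the sentinel.
    synchronized boolean tryComplete() {
        if (!pending.isEmpty()) {
            return false;
        }
        complete = true;
        return true;
    }

    // Drillbit startup blocks while the sentinel exists.
    synchronized boolean startupMustWait() {
        return !complete;
    }
}
```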
Thanks,

- Paul

On Jul 7, 2016, at 12:32 PM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:

I also agree on using ZooKeeper. I have reworked the dynamic UDF support document to take ZooKeeper usage into account.

Link to the document:
https://docs.google.com/document/d/1MluM17EKajvNP_x8U4aymcOihhUm8BMm8t_hM0jEFWk/edit

Kind regards
Arina

On Tue, Jun 28, 2016 at 12:55 AM Paul Rogers <prog...@maprtech.com> wrote:

Great idea! We already use ZK to track storage plugins. ZK is perhaps better suited than files in DFS to register each jar and/or function. We still need to work out the proper sequencing. But you are right, this is the kind of thing that ZK is supposed to solve.

- Paul

On Jun 27, 2016, at 2:01 PM, Parth Chandra <par...@apache.org> wrote:

Reading through some of Paul's comments on maintaining a consistent state for the registration of the UDF, it looks like we need a consensus protocol for determining that all the Drillbits have the UDF deployed. I believe ZooKeeper can provide a stronger guarantee than a two-phase approach. Should we look into that?

On Fri, Jun 24, 2016 at 10:00 AM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:

Hi all!

I have updated the design document. Main changes:
1. Add the staging and registration DFS locations to Drill’s config.
2. The user is no longer responsible for copying jars onto Drillbit nodes. Now the user needs to copy jars into the staging DFS location, from where Drillbits will copy them to the local file system.
3. During UDF registration, jars will be moved to the DFS registration area.
4. During startup, a Drillbit will copy all jars from the registration area, so a newly added Drillbit will have the same UDFs as the others.
5. Security issues: these will probably be added later as an enhancement.

More details in the document:
https://docs.google.com/document/d/1MluM17EKajvNP_x8U4aymcOihhUm8BMm8t_hM0jEFWk/edit

Kind regards
Arina

On Fri, Jun 17, 2016 at 1:25 AM Paul Rogers <prog...@maprtech.com> wrote:

Hi All,

To answer Arina on item 3: there is actually no good location on any local node to put the UDFs. Reason: DoY allows the admin to start a Drillbit on any available node. When it starts, a new, fresh copy of Drill will be downloaded, and this can happen after the user issued the CREATE command.

What we need is a shared, secure, distributed storage location from which Drillbits can download the needed jar files. Something like… DFS!
Indeed, this is how YARN stores the Drill archive from which it creates the Drill install directory on each node. We can’t quite use YARN’s mechanism (YARN is aware only of the files uploaded when launching an app), but we can do something similar.

So, brainstorming a bit…

1. Store the UDF jar in a predefined DFS location.

2. The CREATE function 1) uploads the jar to the DFS location, and 2) creates some kind of registry entry.

3. The DELETE function 1) deregisters the jar (and function), but 2) does not delete the jar (this allows in-flight queries to complete).

4. Drillbits periodically check DFS for changed registrations, downloading any needed jars. (YARN, Spark, Storm and others already do something similar.)

5. A registry check is “forced” when processing a query with a function that is not currently registered. (Doing so resolves any possible race conditions.)

6. Some process (perhaps time based) removes old, unregistered jar files. (Or, we could get fancy and use reference counts. The reference count would be required if the user wants to delete, then recreate, the same function and jar, to avoid conflicts with in-flight queries.)

We can build security on this as follows:

1. Define permissions for who can write to the DFS location. Or, indeed, have subdirectories by user and grant each user permission only on their own UDF directory.

2. Provide separate registries for per-user functions (private) and global functions (public). Only the admin can add global functions. But only the user that uploads a private function can use it.

3. Leverage the Java class loader to isolate UDFs in their own name space (see Eclipse & Tomcat for examples). That is, Drill can call into a UDF, and UDFs can call selected Drill code, but UDFs can’t shadow Drill classes (accidentally or maliciously). Plus, my function Foo won’t clash with your function Foo if both are private.

Sorry that this has wandered a bit far from the original simple design, but the above may capture much of what folks expect in modern distributed big data systems.

I wonder if a good next step might be to review the notes in the design doc, in the JIRA, and in this e-mail chain, and to prepare a summary of technical requirements and a proposed design. Postpone, at least for now, concerns about the amount of work; we can worry about that once folks agree on your revised design.
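The CREATE/DELETE split and the reference-count variant from the brainstorming list can be sketched as follows. This is a hypothetical, single-process illustration: JarReferenceCounter is not a Drill class, a query is assumed to call acquire/release around its use of a jar, and adding a name to the deletedFiles set stands in for removing the file from DFS.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical reference counting for UDF jars: DELETE deregisters at once,
// but the jar file is removed only when no in-flight query still uses it.
class JarReferenceCounter {
    private final Set<String> registered = new HashSet<>();
    private final Map<String, Integer> inUse = new HashMap<>();
    private final Set<String> deletedFiles = new HashSet<>(); // stands in for DFS deletions

    // CREATE: the jar is in the DFS location and has a registry entry.
    synchronized void register(String jar) {
        registered.add(jar);
        deletedFiles.remove(jar);
    }

    // A query that uses a function from the jar starts.
    synchronized void acquire(String jar) {
        if (!registered.contains(jar)) {
            throw new IllegalStateException(jar + " is not registered");
        }
        inUse.merge(jar, 1, Integer::sum);
    }

    // That query finishes.
    synchronized void release(String jar) {
        Integer count = inUse.get(jar);
        if (count == null) {
            throw new IllegalStateException(jar + " is not in use");
        }
        if (count == 1) {
            inUse.remove(jar);
            deleteIfUnused(jar);
        } else {
            inUse.put(jar, count - 1);
        }
    }

    // DELETE: deregister now; the file goes once the last user releases it.
    synchronized void unregister(String jar) {
        registered.remove(jar);
        deleteIfUnused(jar);
    }

    private void deleteIfUnused(String jar) {
        if (!registered.contains(jar) && !inUse.containsKey(jar)) {
            deletedFiles.add(jar); // placeholder for removing the jar from DFS
        }
    }

    synchronized boolean isDeleted(String jar) { return deletedFiles.contains(jar); }
}
```

Compared with the time-based cleanup, the count removes a jar as soon as it is safe to, and it also handles the delete-then-recreate case: re-registering before the last release keeps the file.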
Thanks,

- Paul

On Jun 21, 2016, at 9:48 AM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:

4. Authorization model mentioned by Julia and John
If a user doesn't have rights to copy jars to the UDF classpath, which can be restricted by the file system, he won't be able to do much harm by running the CREATE command. If the UDFs from a jar were already registered, the CREATE statement will fail. CREATE OR REPLACE will just re-register the UDFs.
But the DELETE command is not safe. If a user knows the jar name, he can delete all UDFs associated with it, as well as the binary and source jars. That's where we'll probably need to impose restrictions.

On Tue, Jun 21, 2016 at 7:34 PM Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:

1. DELETE command: I forgot to include it in the document, but I had it in mind. When a user issues the DELETE command, all UDFs associated with the indicated jar are removed from DrillFunctionRegistry, and then the binary and source files are also deleted from the UDF classpath.

2. Distribution race condition described by Paul
The user issues the CREATE command and gets confirmation that the UDFs are registered only if all Drillbits have confirmed that registration was successful.
I don't expect users to start using the UDFs in queries before the CREATE command returns success or failure; that is possible, but strange.

3. DoY
@Paul
What if, instead of using $DRILL_HOME/jars/3rdparty/udf directly, we use a $DRILL_UDF environment variable that is set during Drillbit start (like $DRILL_LOG_DIR)? The location stored in this variable would be added to the Drill classpath during start. Would that ease DoY integration somehow?

Kind regards
Arina

On Tue, Jun 21, 2016 at 7:15 PM yuliya Feldman <yufeld...@yahoo.com.invalid> wrote:

Just thoughts:
You can try to reuse the distributed cache: let the Drill AM do the needful in terms of orchestrating UDF jar distribution.
But I would be inclined to have a common path that is independent of whether it is Drill on YARN or not, as maintaining two separate ways of loading/unloading UDFs will be painful and error-prone.
One more note (I left a comment in the doc): I'm not sure about the authorization model here, but we need to have one.
Just my 2c. Thanks

From: Paul Rogers <prog...@maprtech.com>
To: "dev@drill.apache.org" <dev@drill.apache.org>
Sent: Monday, June 20, 2016 7:32 PM
Subject: Re: Dynamic UDFs support

Hi Neeraja,

The proposal calls for the user to copy the jar file to each Drillbit node. The jar would go into a new $DRILL_HOME/jars/3rdparty/udf directory.

In Drill-on-YARN (DoY), YARN is responsible for copying Drill code to each node (which is good). YARN puts that code in a location known only to YARN. Since the location is private to YARN, the user can’t easily hunt down the location in order to add the UDF jar. Even if the user did find the location, the next Drillbit to start would create a new copy of the Drill software, without the UDF jar.

Second, in DoY we have separated user files from Drill software. This makes it much easier to distribute the software to each node: we give the Drill distribution tar archive to YARN, and YARN copies it to each node and untars the Drill files. We make a separate copy of the (far smaller) set of user config files.
If the UDF jar goes into a Drill folder ($DRILL_HOME/jars/3rdparty/udf), then the user would have to rebuild the Drill tar file each time they add a UDF jar. When I tried this myself while building DoY, I found it to be slow and error-prone.

So, the solution is to place the UDF code in the new “site” directory: $DRILL_SITE/jars. That’s what it is for. Then let DoY automatically distribute the code to every node. Perfect! Except that this does not work for distributing code dynamically after Drill starts.

For DoY, the solution requirements are:

1. Distribute code using Drill itself, rather than manually copying jars to (unknown) Drill directories.
2. Ensure the solution works even if another Drillbit is spun up later and uses the original Drill tar file.

I’m thinking we want to leverage DFS: place UDF files into a well-known DFS directory. Register the UDF in, say, ZK. When a new Drillbit starts, it looks for new UDF jars in ZK, copies the files to a temporary location, and launches. An existing Drillbit is notified of the change and does the same download process.
Clean-up is needed at some point to remove ZK entries if the UDF jar becomes statically available on the next launch. That needs more thought.

We’d still need the phases mentioned earlier to ensure consistency.

Suggestions, anyone, as to how to do this super simply and still get it to work with DoY?

Thanks,

- Paul

On Jun 20, 2016, at 7:18 PM, Neeraja Rentachintala <nrentachint...@maprtech.com> wrote:

This will need to work with YARN (once Drill is YARN-enabled, I would expect a lot of users to use it in conjunction with YARN).
Paul, I am not clear why this wouldn't work with YARN. Can you elaborate?

-Neeraja

On Mon, Jun 20, 2016 at 7:01 PM, Paul Rogers <prog...@maprtech.com> wrote:

Good enough, as long as we document the limitation that this feature can’t work with YARN deployment, as users generally do not have access to the temporary “localization” directories where the Drill code is placed by
> > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> Note that the jar distribution race condition issue > occurs > > >> with > > >>>>>>>> the > > >>>>>>>>>>>>>>>> proposed design: I believe I sketched out a scenario in > > one > > >> of > > >>>>>> the > > >>>>>>>>>>>>>> earlier > > >>>>>>>>>>>>>>>> comments. Drillbit A receives the CREATE FUNCTION > command. > > >> It > > >>>>>>>> tells > > >>>>>>>>>>>>>>>> Drillbit B. While informing the other Drillbits, > Drillbit > > B > > >>>>>> plans > > >>>>>>>>>> and > > >>>>>>>>>>>>>>>> launches a query that uses the function. Drillbit Z > starts > > >>>>>>>>>> execution > > >>>>>>>>>>>>>> of the > > >>>>>>>>>>>>>>>> query before it learns from A about the new function. > This > > >>>>>> will be > > >>>>>>>>>>>>>> rare — > > >>>>>>>>>>>>>>>> just rare enough to create very hard to reproduce bugs. > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> The only reliable solution is to do the work in multiple > > >>>>>> passes: > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> Pass 1: Ask each node to load the function, but not make > > it > > >>>>>>>>>> available > > >>>>>>>>>>>>>> to > > >>>>>>>>>>>>>>>> the planner. (it would be available to the execution > > >> engine.) > > >>>>>>>>>>>>>>>> Pass 2: Await confirmation from each node that this is > > done. > > >>>>>>>>>>>>>>>> Pass 3: Alert every node that it is now free to plan > > queries > > >>>>>> with > > >>>>>>>>>> the > > >>>>>>>>>>>>>>>> function. > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> Finally, I wonder if we should design the SQL syntax > based > > >> on a > > >>>>>>>>>>>>>> long-term > > >>>>>>>>>>>>>>>> design, even if the feature itself is a short-term > > >> work-around. > > >>>>>>>>>>>>>> Changing > > >>>>>>>>>>>>>>>> the syntax later might break scripts that users might > > write. > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> So, the question for the group is this: is the value of > > >>>>>>>>>> semi-complete > > >>>>>>>>>>>>>>>> feature sufficient to justify the potential problems? 
- Paul

On Jun 20, 2016, at 6:15 PM, Parth Chandra <pchan...@maprtech.com> wrote:

Moving discussion to dev.

I believe the aim is to do a simple implementation without the complexity of distributing the UDF. I think the document should make this limitation clear.

Per Paul's point on there being a simpler solution of just having each Drillbit detect if a UDF is present, I think the problem is if a UDF gets deployed to some but not all Drillbits. A query can then start executing but not run successfully. The intent of the CREATE commands would be to ensure that either all Drillbits have the UDF or none do.

I think Jacques' point about ownership conflicts is not addressed clearly. Also, unloading is not clear. The delete command should probably remove the UDF and unload it.

On Fri, Jun 17, 2016 at 11:19 AM, Paul Rogers <prog...@maprtech.com> wrote:

Reviewed the spec; many comments posted. Three primary comments for the community to consider.
1. The design conflicts with the Drill-on-YARN project. Is this a specific fix for one unique problem, or is it worth expanding the solution to work with Drill-on-YARN deployments? It might be hard to make the two work together later. See comments in the doc for details.

2. Have we, by chance, looked at how other projects handle code distribution? Spark, Storm and others automatically deploy code across the cluster; there is no manual distribution to each node. The key difference between Drill and the others is that, for Storm, say, code is associated with a job (a “topology” in Storm terms). But in Drill, functions are global and have no obvious life cycle that suggests when the code can be unloaded.

3. Have we considered the class loader, dependency and name space isolation issues addressed by such products as Tomcat (web apps) or Eclipse (plugins)? Putting user code in the same namespace as Drill code is quick & dirty. It turns out, however, that doing so leads to problems that require long, frustrating debugging sessions to resolve.
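Item 2 notes that Drill functions have no obvious life cycle telling us when their code can be unloaded. One possible way to give them one, sketched here purely as an illustration, is per-node reference counting: a dropped jar is unloaded only once no running query still uses it, and new queries that reference a dropped jar fail in planning. `FunctionLifecycle` and its methods are hypothetical, and this handles a single node only, not the cluster-wide coordination discussed elsewhere in the thread.

```python
from collections import defaultdict


class FunctionLifecycle:
    """Hypothetical per-node bookkeeping that gives a UDF jar a life cycle.

    A jar is unloaded only after it has been dropped *and* no running
    query still holds a reference to it.
    """

    def __init__(self):
        self.in_use = defaultdict(int)  # jar -> number of running queries using it
        self.dropped = set()            # jars marked for removal
        self.unloaded = []              # jars actually unloaded, in order

    def query_started(self, jar: str) -> None:
        # Queries planned after a drop must not pick up the old code.
        if jar in self.dropped:
            raise LookupError(f"{jar} has been dropped; query fails in planning")
        self.in_use[jar] += 1

    def query_finished(self, jar: str) -> None:
        self.in_use[jar] -= 1
        self._maybe_unload(jar)

    def drop(self, jar: str) -> None:
        self.dropped.add(jar)
        self._maybe_unload(jar)

    def _maybe_unload(self, jar: str) -> None:
        # Unload only when the jar is dropped and nothing still references it.
        if jar in self.dropped and self.in_use[jar] == 0:
            self.unloaded.append(jar)
            self.dropped.discard(jar)  # the name may now be registered again
            del self.in_use[jar]
```

In a real JVM, "unloading" would mean dropping every reference to the jar's class loader so it can be garbage-collected, which is one reason the isolation question in item 3 and the life cycle question in item 2 tend to be solved together.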
Addressing item 1 might expand scope a bit. Addressing items 2 and 3 is a big increase in scope, so I won’t be surprised if we leave those issues for later. (Though addressing item 2 might be the best way to address item 1.)

If we want a very simple solution that requires minimal change, perhaps we can use an even simpler approach. In the proposed design, the user still must distribute code to all the nodes. The primary change is to tell Drill to load (or unload) that code. Could we accomplish the same result more simply by having Drill periodically scan certain directories looking for new (or removed) jars? That still won’t work with YARN or solve the name space issues, but it will work for existing non-YARN Drill users without new SQL syntax.
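The periodic-scan alternative amounts to diffing directory listings between polls. A minimal sketch, assuming a flat local directory of `.jar` files; `scan_jars`, `diff_jars`, `poll_once` and the `on_load`/`on_unload` callbacks are hypothetical names, and actually loading or unloading classes (plus the YARN and namespace limitations noted above) is out of scope.

```python
import os


def scan_jars(directory: str) -> dict:
    """Return {jar file name: modification time} for the jars in a directory."""
    jars = {}
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.endswith(".jar"):
            jars[entry.name] = entry.stat().st_mtime
    return jars


def diff_jars(previous: dict, current: dict):
    """Compare two scans; return (added, removed, changed) jar names, sorted."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    changed = sorted(
        name for name in current.keys() & previous.keys()
        if current[name] != previous[name]
    )
    return added, removed, changed


def poll_once(directory: str, previous: dict, on_load, on_unload) -> dict:
    """One polling step: unload removed/changed jars, then load new/changed ones."""
    current = scan_jars(directory)
    added, removed, changed = diff_jars(previous, current)
    for name in removed + changed:
        on_unload(name)
    for name in added + changed:
        on_load(name)
    return current  # pass this back in as `previous` on the next poll
```

A caller would invoke `poll_once` on a timer, feeding each returned state into the next call. One practical caveat: a scan can observe a jar that is still being copied, so users would typically copy to a temporary name and rename it into place.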
Thanks,

- Paul

On Jun 16, 2016, at 2:07 PM, Jacques Nadeau <jacq...@dremio.com> wrote:

Two quick thoughts:

- (user) In the design document I didn't see any discussion of ownership/conflicts or unloading. It would be helpful to see the thinking there.
- (dev) There is a row-oriented facade via the FieldReader/FieldWriter/ComplexWriter classes. That would be a good place to start when trying to implement an alternative interface.

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Thu, Jun 16, 2016 at 11:32 AM, John Omernik <j...@omernik.com> wrote:

Honestly, I don't see it as a priority issue. I think some of the ideas around community Java UDFs could be a better approach. I'd hate to take away from other work to hack in something like this.
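Jacques' ownership/conflicts question can be made concrete by recording which jar owns each function signature (name plus argument types) and rejecting, as a whole, any jar that redefines a signature owned by a different jar. A minimal single-node sketch; `FunctionRegistry`, `register_jar` and `unregister_jar` are hypothetical names and not Drill's function registry.

```python
class FunctionRegistry:
    """Hypothetical registry recording which jar owns each function signature."""

    def __init__(self):
        self.owner = {}  # (function name, tuple of argument types) -> jar name

    def register_jar(self, jar: str, signatures: list) -> None:
        # Check every signature first so a conflicting jar is rejected as a whole.
        conflicts = [
            sig for sig in signatures
            if sig in self.owner and self.owner[sig] != jar
        ]
        if conflicts:
            details = ", ".join(
                f"{name}{args} (owned by {self.owner[(name, args)]})"
                for name, args in conflicts
            )
            raise ValueError(f"{jar} conflicts with: {details}")
        for sig in signatures:
            self.owner[sig] = jar

    def unregister_jar(self, jar: str) -> list:
        # Removing a jar removes exactly the signatures it owns.
        removed = [sig for sig, owner in self.owner.items() if owner == jar]
        for sig in removed:
            del self.owner[sig]
        return removed
```

Keying on the full signature still allows overloads (same name, different argument types) from different jars, while rejecting the whole jar on any conflict keeps registration all-or-nothing.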
On Thu, Jun 16, 2016 at 1:19 PM, Paul Rogers <prog...@maprtech.com> wrote:

Ted refers to source code transformation. Drill gains its speed from value vectors. However, VVs are a far cry from the row-based interface that most mere mortals are accustomed to using. Since VVs are very type-specific, code is typically generated to handle the specifics of each type. Accessing VVs in Jython may be a bit of a challenge because of the “impedance mismatch” between how VVs work and the row-and-column view expected by most (non-Drill) developers.

I wonder if we've considered providing a row-oriented “facade” that can be used by roll-your-own data sources and user-defined row transforms? It might be a hiccup in the fast VV pipeline, but it might be handy for users willing to trade a bit of speed for convenience. With such a facade, the Jython row transforms that John mentions could be quite simple.
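To illustrate the trade-off, here is a toy sketch in plain Python (standing in for the Jython case) of a row-oriented facade over columnar data. Python lists stand in for value vectors; `ColumnBatch` and `apply_row_transform` are hypothetical and are not the FieldReader/FieldWriter classes Jacques mentions.

```python
from typing import Any, Dict, Iterator, List


class ColumnBatch:
    """Toy columnar batch: one Python list per column, standing in for value vectors."""

    def __init__(self, columns: Dict[str, List[Any]]):
        lengths = {len(v) for v in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = columns
        self.row_count = lengths.pop() if lengths else 0

    def rows(self) -> Iterator[Dict[str, Any]]:
        """Row-oriented facade: present each row as a dict (costs a copy per row)."""
        names = list(self.columns)
        for i in range(self.row_count):
            yield {name: self.columns[name][i] for name in names}


def apply_row_transform(batch: ColumnBatch, transform) -> ColumnBatch:
    """Run a user-defined per-row function and write results back column-wise."""
    out: Dict[str, List[Any]] = {}
    for row in batch.rows():
        for name, value in transform(row).items():
            out.setdefault(name, []).append(value)
    return ColumnBatch(out)
```

For example, `apply_row_transform(batch, lambda r: {"total": r["qty"] * 10})` lets the user think only in rows. Building a dict per row is the kind of per-row overhead (the "hiccup") the facade would add to the columnar pipeline.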
On Thu, Jun 16, 2016 at 10:36 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

Since UDFs use source code transformation, using Jython would be difficult.

On Thu, Jun 16, 2016 at 9:42 AM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:

Hi Charles,

Not that I am aware of. The proposed solution doesn't invent anything new; it just adds the ability to add UDFs without a Drillbit restart. But contributions are welcome.

On Thu, Jun 16, 2016 at 4:52 PM Charles Givre <cgi...@gmail.com> wrote:

Arina,
Has there been any discussion about making it possible, via Jython or something similar, for users to write simple UDFs in Python?
My ideal would be to have this capability integrated in the web GUI, so that a user could write their UDF (in Python) right there, submit it, and have it deployed to Drill if it passes validation tests.
-C

On Jun 16, 2016, at 09:34, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:

Hi all!

I have created a Jira to allow dynamic UDF support in Drill (https://issues.apache.org/jira/browse/DRILL-4726). There is a link to the design document in the Jira description.
Comments or suggestions are welcome.

Kind regards
Arina