I recognize the difficulty. I'm not suggesting this be addressed in the first version, just that we give some thought to how a real user will work around it. Maybe some documentation and/or small changes can make this easier.
Keys
_______________________________
Keys Botzum
Senior Principal Technologist
kbot...@maprtech.com
443-718-0098
MapR Technologies
http://www.mapr.com

On Jul 21, 2016 1:45 PM, "Paul Rogers" <prog...@maprtech.com> wrote:
> Hi All,
>
> Adding a dynamic DROP would, of course, be a great addition! The reason for suggesting we skip that was to control project scope.
>
> Dynamic DROP requires a synchronization step. Here’s the scenario:
>
> * Foreman A starts a query using UDF U.
> * Foreman B receives a request to drop UDF U, followed by a request to add a new version of U, U’.
>
> How do we drop a function that may be in use? There are some tricky bits to work out, which seemed too overwhelming to consider all in one go.
>
> Clearly just dropping U and adding a new version of U with the same name leads to issues if not synchronized. If a Drillbit D is running a query with U when it receives notice to drop U, should D complete the query or fail it? If the query completes, then how does D deal with the request to register U’, which has the same name?
>
> Do we globally synchronize function deletion? (The foreman B that receives the drop request waits for all queries using U to finish.) But, how do we know which queries use U?
>
> An eventually consistent approach is to track the age of the oldest running query. Suppose B drops U at time T. Any query received after T that uses U will fail in planning. A new U’ can’t be registered until all queries that started before T complete.
>
> The primary challenge we face in both the CREATE and DROP cases is that Drill is distributed with little central coordination. That’s great for scale, but makes it hard to design features that require coordination. Some other tools solve this problem with a data dictionary (or “metastore”). Alas, Drill does not have such a concept. So a seemingly simple feature like dynamic UDF becomes a major design challenge to get right.
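As a rough illustration of the eventually consistent approach described above (drop U at time T, fail planning for U after T, and hold back U’ until every query that started before T has finished), here is a minimal Java sketch. The class `DropBarrier`, its methods, and the in-memory maps are hypothetical names invented for this example; none of it is Drill code, and a real version would need cluster-wide state rather than one JVM's maps.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical bookkeeping for "drop at time T" (illustration only, not Drill code). */
final class DropBarrier {
    // queryId -> start time of each query that is still running
    private final Map<String, Long> runningQueries = new ConcurrentHashMap<>();
    // function name -> time T at which the function was dropped
    private final Map<String, Long> dropTimes = new ConcurrentHashMap<>();

    void queryStarted(String queryId, long startTime) { runningQueries.put(queryId, startTime); }

    void queryFinished(String queryId) { runningQueries.remove(queryId); }

    /** Record that function {@code name} was dropped at time T. */
    void drop(String name, long time) { dropTimes.put(name, time); }

    /** Planning check: a dropped function is rejected until it is registered again. */
    boolean canPlan(String name) { return !dropTimes.containsKey(name); }

    /** A replacement may be registered only once no query that started before T is running. */
    boolean canRegister(String name) {
        Long droppedAt = dropTimes.get(name);
        if (droppedAt == null) {
            return true; // never dropped; name-conflict checks are out of scope here
        }
        for (long start : runningQueries.values()) {
            if (start < droppedAt) {
                return false; // an older query may still be using the old version
            }
        }
        return true;
    }

    /** Register the replacement; not atomic with canRegister, which a real design would need. */
    void register(String name) {
        if (!canRegister(name)) {
            throw new IllegalStateException("queries older than the drop are still running: " + name);
        }
        dropTimes.remove(name);
    }
}
```

The sketch sidesteps the hard part raised in the message, which is how each Foreman learns T and the age of the oldest running query across the cluster.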
> > Thanks, > > - Paul > > > On Jul 21, 2016, at 7:21 AM, Neeraja Rentachintala < > nrentachint...@maprtech.com> wrote: > > > > The whole point of this feature is to avoid Drill cluster restarts as the > > name indicates 'Dynamic' UDFs. > > So any design that requires restarts I would think would beat the > purpose. > > > > I also think this is an example of a feature we start with a simple > design > > to serve the purpose, take feedback on how it is being deployed/used in > > real user situations and improve it in subsequent releases. > > > > -thanks > > Neeraja > > > > On Thu, Jul 21, 2016 at 6:32 AM, Keys Botzum <kbot...@maprtech.com> > wrote: > > > >> I think there are a lot of great ideas here. My one concern is the lack > of > >> unload and thus presumably replace functionality. I'm just thinking > about > >> typical actual usage. > >> > >> In a typical development cycle someone writes something, tries it, > learns, > >> changes it, and tries again. Assuming I understand the design that > change > >> step requires a full Drill cluster restart. That is going to be very > >> disruptive and will make UDF work nearly impossible without a dedicated > >> "private" cluster for Drill. I realize that people should have access to > >> the data they need and Drill in a development cluster but even then > >> restarts can be hard since development clusters are often shared - and > >> that's assuming such a cluster exists. I realize of course Drill can be > run > >> as a standalone Drillbit but I'm not convinced that desktops will have > >> adequate access to the needed data. > >> > >> Having dealt with Java classloading over the years, I'm not claiming > class > >> replacement is an easy thing so I'll defer to others on the priority of > >> that, but I'm wondering if there isn't some way to make UDF > experimentation > >> a bit easier/practical. 
> >> > >> Given the above, let me toss out some possibly naive ideas that maybe > are > >> workable: > >> * can I easily run a standalone Drillbit on a Hadoop cluster node that > is > >> already running Drill servers? I'm sure this can be done, but is it > easy? > >> Could we perhaps make this clearer as an explicit kind of thing? > >> * is there a way that when I deploy a UDF I can constrain the # of bits > it > >> is loaded into and perhaps even specify the bits? > >> * Obvious correlarary is I'd want my query to run on those bits and a > >> not too disruptive way to restart just those bits > >> > >> The above may be obvious to Drill experts. If it is then perhaps the UDF > >> docs could just point out how to easily develop UDFs in an iterative > >> fashion. > >> > >> Keys > >> _______________________________ > >> Keys Botzum > >> Senior Principal Technologist > >> kbot...@maprtech.com <mailto:kbot...@maprtech.com> > >> 443-718-0098 > >> MapR Technologies > >> http://www.mapr.com <http://www.mapr.com/> > >>> On Jul 21, 2016, at 3:13 AM, Paul Rogers <prog...@maprtech.com> wrote: > >>> > >>> Always good to have options… Another is to try an eventual consistency > >> model. > >>> > >>> The invariant here is the one that was mentioned earlier. Whenever a > >> query is submitted with UDF U, that query either fails in planning > (because > >> U is unknown) or succeeds on all nodes (at least with respect to U.) > >>> > >>> For this to work, we need a constant view of the world. We can try to > >> enforce consistency at function registration time (the original > design), or > >> via the Foreman (Parth’s design.) We can probably also use an eventual > >> consistency model. > >>> > >>> Suppose we have a global name space of functions. With the global name > >> space, we can establish this invariant: If a function is in that name > >> space, then the Foreman accepts the query. 
If a Drillbit receives a > >> fragment, but does not yet know of U, then the Drillbit A) knows that > some > >> foreman must have registered U (or the query would have failed in > planning) > >> and B) the Drillbit can download the function if not already in place. > >>> > >>> Folks pointed out that always checking a global name space is > expensive, > >> which it is. As it turns out, we can first check the local function > >> registry. If the Drillbit already knows about the function, we’re done > >> checking, no global check needed. It is only on the first use of a new > >> function, when it is not yet loaded locally, that the global check must > be > >> done. > >>> > >>> For this to work the foreman that registers UDF U must: > >>> > >>> 1. From Arina’s proposed staging area, check the jar contents to see if > >> a name conflict exists with the global registry. (Requires some class > >> loader code.) > >>> 2. If a conflict exists, refuse to register the function and return an > >> error. > >>> 3. If no conflict exists, register the function in the global name > space > >> and move the jar to the registered area in DFS. > >>> > >>> In this model, it is entirely optional whether the foreman that > >> registers U alerts other Drillbits. Instead, Drillbits could poll from > time > >> to time, or just wait until they see a query with U and do the download > at > >> that time. > >>> > >>> When a new Drillbit starts, it can load all functions in the registry > >> area because these have all passed the name collision test and can all > be > >> used in queries. Any new registrations will be found and loaded as > above. > >> (It is not required to preload functions, but it might help > performance.) > >>> > >>> ZK is the only place we have at present for the global name space, so > >> that seems the logical tool. ZK allows atomic operations, which we need > >> here. Operations 1, 2, and 3 above should be atomic. 
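The lookup order and the atomic check-and-insert described above can be sketched as follows. This is only an illustration under stated assumptions: `LazyFunctionLookup` and its methods are hypothetical, a shared concurrent set stands in for the ZK-backed global name space, and the jar download is reduced to a comment.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: local registry first, global name space only on a miss. */
final class LazyFunctionLookup {
    // Stand-in for the global name space (ZK in the thread); shared by all Drillbits.
    private final Set<String> globalNames;
    // This Drillbit's local in-memory registry.
    private final Set<String> localNames = ConcurrentHashMap.newKeySet();
    // Counts how often the (expensive) global check was needed.
    final AtomicInteger globalChecks = new AtomicInteger();

    LazyFunctionLookup(Set<String> globalNames) { this.globalNames = globalNames; }

    /** Foreman side: atomic check-and-insert; false means a name conflict. */
    boolean register(String name) {
        return globalNames.add(name);
    }

    /** Drillbit side: true if the function is usable on this node after the call. */
    boolean resolve(String name) {
        if (localNames.contains(name)) {
            return true; // already loaded locally, no global check needed
        }
        globalChecks.incrementAndGet();
        if (!globalNames.contains(name)) {
            return false; // unknown everywhere: the query should have failed in planning
        }
        // A real Drillbit would download the jar from the registered area in DFS here.
        localNames.add(name);
        return true;
    }
}
```

Passing a set created with `ConcurrentHashMap.newKeySet()` gives `register` the atomic add-if-absent behavior the message asks of ZK; the non-atomic DFS move that follows is not modeled.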
> >>> > >>> Unfortunately, we can’t do the DFS move atomically with a ZK name space > >> insertion. So, the global name check & insert should be atomic. If that > >> succeeds, copy the jar into the registered folder. There are a few > details > >> to work out to handle special cases, but we can cover those another > time. > >> (Hint: what happens if the Foreman crashes after insetting the ZK entry > but > >> before moving the jar?) > >>> > >>> None of the proposed designs permit graceful unloading of functions. > So, > >> deleting functions will require a cluster restart to establish a new > stable > >> checkpoint. > >>> > >>> We can recommend that on each cluster restart, any functions in the DFS > >> registry be copied to each Drillbit (much easier with the coming YARN > >> integration) as a way of keeping the DFS registry a reasonable size. > >>> > >>> More details to work out, but that’s the gist of the concept. > >>> > >>> Thanks, > >>> > >>> - Paul > >>> > >>>> On Jul 20, 2016, at 2:37 PM, Parth Chandra <pchan...@maprtech.com> > >> wrote: > >>>> > >>>> My notes from the hangout with Arina and Paul - > >>>> > >>>> Notes - > >>>> > >>>> There are two invariants for the registration process - > >>>> 1) There is a registration/validated directory in the DFS that > contains > >>>> UDFS that have been validated by the registering foreman. All > drillbits > >>>> will have access to this directory and on startup and/or UDF > >> registration, > >>>> the jars in this directory are sync'd up with a local UDF directory > >>>> 2) During the process of registration, the registering foreman > creates a > >>>> Zookeeper node that indicates that one or more drillbits has not yet > >>>> registered the UDF. > >>>> > >>>> The basic workflow is that UDF jars are copied from the staging > >> directory > >>>> to the registration directory and validated. Once they are validated, > >> the > >>>> available drillbits are told to register the UDF. 
Registering the UDF consists of copying the jar to a local UDF directory and updating the local (in-memory) UDF registry. A sentinel node in Zookeeper is used to track when all the drillbits have registered the UDF.
> >>>>
> >>>> There were two main suggestions: immediate registration and lazy registration.
> >>>>
> >>>> Immediate registration -
> >>>> Foreman tells all drillbits to register and creates a Zookeeper node to track progress.
> >>>> Every drillbit makes a local copy and updates the Zookeeper node to show it is done.
> >>>> Foreman checks the Zookeeper node and, when all available drillbits have acknowledged, sends a message to all drillbits to complete registration.
> >>>> Foreman removes the ZK node.
> >>>> All drillbits update their local UDF registry.
> >>>> Drillbit startup will block if there is a ZK node indicating registration is in progress.
> >>>> This approach needs to be validated to see if any race conditions exist.
> >>>>
> >>>> Lazy registration -
> >>>> Once a UDF is copied to the registration folder, the UDF is essentially registered. On first use, a drillbit may hit a ClassNotFoundException, in which case it will look for the UDF in the registration directory. If found, it will copy the jar to the local directory and add the UDF to its local registry.
> >>>> This approach should be investigated to see if it fits in with the current UDF execution code.
> >>>>
> >>>>
> >>>> On Mon, Jul 18, 2016 at 3:36 PM, Parth Chandra <pchan...@maprtech.com> wrote:
> >>>>
> >>>>> +1 on simplifying the design and postponing the items Paul has suggested.
> >>>>>
> >>>>> Arina, Paul, I think we need to work out some of the design related to registering the UDF. Are you guys open for a quick hangout @10 a.m PDT tomorrow?
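For reference, the immediate-registration flow from the hangout notes above (sentinel node, per-Drillbit acknowledgement, then a final "complete registration" step) might look roughly like this. Everything here is a hypothetical single-process sketch: `ImmediateRegistration`, `Drillbit`, and the `pending` set standing in for the sentinel ZK node are invented for illustration and say nothing about how Drill would implement it.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical single-process model of the immediate-registration flow. */
final class ImmediateRegistration {
    /** Stand-in for one Drillbit's local state. */
    static final class Drillbit {
        final String name;
        final Set<String> localJars = new HashSet<>(); // jars copied to the local UDF directory
        final Set<String> registry = new HashSet<>();  // local in-memory UDF registry
        Drillbit(String name) { this.name = name; }
    }

    // Stand-in for the sentinel ZK node: Drillbits that have not yet acknowledged.
    private final Set<String> pending = new HashSet<>();

    /** True while the sentinel exists; Drillbit startup would block on this. */
    boolean inProgress() { return !pending.isEmpty(); }

    /** Foreman creates the sentinel listing every available Drillbit. */
    void begin(List<Drillbit> drillbits) {
        for (Drillbit d : drillbits) {
            pending.add(d.name);
        }
    }

    /** A Drillbit makes its local copy and marks itself done in the sentinel. */
    void acknowledge(String udf, Drillbit d) {
        d.localJars.add(udf);
        pending.remove(d.name);
    }

    /** Foreman completes registration only after every Drillbit has acknowledged. */
    boolean tryComplete(String udf, List<Drillbit> drillbits) {
        if (inProgress()) {
            return false;
        }
        for (Drillbit d : drillbits) {
            if (!d.localJars.contains(udf)) {
                return false; // someone never copied the jar
            }
        }
        // An empty pending set models "Foreman removes ZK node".
        for (Drillbit d : drillbits) {
            d.registry.add(udf);
        }
        return true;
    }
}
```

The race conditions the notes ask about (a Drillbit joining or dying between acknowledgement and completion) are exactly what this single-threaded model cannot show.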
> >>>>> > >>>>> > >>>>> > >>>>> On Thu, Jul 14, 2016 at 1:46 PM, Paul Rogers <prog...@maprtech.com> > >> wrote: > >>>>> > >>>>>> Hi All, > >>>>>> > >>>>>> We’ve had quite a lively debate in the “comments” section of Arina’s > >>>>>> wonderful design doc. Zelaine made a great suggestion: summarize the > >> user > >>>>>> experience as a way of making sense of the wealth of detailed > >> comments. > >>>>>> > >>>>>> IMHO, the most important user experience goals are: > >>>>>> > >>>>>> 1. When a user submits a CREATE FUNCTION command, the command > returns > >>>>>> quickly (within a few seconds at most.) > >>>>>> 2. If the above user then issues a query using that function (to the > >> same > >>>>>> Foreman), that query is guaranteed to successfully use the new > >> function on > >>>>>> all nodes. > >>>>>> 3. Other users, connecting to any Foreman will see a very clean > >> behavior > >>>>>> when submitting a query with the new function. Before some point in > >> time > >>>>>> (can be different for each Foreman), a query with the function fails > >> in > >>>>>> planning. After that point, queries are guaranteed to successfully > >> use the > >>>>>> new function on all nodes. > >>>>>> > >>>>>> Basically, this says that CREATE FUNCTION can’t (potentially) take a > >> long > >>>>>> time. Use of functions can’t result in random failures during the > >> time that > >>>>>> the function is propagated across Drillbits. > >>>>>> > >>>>>> The goals we can perhaps postpone are: > >>>>>> > >>>>>> 1. Class name space isolation. (Allows two data scientists to define > >> the > >>>>>> same class without collisions.) > >>>>>> 2. Function name spaces. (Allows me to define “paul.foo” and you to > >>>>>> define “bob.foo” with out collisions. (Needed if many people develop > >>>>>> functions independently. Else, we need a global name space.) > >>>>>> 3. Dynamic DROP FUNCTION operation. (The issues here are messy, and > it > >>>>>> requires unloading classes and name space cleanup.) 
(Just let the > >> cleanup > >>>>>> happen offline.) > >>>>>> 4. Dependency jars (e.g. third party libraries, etc.) (We require > >> those > >>>>>> to be statically added to the class path before Drill starts.) > >>>>>> > >>>>>> We are not creating per-user name spaces, or allowing people to use > >>>>>> production clusters to try/revise functions. We’re just sampling > >> deployment > >>>>>> of simple functions. > >>>>>> > >>>>>> That’s my suggestion, what do others suggest? > >>>>>> > >>>>>> Thanks, > >>>>>> > >>>>>> - Paul > >>>>>> > >>>>>>> On Jul 7, 2016, at 12:32 PM, Arina Yelchiyeva < > >>>>>> arina.yelchiy...@gmail.com> wrote: > >>>>>>> > >>>>>>> I also agree on using Zookeeper. I have re-worked dynamic UDF > support > >>>>>>> document taking into account Zookeeper usage. > >>>>>>> > >>>>>>> Link to the document - > >>>>>>> > >>>>>> > >> > https://docs.google.com/document/d/1MluM17EKajvNP_x8U4aymcOihhUm8BMm8t_hM0jEFWk/edit > >>>>>>> > >>>>>>> Kind regards > >>>>>>> Arina > >>>>>>> > >>>>>>> On Tue, Jun 28, 2016 at 12:55 AM Paul Rogers <prog...@maprtech.com > > > >>>>>> wrote: > >>>>>>> > >>>>>>>> Great idea! We already use ZK to track storage plugins. ZK is > >> perhaps > >>>>>>>> better suited to register each jar and/or function that using > files > >> in > >>>>>> DFS. > >>>>>>>> Still need to work out the proper sequencing. But you are right, > >> this > >>>>>> is > >>>>>>>> the kind of thing that ZK is supposed to solve. > >>>>>>>> > >>>>>>>> - Paul > >>>>>>>> > >>>>>>>> > >>>>>>>>> On Jun 27, 2016, at 2:01 PM, Parth Chandra <par...@apache.org> > >> wrote: > >>>>>>>>> > >>>>>>>>> Reading thru some of Paul's comments on maintaining a consistent > >> state > >>>>>>>> for > >>>>>>>>> the registration of the UDF, it looks like we need a consensus > >>>>>> protocol > >>>>>>>> for > >>>>>>>>> determining that all the Drillbits have the UDF deployed. > >>>>>>>>> I believe Zookeeper can provide a stronger guarantee than a 2 > phase > >>>>>>>>> approach. 
Should we look into that?
> >>>>>>>>>
> >>>>>>>>> On Fri, Jun 24, 2016 at 10:00 AM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi all!
> >>>>>>>>>>
> >>>>>>>>>> I have updated the design document.
> >>>>>>>>>> Main changes:
> >>>>>>>>>> 1. Add the staging and registration DFS locations to Drill’s config.
> >>>>>>>>>> 2. The user is no longer responsible for copying jars onto drillbit nodes. Now the user needs to copy jars into the staging DFS location, from where drillbits will copy them to the local fs.
> >>>>>>>>>> 3. During UDF registration, jars will be moved to the DFS registration area.
> >>>>>>>>>> 4. During startup, a drillbit will copy all jars from the registration area, so a newly added drillbit will have the same UDFs as the others.
> >>>>>>>>>> 5. Security issues - these will probably be added later as an enhancement.
> >>>>>>>>>>
> >>>>>>>>>> More details in the document:
> >>>>>>>>>>
> >>>>>>>>>> https://docs.google.com/document/d/1MluM17EKajvNP_x8U4aymcOihhUm8BMm8t_hM0jEFWk/edit
> >>>>>>>>>>
> >>>>>>>>>> Kind regards
> >>>>>>>>>> Arina
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Jun 17, 2016 at 1:25 AM Paul Rogers <prog...@maprtech.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi All,
> >>>>>>>>>>>
> >>>>>>>>>>> To answer Arina on item 3: there is actually no good location on any local node to put the UDFs. Reason: DoY allows the admin to start a Drillbit on any available node. When it starts, a new, fresh copy of Drill will be downloaded, and this can happen after the user issued the CREATE command.
> >>>>>>>>>>>
> >>>>>>>>>>> What we need is a shared, secure distributed storage location from which Drillbits can download the needed jar files.
Something like… > DFS! > >>>>>>>> Indeed, > >>>>>>>>>>> this is how YARN stores the Drill archive from which it creates > >> the > >>>>>>>> Drill > >>>>>>>>>>> install directory on each node. We can’t quite use YARN’s > >> mechanism > >>>>>>>> (YARN > >>>>>>>>>>> is aware only of the files uploaded when launching an app), but > >> we > >>>>>> can > >>>>>>>> do > >>>>>>>>>>> something similar. > >>>>>>>>>>> > >>>>>>>>>>> So, brainstorming a bit… > >>>>>>>>>>> > >>>>>>>>>>> 1. Store the UDF jar in a pre-defined DFS location. > >>>>>>>>>>> > >>>>>>>>>>> 2. The CREATE function 1) uploads the jar to the DFS location, > >> and > >>>>>> 2) > >>>>>>>>>>> creates some kind of registry entry. > >>>>>>>>>>> > >>>>>>>>>>> 3. The DELETE function 1) deregisters the jar (and function), > >> but 2) > >>>>>>>> does > >>>>>>>>>>> not delete the jar (this allows in-flight queries to complete.) > >>>>>>>>>>> > >>>>>>>>>>> 3. Drillbits periodically check DFS for changed registrations, > >>>>>>>>>> downloading > >>>>>>>>>>> any needed jars. (YARN, Spark, Storm and others already do > >> something > >>>>>>>>>>> similar.) > >>>>>>>>>>> > >>>>>>>>>>> 4. Registry check is “forced” when processing a query with a > >>>>>> function > >>>>>>>>>> that > >>>>>>>>>>> is not currently registered. (Doing so resolves any possible > race > >>>>>>>>>>> conditions.) > >>>>>>>>>>> > >>>>>>>>>>> 5. Some process (perhaps time based) removes old, unregistered > >> jar > >>>>>>>> files. > >>>>>>>>>>> (Or, we could get fancy and use reference counts. The reference > >>>>>> count > >>>>>>>>>> would > >>>>>>>>>>> be required if the user wants to delete, then recreate, the > same > >>>>>>>> function > >>>>>>>>>>> and jar to avoid conflict with in-flight queries.) > >>>>>>>>>>> > >>>>>>>>>>> We can build security on this as follows: > >>>>>>>>>>> > >>>>>>>>>>> 1. Define permissions for who can write to the DFS location. 
> Or, > >>>>>>>> indeed, > >>>>>>>>>>> have subdirectories by user and grant each user permission only > >> on > >>>>>>>> their > >>>>>>>>>>> own UDF directory. > >>>>>>>>>>> > >>>>>>>>>>> 2. Provide separate registries for per-user functions (private) > >> and > >>>>>>>>>> global > >>>>>>>>>>> functions (public). Only the admin can add global functions. > But, > >>>>>> only > >>>>>>>>>> the > >>>>>>>>>>> user that uploads a private function can use it. > >>>>>>>>>>> > >>>>>>>>>>> 3. Leverage the Java class loader to isolate UDFs in their own > >> name > >>>>>>>> space > >>>>>>>>>>> (see Eclipse & Tomcat for examples). That is, Drill can call > >> into a > >>>>>>>> UDF, > >>>>>>>>>>> UDFs can call selected Drill code, but UDFs can’t shadow Drill > >>>>>> classes > >>>>>>>>>>> (accidentally or maliciously.) Plus, my function Foo won’t > clash > >>>>>> with > >>>>>>>>>> your > >>>>>>>>>>> function Foo if both are private. > >>>>>>>>>>> > >>>>>>>>>>> Sorry that this has wandered a bit far from the original simple > >>>>>> design, > >>>>>>>>>>> but the above may capture much of what folks expect in modern > >>>>>>>> distributed > >>>>>>>>>>> big data systems. > >>>>>>>>>>> > >>>>>>>>>>> I wonder if a good next step might be to review the notes in > the > >>>>>> design > >>>>>>>>>>> doc, in the JIRA, and in this e-mail chain and to prepare a > >> summary > >>>>>> of > >>>>>>>>>>> technical requirements, and a proposed design. Postpone, at > least > >>>>>> for > >>>>>>>>>> now, > >>>>>>>>>>> concerns about the amount of work; we can worry about that once > >>>>>> folks > >>>>>>>>>> agree > >>>>>>>>>>> on your revised design. > >>>>>>>>>>> > >>>>>>>>>>> Thanks, > >>>>>>>>>>> > >>>>>>>>>>> - Paul > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> On Jun 21, 2016, at 9:48 AM, Arina Yelchiyeva < > >>>>>>>>>>> arina.yelchiy...@gmail.com> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> 4. 
Authorization model mentioned by Julia and John
> >>>>>>>>>>>> If the user doesn't have rights to copy jars to the UDF classpath, which can be restricted by the file system, he won't be able to do much harm by running the CREATE command. If UDFs from the jar were already registered, the CREATE statement will fail. CREATE OR REPLACE will just re-register the UDFs.
> >>>>>>>>>>>> But the DELETE command is not safe. If the user knows the jar name, he can delete all UDFs associated with it, as well as the binary and source jars. That's where we'll probably need to impose restrictions.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Jun 21, 2016 at 7:34 PM Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> 1. DELETE command - I forgot to mention it in the document but had it in mind. When the user issues the DELETE command, all UDFs associated with the indicated jar are removed from DrillFunctionRegistry. Then the binary and source files are also deleted from the UDF classpath.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2. Distribution race condition described by Paul
> >>>>>>>>>>>>> The user issues the CREATE command and gets confirmation that the UDFs are registered only if all drillbits have confirmed that registration was successful. I don't expect the user to start using UDFs in queries before the CREATE command returns success or failure, which is possible but strange.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 3. DoY
> >>>>>>>>>>>>> @Paul
> >>>>>>>>>>>>> What if, instead of using $DRILL_HOME/jars/3rdparty/udf directly, we use a $DRILL_UDF environment variable, which would be set during drillbit start (like $DRILL_LOG_DIR)? The location stored in this variable would be added to the Drill classpath during start.
> >>>>>>>>>>>>> Would that ease DoY integration somehow?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Kind regards
> >>>>>>>>>>>>> Arina
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Jun 21, 2016 at 7:15 PM yuliya Feldman <yufeld...@yahoo.com.invalid> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Just thoughts:
> >>>>>>>>>>>>>> You can try to reuse the distributed cache. Let the Drill AM do the needful in terms of orchestrating UDF jar distribution.
> >>>>>>>>>>>>>> But I would be inclined to have a common path that is independent of whether it is Drill on YARN or not, as maintaining two separate ways of dealing with loading/unloading UDFs will be painful and error prone.
> >>>>>>>>>>>>>> One more note (I left a comment in the doc) - not sure about the authorization model here - we need to have some.
> >>>>>>>>>>>>>> Just my 2c
> >>>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> From: Paul Rogers <prog...@maprtech.com>
> >>>>>>>>>>>>>> To: "dev@drill.apache.org" <dev@drill.apache.org>
> >>>>>>>>>>>>>> Sent: Monday, June 20, 2016 7:32 PM
> >>>>>>>>>>>>>> Subject: Re: Dynamic UDFs support
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Neeraja,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The proposal calls for the user to copy the jar file to each Drillbit node. The jar would go into a new $DRILL_HOME/jars/3rdparty/udf directory.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> In Drill-on-YARN (DoY), YARN is responsible for copying Drill code to each node (which is good.) YARN puts that code in a location known only to YARN.
Since the location is private to YARN, the user can’t > >>>>>> easily > >>>>>>>>>> hunt > >>>>>>>>>>>>>> down the location in order to add the udf jar. Even if the > >> user > >>>>>> did > >>>>>>>>>>> find > >>>>>>>>>>>>>> the location, the next Drillbit to start would create a new > >> copy > >>>>>> of > >>>>>>>>>> the > >>>>>>>>>>>>>> Drill software, without the udf jar. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Second, in DoY we have separated user files from Drill > >> software. > >>>>>>>> This > >>>>>>>>>>>>>> makes it much easier to distribute the software to each > node: > >> we > >>>>>>>> give > >>>>>>>>>>> the > >>>>>>>>>>>>>> Drill distribution tar archive to YARN, and YARN copies it > to > >>>>>> each > >>>>>>>>>>> node and > >>>>>>>>>>>>>> untars the Drill files. We make a separate copy of the (far > >>>>>> smaller) > >>>>>>>>>>> set of > >>>>>>>>>>>>>> user config files. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> If the udf jar goes into a Drill folder > >>>>>>>>>>> ($DRILL_HOME/jars/3rdparty/udf), > >>>>>>>>>>>>>> then the user would have to rebuild the Drill tar file each > >> time > >>>>>>>> they > >>>>>>>>>>> add a > >>>>>>>>>>>>>> udf jar. When I tried this myself when building DoY, I found > >> it > >>>>>> to > >>>>>>>> be > >>>>>>>>>>> slow > >>>>>>>>>>>>>> and error-prone. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> So, the solution is to place the udf code in the new “site” > >>>>>>>>>> directory: > >>>>>>>>>>>>>> $DRILL_SITE/jars. That’s what that is for. Then, let DoY > >>>>>>>>>> automatically > >>>>>>>>>>>>>> distribute the code to every node. Perfect! Except that it > >> does > >>>>>> not > >>>>>>>>>>> work to > >>>>>>>>>>>>>> dynamically distribute code after Drill starts. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> For DoY, the solution requirements are: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 1. Distribute code using Drill itself, rather than manually > >>>>>> copying > >>>>>>>>>>> jars > >>>>>>>>>>>>>> to (unknown) Drill directories. > >>>>>>>>>>>>>> 2. 
Ensure the solution works even if another Drillbit is > spun > >> up > >>>>>>>>>> later, > >>>>>>>>>>>>>> and uses the original Drill tar file. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I’m thinking we want to leverage DFS: place udf files into a > >>>>>>>>>> well-known > >>>>>>>>>>>>>> DFS directory. Register the udf into, say, ZK. When a new > >>>>>> Drillbit > >>>>>>>>>>> starts, > >>>>>>>>>>>>>> it looks for new udf jars in ZK, copies the file to a > >> temporary > >>>>>>>>>>> location, > >>>>>>>>>>>>>> and launches. An existing Drill is notified of the change > and > >>>>>> does > >>>>>>>>>> the > >>>>>>>>>>> same > >>>>>>>>>>>>>> download process. Clean-up is needed at some point to remove > >> ZK > >>>>>>>>>>> entries if > >>>>>>>>>>>>>> the udf jar becomes statically available on the next launch. > >> That > >>>>>>>>>> needs > >>>>>>>>>>>>>> more thought. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> We’d still need the phases mentioned earlier to ensure > >>>>>> consistency. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Suggestions anyone as to how to do this super simply & still > >> get > >>>>>> it > >>>>>>>>>> to > >>>>>>>>>>>>>> work with DoY? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> - Paul > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Jun 20, 2016, at 7:18 PM, Neeraja Rentachintala < > >>>>>>>>>>>>>> nrentachint...@maprtech.com> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> This will need to work with YARN (Once Drill is YARN > >> enabled, I > >>>>>>>>>> would > >>>>>>>>>>>>>>> expect a lot of users using it in conjunction with YARN). > >>>>>>>>>>>>>>> Paul, I am not clear why this wouldn't work with YARN. Can > >> you > >>>>>>>>>>>>>> elaborate. 
> >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> -Neeraja > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Mon, Jun 20, 2016 at 7:01 PM, Paul Rogers < > >>>>>> prog...@maprtech.com > >>>>>>>>> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Good enough, as long as we document the limitation that > this > >>>>>>>>>> feature > >>>>>>>>>>>>>> can’t > >>>>>>>>>>>>>>>> work with YARN deployment as users generally do not have > >>>>>> access to > >>>>>>>>>>> the > >>>>>>>>>>>>>>>> temporary “localization” directories where the Drill code > is > >>>>>>>> placed > >>>>>>>>>>> by > >>>>>>>>>>>>>> YARN. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Note that the jar distribution race condition issue occurs > >> with > >>>>>>>> the > >>>>>>>>>>>>>>>> proposed design: I believe I sketched out a scenario in > one > >> of > >>>>>> the > >>>>>>>>>>>>>> earlier > >>>>>>>>>>>>>>>> comments. Drillbit A receives the CREATE FUNCTION command. > >> It > >>>>>>>> tells > >>>>>>>>>>>>>>>> Drillbit B. While informing the other Drillbits, Drillbit > B > >>>>>> plans > >>>>>>>>>> and > >>>>>>>>>>>>>>>> launches a query that uses the function. Drillbit Z starts > >>>>>>>>>> execution > >>>>>>>>>>>>>> of the > >>>>>>>>>>>>>>>> query before it learns from A about the new function. This > >>>>>> will be > >>>>>>>>>>>>>> rare — > >>>>>>>>>>>>>>>> just rare enough to create very hard to reproduce bugs. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> The only reliable solution is to do the work in multiple > >>>>>> passes: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Pass 1: Ask each node to load the function, but not make > it > >>>>>>>>>> available > >>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>> the planner. (it would be available to the execution > >> engine.) > >>>>>>>>>>>>>>>> Pass 2: Await confirmation from each node that this is > done. > >>>>>>>>>>>>>>>> Pass 3: Alert every node that it is now free to plan > queries > >>>>>> with > >>>>>>>>>> the > >>>>>>>>>>>>>>>> function. 
> >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Finally, I wonder if we should design the SQL syntax based > >> on a > >>>>>>>>>>>>>> long-term > >>>>>>>>>>>>>>>> design, even if the feature itself is a short-term > >> work-around. > >>>>>>>>>>>>>> Changing > >>>>>>>>>>>>>>>> the syntax later might break scripts that users might > write. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> So, the question for the group is this: is the value of > >>>>>>>>>> semi-complete > >>>>>>>>>>>>>>>> feature sufficient to justify the potential problems? > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> - Paul > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Jun 20, 2016, at 6:15 PM, Parth Chandra < > >>>>>>>> pchan...@maprtech.com > >>>>>>>>>>> > >>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Moving discussion to dev. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I believe the aim is to do a simple implementation > without > >> the > >>>>>>>>>>>>>> complexity > >>>>>>>>>>>>>>>>> of distributing the UDF. I think the document should make > >> this > >>>>>>>>>>>>>> limitation > >>>>>>>>>>>>>>>>> clear. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Per Paul's point on there being a simpler solution of > just > >>>>>> having > >>>>>>>>>>> each > >>>>>>>>>>>>>>>>> drillbit detect the if a UDF is present, I think the > >> problem > >>>>>> is > >>>>>>>>>> if a > >>>>>>>>>>>>>> UDF > >>>>>>>>>>>>>>>>> get's deployed to some but not all drillbits. A query can > >> then > >>>>>>>>>> start > >>>>>>>>>>>>>>>>> executing but not run successfully. The intent of the > >> create > >>>>>>>>>>> commands > >>>>>>>>>>>>>>>> would > >>>>>>>>>>>>>>>>> be to ensure that all drillbits have the UDF or none > would. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I think Jacques' point about ownership conflicts is not > >>>>>> addressed > >>>>>>>>>>>>>>>> clearly. > >>>>>>>>>>>>>>>>> Also, the unloading is not clear. The delete command > should > >>>>>>>>>> probably > >>>>>>>>>>>>>>>> remove > >>>>>>>>>>>>>>>>> the UDF and unload it. 
>
> On Fri, Jun 17, 2016 at 11:19 AM, Paul Rogers <prog...@maprtech.com> wrote:
>
> Reviewed the spec; many comments posted. Three primary comments for the
> community to consider.
>
> 1. The design conflicts with the Drill-on-YARN project. Is this a specific
> fix for one unique problem, or is it worth expanding the solution to work
> with Drill-on-YARN deployments? Might be hard to make the two work together
> later. See comments in docs for details.
>
> 2. Have we, by chance, looked at how other projects handle code
> distribution? Spark, Storm and others automatically deploy code across the
> cluster; no manual distribution to each node. The key difference between
> Drill and others is that, for Storm, say, code is associated with a job
> (“topology” in Storm terms). But, in Drill, functions are global and have
> no obvious life cycle that suggests when the code can be unloaded.
>
> 3. Have we considered the class loader, dependency and name space isolation
> issues addressed by such products as Tomcat (web apps) or Eclipse
> (plugins)?
> Putting user code in the same namespace as Drill code is quick & dirty.
> It turns out, however, that doing so leads to problems that require long,
> frustrating debugging sessions to resolve.
>
> Addressing item 1 might expand scope a bit. Addressing items 2 and 3 is a
> big increase in scope, so I won’t be surprised if we leave those issues for
> later. (Though, addressing item 2 might be the best way to address item 1.)
>
> If we want a very simple solution that requires minimal change, perhaps we
> can use an even simpler solution. In the proposed design, the user still
> must distribute code to all the nodes. The primary change is to tell Drill
> to load (or unload) that code. Can we accomplish the same result more easily
> by having Drill periodically scan certain directories looking for new (or
> removed) jars? Still won’t work with YARN, or solve the name space issues,
> but will work for existing non-YARN Drill users without new SQL syntax.
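The periodic directory scan suggested above could look something like the sketch below. It is an assumption-laden illustration in Python rather than anything from Drill: `scan_jars`, `diff_jars`, and `poll_once` are hypothetical helpers, jars are identified by file name only, and the actual loading or unloading of functions is left out.

```python
import os


def scan_jars(directory):
    """Return the set of jar file names currently present in `directory`."""
    return {name for name in os.listdir(directory) if name.endswith(".jar")}


def diff_jars(known, current):
    """Compare two scans and return (added, removed) as sorted lists."""
    return sorted(current - known), sorted(known - current)


def poll_once(directory, known):
    """One scan cycle: return the new known set plus what changed.

    A real Drillbit would register functions from the added jars and
    unload those from the removed jars here; that step is omitted.
    """
    current = scan_jars(directory)
    added, removed = diff_jars(known, current)
    return current, added, removed
```

Each node would call `poll_once` on a timer, carrying the returned set into the next call. Because every node scans independently, this does nothing about a jar being present on some nodes but not others, which is the partial-deployment concern raised elsewhere in the thread.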
>
> Thanks,
>
> - Paul
>
> On Jun 16, 2016, at 2:07 PM, Jacques Nadeau <jacq...@dremio.com> wrote:
>
> Two quick thoughts:
>
> - (user) In the design document I didn't see any discussion of
> ownership/conflicts or unloading. Would be helpful to see the thinking there
> - (dev) There is a row oriented facade via the
> FieldReader/FieldWriter/ComplexWriter classes. That would be a good place
> to start when trying to implement an alternative interface.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Thu, Jun 16, 2016 at 11:32 AM, John Omernik <j...@omernik.com> wrote:
>
> Honestly, I don't see it as a priority issue. I think some of the ideas
> around community java UDFs could be a better approach. I'd hate to take
> away from other work to hack in something like this.
>
> On Thu, Jun 16, 2016 at 1:19 PM, Paul Rogers <prog...@maprtech.com> wrote:
>
> Ted refers to source code transformation.
Drill gains > >> its > >>>>>>>>>> speed > >>>>>>>>>>>>>> from > >>>>>>>>>>>>>>>>>>>> value > >>>>>>>>>>>>>>>>>>>>> vectors. However, VVs are a far cry from the > row-based > >>>>>>>>>> interface > >>>>>>>>>>>>>> that > >>>>>>>>>>>>>>>>>>>> most > >>>>>>>>>>>>>>>>>>>>> mere mortals are accustomed to using. Since VVs are > >> very > >>>>>> type > >>>>>>>>>>>>>>>> specific, > >>>>>>>>>>>>>>>>>>>>> code is typically generated to handle the specifics > of > >>>>>> each > >>>>>>>>>>> type. > >>>>>>>>>>>>>>>>>>>> Accessing > >>>>>>>>>>>>>>>>>>>>> VVs in Jython may be a bit of a challenge because of > >> the > >>>>>>>>>>>>>> "impedence > >>>>>>>>>>>>>>>>>>>>> mismatch" between how VVs work and the row-and-column > >> view > >>>>>>>>>>>>>> expected > >>>>>>>>>>>>>>>> by > >>>>>>>>>>>>>>>>>>>> most > >>>>>>>>>>>>>>>>>>>>> (non-Drill) developers. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> I wonder if we've considered providing a row-oriented > >>>>>>>> "facade" > >>>>>>>>>>>>>> that > >>>>>>>>>>>>>>>> can > >>>>>>>>>>>>>>>>>>>> be > >>>>>>>>>>>>>>>>>>>>> used by roll-your own data sources and user-defined > row > >>>>>>>>>>>>>> transforms? > >>>>>>>>>>>>>>>>>> Might > >>>>>>>>>>>>>>>>>>>>> be a hiccup in the fast VV pipeline, but might be > handy > >>>>>> for > >>>>>>>>>>> users > >>>>>>>>>>>>>>>>>> willing > >>>>>>>>>>>>>>>>>>>>> to trade a bit of speed for convenience. With such a > >>>>>> facade, > >>>>>>>>>> the > >>>>>>>>>>>>>>>> Jython > >>>>>>>>>>>>>>>>>>>> row > >>>>>>>>>>>>>>>>>>>>> transforms that John mentions could be quite simple. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 10:36 AM, Ted Dunning < > >>>>>>>>>>>>>> ted.dunn...@gmail.com > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> Since UDF's use source code transformation, using > >> Jython > >>>>>>>>>> would > >>>>>>>>>>> be > >>>>>>>>>>>>>>>>>>>>>> difficult. 
>
> On Thu, Jun 16, 2016 at 9:42 AM, Arina Yelchiyeva
> <arina.yelchiy...@gmail.com> wrote:
>
> Hi Charles,
>
> not that I am aware of. The proposed solution doesn't invent anything new,
> it just adds the possibility to add UDFs without a drillbit restart. But
> contributions are welcome.
>
> On Thu, Jun 16, 2016 at 4:52 PM Charles Givre <cgi...@gmail.com> wrote:
>
> Arina,
> Has there been any discussion about making it possible via Jython or
> something for users to write simple UDFs in Python?
> My ideal would be to have this capability integrated in the web GUI such
> that a user could write their UDF (in Python) right there, submit it and it
> would be deployed to Drill if it passes validation tests.
> -C
>
> On Jun 16, 2016, at 09:34, Arina Yelchiyeva <arina.yelchiy...@gmail.com>
> wrote:
>
> Hi all!
>
> I have created a Jira to allow dynamic UDF support in Drill
> (https://issues.apache.org/jira/browse/DRILL-4726). There is a link to the
> design document in the Jira description.
> Comments or suggestions are welcome.
>
> Kind regards
> Arina