Re: Dynamic UDFs support

Keys Botzum Thu, 21 Jul 2016 11:57:11 -0700

Recognize the difficulty. Not suggesting this be addressed in the first
version; just suggesting some thought about how a real user will work
around it. Maybe some docs and/or small changes can make this easier.


Keys
_______________________________
Keys Botzum
Senior Principal Technologist
kbot...@maprtech.com
443-718-0098
MapR Technologies
http://www.mapr.com
On Jul 21, 2016 1:45 PM, "Paul Rogers" <prog...@maprtech.com> wrote:

> Hi All,
>
> Adding a dynamic DROP would, of course, be a great addition! The reason
> for suggesting we skip that was to control project scope.
>
> Dynamic DROP requires a synchronization step. Here’s the scenario:
>
> * Foreman A starts a query using UDF U.
> * Foreman B receives a request to drop UDF U, followed by a request to add
> a new version of U, U’.
>
> How do we drop a function that may be in use? There are some tricky bits
> to work out, which seemed too overwhelming to consider all in one go.
>
> Clearly just dropping U and adding a new version of U with the same name
> leads to issues if not synchronized. If a Drillbit D is running a query
> with U when it receives notice to drop U, should D complete the query or
> fail it? If the query completes, then how does D deal with the request to
> register U’, which has the same name?
>
> Do we globally synchronize function deletion? (The foreman B that receives
> the drop request waits for all queries using U to finish.) But, how do we
> know which queries use U?
>
> An eventually consistent approach is to track the age of the oldest
> running query. Suppose B drops U at time T. Any query received after T that
> uses U will fail in planning. A new U’ can’t be registered until all
> queries that started before T complete.
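A minimal Java sketch of the eventually consistent rule above (Java because Drill itself is written in Java). The class `DropGate`, its methods, and the in-memory maps are all hypothetical stand-ins for whatever cluster-wide state would really record running queries and drop times; nothing here is a Drill API. It encodes the two rules from the text: once U is dropped at time T, planning a new query that uses U fails, and a replacement U′ can be registered only when no query that started before T is still running.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: in-memory stand-ins for cluster-wide state.
class DropGate {
    private final Map<String, Long> runningQueryStart = new HashMap<>(); // query id -> start time
    private final Map<String, Long> droppedAt = new HashMap<>();         // function name -> drop time T
    private final Set<String> registered = new HashSet<>();

    void register(String fn) {
        registered.add(fn);
    }

    // Planning fails for any query that references an unregistered (or dropped) function.
    boolean startQuery(String queryId, long now, String fn) {
        if (!registered.contains(fn)) {
            return false;
        }
        runningQueryStart.put(queryId, now);
        return true;
    }

    void finishQuery(String queryId) {
        runningQueryStart.remove(queryId);
    }

    void drop(String fn, long t) {
        registered.remove(fn);
        droppedAt.put(fn, t);
    }

    // A replacement U' with the same name may be registered only when the oldest
    // running query (of any kind) started at or after the drop time T.
    boolean tryRegisterReplacement(String fn) {
        Long t = droppedAt.get(fn);
        if (t != null) {
            long oldest = runningQueryStart.values().stream()
                    .mapToLong(Long::longValue).min().orElse(Long.MAX_VALUE);
            if (oldest < t) {
                return false;
            }
            droppedAt.remove(fn);
        }
        registered.add(fn);
        return true;
    }
}
```

Tracking only the oldest running query, as the text suggests, is conservative: an unrelated long-running query also delays U′, in exchange for never needing to know which queries actually use U.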
>
> The primary challenge we face in both the CREATE and DROP cases is that
> Drill is distributed with little central coordination. That’s great for
> scale, but makes it hard to design features that require coordination. Some
> other tools solve this problem with a data dictionary (or “metastore”).
> Alas, Drill does not have such a concept. So a seemingly simple feature
> like dynamic UDF becomes a major design challenge to get right.
>
> Thanks,
>
> - Paul
>
> > On Jul 21, 2016, at 7:21 AM, Neeraja Rentachintala <nrentachint...@maprtech.com> wrote:
> >
> > The whole point of this feature is to avoid Drill cluster restarts, as the
> > name 'Dynamic' UDFs indicates. So any design that requires restarts would,
> > I think, defeat the purpose.
> >
> > I also think this is a feature where we start with a simple design that
> > serves the purpose, gather feedback on how it is deployed and used in real
> > user situations, and improve it in subsequent releases.
> >
> > -thanks
> > Neeraja
> >
> > On Thu, Jul 21, 2016 at 6:32 AM, Keys Botzum <kbot...@maprtech.com> wrote:
> >
> >> I think there are a lot of great ideas here. My one concern is the lack of
> >> unload, and thus presumably replace, functionality. I'm just thinking about
> >> typical actual usage.
> >>
> >> In a typical development cycle someone writes something, tries it, learns,
> >> changes it, and tries again. Assuming I understand the design, that change
> >> step requires a full Drill cluster restart. That is going to be very
> >> disruptive and will make UDF work nearly impossible without a dedicated
> >> "private" cluster for Drill. I realize that people should have access to
> >> the data they need and to Drill in a development cluster, but even then
> >> restarts can be hard since development clusters are often shared, and
> >> that's assuming such a cluster exists. I realize, of course, that Drill can
> >> be run as a standalone Drillbit, but I'm not convinced that desktops will
> >> have adequate access to the needed data.
> >>
> >> Having dealt with Java classloading over the years, I'm not claiming class
> >> replacement is an easy thing, so I'll defer to others on its priority, but
> >> I'm wondering if there isn't some way to make UDF experimentation a bit
> >> easier and more practical.
> >>
> >> Given the above, let me toss out some possibly naive ideas that may be
> >> workable:
> >> * Can I easily run a standalone Drillbit on a Hadoop cluster node that is
> >> already running Drill servers? I'm sure this can be done, but is it easy?
> >> Could we perhaps make this an explicitly supported option?
> >> * Is there a way, when I deploy a UDF, to constrain the number of Drillbits
> >> it is loaded into, and perhaps even specify which Drillbits?
> >>  * The obvious corollary is that I'd want my query to run on those
> >> Drillbits, plus a not-too-disruptive way to restart just those Drillbits.
> >>
> >> The above may be obvious to Drill experts. If it is then perhaps the UDF
> >> docs could just point out how to easily develop UDFs in an iterative
> >> fashion.
> >>
> >> Keys
> >>> On Jul 21, 2016, at 3:13 AM, Paul Rogers <prog...@maprtech.com> wrote:
> >>>
> >>> Always good to have options… Another is to try an eventual consistency model.
> >>>
> >>> The invariant here is the one that was mentioned earlier. Whenever a query
> >>> is submitted with UDF U, that query either fails in planning (because U is
> >>> unknown) or succeeds on all nodes (at least with respect to U).
> >>>
> >>> For this to work, we need a consistent view of the world. We can try to
> >>> enforce consistency at function registration time (the original design) or
> >>> via the Foreman (Parth’s design). We can probably also use an eventual
> >>> consistency model.
> >>>
> >>> Suppose we have a global name space of functions. With the global name
> >>> space, we can establish this invariant: if a function is in that name
> >>> space, then the Foreman accepts the query. If a Drillbit receives a
> >>> fragment but does not yet know of U, then the Drillbit A) knows that some
> >>> Foreman must have registered U (or the query would have failed in
> >>> planning), and B) can download the function if it is not already in place.
> >>>
> >>> Folks pointed out that always checking a global name space is expensive,
> >>> which it is. As it turns out, we can first check the local function
> >>> registry. If the Drillbit already knows about the function, we’re done
> >>> checking; no global check is needed. Only on the first use of a new
> >>> function, when it is not yet loaded locally, must the global check be done.
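The local-first check can be sketched as follows. This is a hypothetical illustration: `FunctionResolver` is an invented name, the global name space is modeled as a shared `Map` from function name to jar name (a stand-in for ZooKeeper), and fetching the jar from DFS is reduced to a comment. The `globalLookups` counter exists only to show that the expensive global check happens on first use and not afterward.

```java
import java.util.HashMap;
import java.util.Map;

class FunctionResolver {
    private final Map<String, String> globalNameSpace;                 // shared by all Drillbits (stand-in for ZK)
    private final Map<String, String> localRegistry = new HashMap<>(); // function -> jar, this Drillbit only
    int globalLookups = 0;                                              // counts the expensive global checks

    FunctionResolver(Map<String, String> globalNameSpace) {
        this.globalNameSpace = globalNameSpace;
    }

    // Returns true if the function can be used on this Drillbit.
    synchronized boolean resolve(String fn) {
        if (localRegistry.containsKey(fn)) {
            return true;                     // fast path: already known locally, no global check
        }
        globalLookups++;
        String jar = globalNameSpace.get(fn);
        if (jar == null) {
            return false;                    // unknown everywhere: the query fails in planning
        }
        // Placeholder: copy the jar from DFS and load its classes before recording it.
        localRegistry.put(fn, jar);
        return true;
    }
}
```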
> >>>
> >>> For this to work, the Foreman that registers UDF U must:
> >>>
> >>> 1. From Arina’s proposed staging area, check the jar contents to see if a
> >>> name conflict exists with the global registry. (Requires some class loader
> >>> code.)
> >>> 2. If a conflict exists, refuse to register the function and return an
> >>> error.
> >>> 3. If no conflict exists, register the function in the global name space
> >>> and move the jar to the registered area in DFS.
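One way to picture steps 1 to 3, assuming the function names have already been extracted from the staged jar (the class-loader scanning is out of scope here). `GlobalRegistry` and `CreateFunction` are hypothetical names; a `synchronized` method stands in for a ZooKeeper atomic update, and the DFS move is a placeholder callback that runs only after the atomic check-and-insert succeeds. The crash case raised later in this message (Foreman dies between the insert and the move) is deliberately left unhandled.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the global name space (ZooKeeper in the proposal).
class GlobalRegistry {
    private final Map<String, String> functionToJar = new HashMap<>();

    // Steps 1-3 as one atomic operation (the lock plays the role of a ZK
    // atomic update). Returns null on success, or the first conflicting name.
    synchronized String registerJar(String jarName, List<String> functionNames) {
        for (String fn : functionNames) {
            if (functionToJar.containsKey(fn)) {
                return fn;                     // steps 1-2: conflict found, register nothing
            }
        }
        for (String fn : functionNames) {
            functionToJar.put(fn, jarName);    // step 3: publish every function in the jar
        }
        return null;
    }

    synchronized String jarFor(String fn) {
        return functionToJar.get(fn);
    }
}

class CreateFunction {
    // moveJarToRegisteredArea is a placeholder for the DFS move, which cannot be
    // part of the atomic update; recovering from a crash between the two is not handled.
    static String run(GlobalRegistry registry, String jarName, List<String> functionNames,
                      Runnable moveJarToRegisteredArea) {
        String conflict = registry.registerJar(jarName, functionNames);
        if (conflict != null) {
            return "Error: function " + conflict + " is already registered";
        }
        moveJarToRegisteredArea.run();
        return "OK";
    }
}
```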
> >>>
> >>> In this model, it is entirely optional whether the Foreman that registers U
> >>> alerts other Drillbits. Instead, Drillbits could poll from time to time, or
> >>> just wait until they see a query with U and do the download at that time.
> >>>
> >>> When a new Drillbit starts, it can load all functions in the registry area,
> >>> because these have all passed the name collision test and can all be used
> >>> in queries. Any new registrations will be found and loaded as above. (It is
> >>> not required to preload functions, but it might help performance.)
> >>>
> >>> ZK is the only place we have at present for the global name space, so that
> >>> seems the logical tool. ZK allows atomic operations, which we need here.
> >>> Operations 1, 2, and 3 above should be atomic.
> >>>
> >>> Unfortunately, we can’t do the DFS move atomically with a ZK name space
> >>> insertion. So, the global name check & insert should be atomic. If that
> >>> succeeds, copy the jar into the registered folder. There are a few details
> >>> to work out to handle special cases, but we can cover those another time.
> >>> (Hint: what happens if the Foreman crashes after inserting the ZK entry but
> >>> before moving the jar?)
> >>>
> >>> None of the proposed designs permit graceful unloading of functions. So,
> >>> deleting functions will require a cluster restart to establish a new stable
> >>> checkpoint.
> >>>
> >>> We can recommend that on each cluster restart, any functions in the DFS
> >> registry be copied to each Drillbit (much easier with the coming YARN
> >> integration) as a way of keeping the DFS registry a reasonable size.
> >>>
> >>> More details to work out, but that’s the gist of the concept.
> >>>
> >>> Thanks,
> >>>
> >>> - Paul
> >>>
> >>>> On Jul 20, 2016, at 2:37 PM, Parth Chandra <pchan...@maprtech.com> wrote:
> >>>>
> >>>> My notes from the hangout with Arina and Paul -
> >>>>
> >>>> Notes -
> >>>>
> >>>> There are two invariants for the registration process:
> >>>> 1) There is a registration/validated directory in the DFS that contains
> >>>> UDFs that have been validated by the registering Foreman. All Drillbits
> >>>> will have access to this directory, and on startup and/or UDF
> >>>> registration, the jars in this directory are synced with a local UDF
> >>>> directory.
> >>>> 2) During the process of registration, the registering Foreman creates a
> >>>> ZooKeeper node that indicates that one or more Drillbits have not yet
> >>>> registered the UDF.
> >>>>
> >>>> The basic workflow is that UDF jars are copied from the staging directory
> >>>> to the registration directory and validated. Once they are validated, the
> >>>> available Drillbits are told to register the UDF. Registering the UDF
> >>>> consists of copying the jar to a local UDF directory and updating the
> >>>> local (in-memory) UDF registry. A sentinel node in ZooKeeper is used to
> >>>> track when all the Drillbits have registered the UDF.
> >>>>
> >>>> There were two main suggestions: immediate registration and lazy
> >>>> registration.
> >>>>
> >>>> Immediate registration:
> >>>> * The Foreman tells all Drillbits to register and creates a ZooKeeper
> >>>> node to track progress.
> >>>> * Every Drillbit makes a local copy and updates the ZooKeeper node to
> >>>> show it is done.
> >>>> * The Foreman checks the ZooKeeper node and, when all available
> >>>> Drillbits have acknowledged, sends a message to all Drillbits to
> >>>> complete registration.
> >>>> * The Foreman removes the ZK node.
> >>>> * All Drillbits update their local UDF registry.
> >>>> * Drillbit startup will block if there is a ZK node indicating that
> >>>> registration is in progress.
> >>>> This approach needs to be validated to see if any race conditions exist.
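The immediate-registration steps can be modeled as a small state machine. In this hypothetical sketch (`ImmediateRegistration` and its methods are invented names), the sentinel ZooKeeper node is an in-memory set of Drillbits that have not yet acknowledged, and messages are plain method calls; no ZooKeeper or Drill API is used, and the race-condition validation the notes ask for is not attempted.

```java
import java.util.HashSet;
import java.util.Set;

class ImmediateRegistration {
    private Set<String> sentinel = null;                       // null: no registration in progress
    private final Set<String> registeredOn = new HashSet<>();  // bits whose local UDF registry is updated

    // Foreman: tell all available bits to register and create the tracking node.
    void begin(Set<String> availableBits) {
        sentinel = new HashSet<>(availableBits);
    }

    // Drillbit: after making its local copy, record that it is done.
    void acknowledge(String bit) {
        if (sentinel != null) {
            sentinel.remove(bit);
        }
    }

    // Foreman: once every bit has acknowledged, tell all bits to complete
    // registration and remove the tracking node.
    boolean tryComplete(Set<String> availableBits) {
        if (sentinel == null || !sentinel.isEmpty()) {
            return false;
        }
        registeredOn.addAll(availableBits);
        sentinel = null;
        return true;
    }

    // Drillbit startup blocks while a registration is in progress.
    boolean startupMustWait() {
        return sentinel != null;
    }

    boolean isRegisteredOn(String bit) {
        return registeredOn.contains(bit);
    }
}
```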
> >>>>
> >>>> Lazy registration:
> >>>> Once a UDF is copied to the registration folder, the UDF is essentially
> >>>> registered. On first use, a Drillbit may hit a ClassNotFoundException, in
> >>>> which case it will look for the UDF in the registration directory. If
> >>>> found, it will copy it to the local directory and add the UDF to its local
> >>>> registry.
> >>>> This approach should be investigated to see if it fits in with the
> >>>> current UDF execution code.
> >>>>
> >>>>
> >>>> On Mon, Jul 18, 2016 at 3:36 PM, Parth Chandra <pchan...@maprtech.com> wrote:
> >>>>
> >>>>> +1 on simplifying the design and postponing the items Paul has suggested.
> >>>>>
> >>>>> Arina, Paul, I think we need to work out some of the design related to
> >>>>> registering the UDF. Are you open to a quick hangout at 10 a.m. PDT
> >>>>> tomorrow?
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Thu, Jul 14, 2016 at 1:46 PM, Paul Rogers <prog...@maprtech.com> wrote:
> >>>>>
> >>>>>> Hi All,
> >>>>>>
> >>>>>> We’ve had quite a lively debate in the “comments” section of Arina’s
> >>>>>> wonderful design doc. Zelaine made a great suggestion: summarize the user
> >>>>>> experience as a way of making sense of the wealth of detailed comments.
> >>>>>>
> >>>>>> IMHO, the most important user experience goals are:
> >>>>>>
> >>>>>> 1. When a user submits a CREATE FUNCTION command, the command returns
> >>>>>> quickly (within a few seconds at most).
> >>>>>> 2. If the above user then issues a query using that function (to the same
> >>>>>> Foreman), that query is guaranteed to successfully use the new function on
> >>>>>> all nodes.
> >>>>>> 3. Other users, connecting to any Foreman, will see very clean behavior
> >>>>>> when submitting a query with the new function. Before some point in time
> >>>>>> (which can be different for each Foreman), a query with the function fails
> >>>>>> in planning. After that point, queries are guaranteed to successfully use
> >>>>>> the new function on all nodes.
> >>>>>>
> >>>>>> Basically, this says that CREATE FUNCTION can’t (potentially) take a long
> >>>>>> time, and use of functions can’t result in random failures while the
> >>>>>> function is propagated across Drillbits.
> >>>>>>
> >>>>>> The goals we can perhaps postpone are:
> >>>>>>
> >>>>>> 1. Class name space isolation. (Allows two data scientists to define the
> >>>>>> same class without collisions.)
> >>>>>> 2. Function name spaces. (Allows me to define “paul.foo” and you to
> >>>>>> define “bob.foo” without collisions. Needed if many people develop
> >>>>>> functions independently; otherwise, we need a global name space.)
> >>>>>> 3. Dynamic DROP FUNCTION operation. (The issues here are messy: it
> >>>>>> requires unloading classes and name space cleanup. Just let the cleanup
> >>>>>> happen offline.)
> >>>>>> 4. Dependency jars (e.g. third-party libraries). (We require those to be
> >>>>>> statically added to the class path before Drill starts.)
> >>>>>>
> >>>>>> We are not creating per-user name spaces, or allowing people to use
> >>>>>> production clusters to try/revise functions. We’re just supporting
> >>>>>> deployment of simple functions.
> >>>>>>
> >>>>>> That’s my suggestion, what do others suggest?
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> - Paul
> >>>>>>
> >>>>>>> On Jul 7, 2016, at 12:32 PM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> I also agree on using ZooKeeper. I have reworked the dynamic UDF support
> >>>>>>> document to take ZooKeeper usage into account.
> >>>>>>>
> >>>>>>> Link to the document -
> >>>>>>> https://docs.google.com/document/d/1MluM17EKajvNP_x8U4aymcOihhUm8BMm8t_hM0jEFWk/edit
> >>>>>>>
> >>>>>>> Kind regards
> >>>>>>> Arina
> >>>>>>>
> >>>>>>> On Tue, Jun 28, 2016 at 12:55 AM Paul Rogers <prog...@maprtech.com> wrote:
> >>>>>>>
> >>>>>>>> Great idea! We already use ZK to track storage plugins. ZK is perhaps
> >>>>>>>> better suited to register each jar and/or function than using files in
> >>>>>>>> DFS. We still need to work out the proper sequencing. But you are right,
> >>>>>>>> this is the kind of thing that ZK is supposed to solve.
> >>>>>>>>
> >>>>>>>> - Paul
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Jun 27, 2016, at 2:01 PM, Parth Chandra <par...@apache.org> wrote:
> >>>>>>>>>
> >>>>>>>>> Reading through some of Paul's comments on maintaining a consistent
> >>>>>>>>> state for the registration of the UDF, it looks like we need a consensus
> >>>>>>>>> protocol for determining that all the Drillbits have the UDF deployed.
> >>>>>>>>> I believe ZooKeeper can provide a stronger guarantee than a two-phase
> >>>>>>>>> approach. Should we look into that?
> >>>>>>>>>
> >>>>>>>>> On Fri, Jun 24, 2016 at 10:00 AM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi all!
> >>>>>>>>>>
> >>>>>>>>>> I have updated the design document.
> >>>>>>>>>> Main changes:
> >>>>>>>>>> 1. Added the staging and registration DFS locations to Drill’s config.
> >>>>>>>>>> 2. The user is no longer responsible for copying jars onto Drillbit
> >>>>>>>>>> nodes. Now the user needs to copy jars into the staging DFS location,
> >>>>>>>>>> from where Drillbits will copy them to the local file system.
> >>>>>>>>>> 3. During UDF registration, jars will be moved to the DFS registration
> >>>>>>>>>> area.
> >>>>>>>>>> 4. During startup, a Drillbit will copy all jars from the registration
> >>>>>>>>>> area, so a newly added Drillbit will have all the same UDFs as the
> >>>>>>>>>> others.
> >>>>>>>>>> 5. Security: this will probably be added later as an enhancement.
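The startup step in Arina's list (a starting Drillbit copies all jars from the registration area) might look like the following. It is a sketch under assumptions: `UdfStartupSync` is a hypothetical name, both directories are treated as local paths (a real Drillbit would read the registration area through a DFS client), and skipping jars whose file name already exists locally is an assumed simplification, not part of the design.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class UdfStartupSync {
    // Copies every *.jar in registrationArea into localUdfDir, skipping jars
    // already present locally. Returns the names of the jars that were copied.
    static List<String> syncJars(Path registrationArea, Path localUdfDir) throws IOException {
        Files.createDirectories(localUdfDir);
        List<String> copied = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(registrationArea, "*.jar")) {
            for (Path jar : jars) {
                String name = jar.getFileName().toString();
                Path target = localUdfDir.resolve(name);
                if (Files.notExists(target)) {
                    Files.copy(jar, target);
                    copied.add(name);
                }
            }
        }
        return copied;
    }
}
```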
> >>>>>>>>>>
> >>>>>>>>>> More details in the document:
> >>>>>>>>>> https://docs.google.com/document/d/1MluM17EKajvNP_x8U4aymcOihhUm8BMm8t_hM0jEFWk/edit
> >>>>>>>>>>
> >>>>>>>>>> Kind regards
> >>>>>>>>>> Arina
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Jun 17, 2016 at 1:25 AM Paul Rogers <prog...@maprtech.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi All,
> >>>>>>>>>>>
> >>>>>>>>>>> To answer Arina on item 3: there is actually no good location on any
> >>>>>>>>>>> local node to put the UDFs. Reason: DoY allows the admin to start a
> >>>>>>>>>>> Drillbit on any available node. When it starts, a new, fresh copy of
> >>>>>>>>>>> Drill will be downloaded, and this can happen after the user issued the
> >>>>>>>>>>> CREATE command.
> >>>>>>>>>>>
> >>>>>>>>>>> What we need is a shared, secure distributed storage location from which
> >>>>>>>>>>> Drillbits can download the needed jar files. Something like… DFS!
> >>>>>>>>>>> Indeed, this is how YARN stores the Drill archive from which it creates
> >>>>>>>>>>> the Drill install directory on each node. We can’t quite use YARN’s
> >>>>>>>>>>> mechanism (YARN is aware only of the files uploaded when launching an
> >>>>>>>>>>> app), but we can do something similar.
> >>>>>>>>>>>
> >>>>>>>>>>> So, brainstorming a bit…
> >>>>>>>>>>>
> >>>>>>>>>>> 1. Store the UDF jar in a pre-defined DFS location.
> >>>>>>>>>>>
> >>>>>>>>>>> 2. The CREATE function 1) uploads the jar to the DFS location, and 2)
> >>>>>>>>>>> creates some kind of registry entry.
> >>>>>>>>>>>
> >>>>>>>>>>> 3. The DELETE function 1) deregisters the jar (and function), but 2)
> >>>>>>>>>>> does not delete the jar (this allows in-flight queries to complete).
> >>>>>>>>>>>
> >>>>>>>>>>> 4. Drillbits periodically check DFS for changed registrations,
> >>>>>>>>>>> downloading any needed jars. (YARN, Spark, Storm and others already do
> >>>>>>>>>>> something similar.)
> >>>>>>>>>>>
> >>>>>>>>>>> 5. A registry check is “forced” when processing a query with a function
> >>>>>>>>>>> that is not currently registered. (Doing so resolves any possible race
> >>>>>>>>>>> conditions.)
> >>>>>>>>>>>
> >>>>>>>>>>> 6. Some process (perhaps time-based) removes old, unregistered jar
> >>>>>>>>>>> files. (Or, we could get fancy and use reference counts. The reference
> >>>>>>>>>>> count would be required if the user wants to delete, then recreate, the
> >>>>>>>>>>> same function and jar to avoid conflict with in-flight queries.)
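A hypothetical sketch of the DELETE and clean-up steps above, using the reference-count variant: DELETE only deregisters a jar, and a clean-up pass removes the file once the jar is unregistered and no in-flight query still holds a reference. `JarLifecycle` and its methods are invented for illustration, and the "DFS" is just an in-memory set of file names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class JarLifecycle {
    private final Set<String> registered = new HashSet<>();
    private final Map<String, Integer> refCounts = new HashMap<>(); // in-flight queries per jar
    private final Set<String> filesOnDfs = new HashSet<>();         // stand-in for jar files in DFS

    void create(String jar) {
        filesOnDfs.add(jar);
        registered.add(jar);
    }

    // DELETE: deregister only; the file stays so in-flight queries can finish.
    void delete(String jar) {
        registered.remove(jar);
    }

    // New queries may only reference registered jars.
    boolean queryStarted(String jar) {
        if (!registered.contains(jar)) {
            return false;
        }
        refCounts.merge(jar, 1, Integer::sum);
        return true;
    }

    void queryFinished(String jar) {
        refCounts.merge(jar, -1, Integer::sum);
    }

    // Clean-up pass: remove files that are unregistered and no longer referenced.
    void cleanup() {
        filesOnDfs.removeIf(jar -> !registered.contains(jar) && refCounts.getOrDefault(jar, 0) == 0);
    }

    boolean fileExists(String jar) {
        return filesOnDfs.contains(jar);
    }
}
```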
> >>>>>>>>>>>
> >>>>>>>>>>> We can build security on this as follows:
> >>>>>>>>>>>
> >>>>>>>>>>> 1. Define permissions for who can write to the DFS location. Or,
> >>>>>>>>>>> indeed, have subdirectories by user and grant each user permission only
> >>>>>>>>>>> on their own UDF directory.
> >>>>>>>>>>>
> >>>>>>>>>>> 2. Provide separate registries for per-user functions (private) and
> >>>>>>>>>>> global functions (public). Only the admin can add global functions.
> >>>>>>>>>>> But only the user who uploads a private function can use it.
> >>>>>>>>>>>
> >>>>>>>>>>> 3. Leverage the Java class loader to isolate UDFs in their own name
> >>>>>>>>>>> space (see Eclipse & Tomcat for examples). That is, Drill can call into
> >>>>>>>>>>> a UDF, and UDFs can call selected Drill code, but UDFs can’t shadow
> >>>>>>>>>>> Drill classes (accidentally or maliciously). Plus, my function Foo
> >>>>>>>>>>> won’t clash with your function Foo if both are private.
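The class-loader isolation idea can be sketched with the standard `java.net.URLClassLoader`. This is one common approach (parent-first for protected packages, child-first for everything else), not necessarily what Drill would adopt; `UdfClassLoader` is a hypothetical name, and the prefix list passed in (for example `java.` and `org.apache.drill.`) is an assumption. With one loader per UDF jar, two jars can each define a class named Foo, while neither can shadow a JDK or Drill class.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

class UdfClassLoader extends URLClassLoader {
    private final List<String> parentFirstPrefixes;

    UdfClassLoader(URL[] udfJars, ClassLoader drillLoader, List<String> parentFirstPrefixes) {
        super(udfJars, drillLoader);
        this.parentFirstPrefixes = parentFirstPrefixes;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (isParentFirst(name)) {
                    c = getParent().loadClass(name);      // protected packages: never taken from the UDF jar
                } else {
                    try {
                        c = findClass(name);              // child-first: look in this UDF jar
                    } catch (ClassNotFoundException e) {
                        c = getParent().loadClass(name);  // fall back to Drill's class path
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    private boolean isParentFirst(String name) {
        for (String prefix : parentFirstPrefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```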
> >>>>>>>>>>>
> >>>>>>>>>>> Sorry that this has wandered a bit far from the original simple design,
> >>>>>>>>>>> but the above may capture much of what folks expect in modern
> >>>>>>>>>>> distributed big data systems.
> >>>>>>>>>>>
> >>>>>>>>>>> I wonder if a good next step might be to review the notes in the design
> >>>>>>>>>>> doc, in the JIRA, and in this e-mail chain, and to prepare a summary of
> >>>>>>>>>>> technical requirements and a proposed design. Postpone, at least for
> >>>>>>>>>>> now, concerns about the amount of work; we can worry about that once
> >>>>>>>>>>> folks agree on your revised design.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>>
> >>>>>>>>>>> - Paul
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> On Jun 21, 2016, at 9:48 AM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> 4. Authorization model mentioned by Julia and John
> >>>>>>>>>>>> If a user doesn't have rights to copy jars to the UDF classpath, which
> >>>>>>>>>>>> can be restricted by the file system, he won't be able to do much harm
> >>>>>>>>>>>> by running the CREATE command. If UDFs from the jar were already
> >>>>>>>>>>>> registered, the CREATE statement will fail. CREATE OR REPLACE will
> >>>>>>>>>>>> just re-register the UDFs.
> >>>>>>>>>>>> But the DELETE command is not safe. If a user knows the jar name, he
> >>>>>>>>>>>> can delete all UDFs associated with it, as well as the binary and
> >>>>>>>>>>>> source jars. That's where we'll probably need to impose restrictions.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Jun 21, 2016 at 7:34 PM Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> 1. DELETE command - I forgot to mention it in the document but had it
> >>>>>>>>>>>>> in mind. When a user issues the DELETE command, all UDFs associated
> >>>>>>>>>>>>> with the indicated jar are removed from DrillFunctionRegistry. Then the
> >>>>>>>>>>>>> binary and source files are also deleted from the UDF classpath.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2. Distribution race condition described by Paul
> >>>>>>>>>>>>> The user issues the CREATE command and gets confirmation that the UDFs
> >>>>>>>>>>>>> are registered only if all Drillbits have confirmed that registration
> >>>>>>>>>>>>> was successful. I don't expect users to start using UDFs in queries
> >>>>>>>>>>>>> before the CREATE command returns success or failure; that is possible
> >>>>>>>>>>>>> but strange.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 3. DoY
> >>>>>>>>>>>>> @Paul
> >>>>>>>>>>>>> What if, instead of using $DRILL_HOME/jars/3rdparty/udf directly, we
> >>>>>>>>>>>>> use a $DRILL_UDF environment variable that is set during Drillbit start
> >>>>>>>>>>>>> (like $DRILL_LOG_DIR)? The location stored in this variable will be
> >>>>>>>>>>>>> added to the Drill classpath during start.
> >>>>>>>>>>>>> Would that ease DoY integration somehow?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Kind regards
> >>>>>>>>>>>>> Arina
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Jun 21, 2016 at 7:15 PM yuliya Feldman <yufeld...@yahoo.com.invalid> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Just thoughts:
> >>>>>>>>>>>>>> You can try to reuse the distributed cache and let the Drill AM do the
> >>>>>>>>>>>>>> needful in terms of orchestrating UDF jar distribution.
> >>>>>>>>>>>>>> But I would be inclined to have a common path that is independent of
> >>>>>>>>>>>>>> whether it is Drill on YARN or not, as maintaining two separate ways of
> >>>>>>>>>>>>>> dealing with loading/unloading UDFs will be painful and error prone.
> >>>>>>>>>>>>>> One more note (I left a comment in the doc): I'm not sure about the
> >>>>>>>>>>>>>> authorization model here; we need to have one.
> >>>>>>>>>>>>>> Just my 2c.
> >>>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> From: Paul Rogers <prog...@maprtech.com>
> >>>>>>>>>>>>>> To: "dev@drill.apache.org" <dev@drill.apache.org>
> >>>>>>>>>>>>>> Sent: Monday, June 20, 2016 7:32 PM
> >>>>>>>>>>>>>> Subject: Re: Dynamic UDFs support
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Neeraja,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The proposal calls for the user to copy the jar file to each Drillbit
> >>>>>>>>>>>>>> node. The jar would go into a new $DRILL_HOME/jars/3rdparty/udf
> >>>>>>>>>>>>>> directory.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> In Drill-on-YARN (DoY), YARN is responsible for copying Drill code to
> >>>>>>>>>>>>>> each node (which is good). YARN puts that code in a location known only
> >>>>>>>>>>>>>> to YARN. Since the location is private to YARN, the user can’t easily
> >>>>>>>>>>>>>> hunt down the location in order to add the udf jar. Even if the user
> >>>>>>>>>>>>>> did find the location, the next Drillbit to start would create a new
> >>>>>>>>>>>>>> copy of the Drill software, without the udf jar.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Second, in DoY we have separated user files from Drill software. This
> >>>>>>>>>>>>>> makes it much easier to distribute the software to each node: we give
> >>>>>>>>>>>>>> the Drill distribution tar archive to YARN, and YARN copies it to each
> >>>>>>>>>>>>>> node and untars the Drill files. We make a separate copy of the (far
> >>>>>>>>>>>>>> smaller) set of user config files.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> If the udf jar goes into a Drill folder ($DRILL_HOME/jars/3rdparty/udf),
> >>>>>>>>>>>>>> then the user would have to rebuild the Drill tar file each time they
> >>>>>>>>>>>>>> add a udf jar. When I tried this myself while building DoY, I found it
> >>>>>>>>>>>>>> to be slow and error-prone.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> So, the solution is to place the udf code in the new “site” directory:
> >>>>>>>>>>>>>> $DRILL_SITE/jars. That’s what that is for. Then, let DoY automatically
> >>>>>>>>>>>>>> distribute the code to every node. Perfect! Except that it does not
> >>>>>>>>>>>>>> work to dynamically distribute code after Drill starts.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For DoY, the solution requirements are:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 1. Distribute code using Drill itself, rather than manually copying
> >>>>>>>>>>>>>> jars to (unknown) Drill directories.
> >>>>>>>>>>>>>> 2. Ensure the solution works even if another Drillbit is spun up later
> >>>>>>>>>>>>>> and uses the original Drill tar file.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I’m thinking we want to leverage DFS: place udf files into a well-known
> >>>>>>>>>>>>>> DFS directory. Register the udf in, say, ZK. When a new Drillbit starts,
> >>>>>>>>>>>>>> it looks for new udf jars in ZK, copies the files to a temporary
> >>>>>>>>>>>>>> location, and launches. An existing Drillbit is notified of the change
> >>>>>>>>>>>>>> and does the same download process. Clean-up is needed at some point to
> >>>>>>>>>>>>>> remove ZK entries if the udf jar becomes statically available on the
> >>>>>>>>>>>>>> next launch. That needs more thought.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> We’d still need the phases mentioned earlier to ensure consistency.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Any suggestions as to how to do this super simply and still get it to
> >>>>>>>>>>>>>> work with DoY?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> - Paul
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Jun 20, 2016, at 7:18 PM, Neeraja Rentachintala <nrentachint...@maprtech.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This will need to work with YARN (once Drill is YARN-enabled, I would
> >>>>>>>>>>>>>>> expect a lot of users to use it in conjunction with YARN).
> >>>>>>>>>>>>>>> Paul, I am not clear why this wouldn't work with YARN. Can you
> >>>>>>>>>>>>>>> elaborate?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> -Neeraja
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Mon, Jun 20, 2016 at 7:01 PM, Paul Rogers <prog...@maprtech.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Good enough, as long as we document the limitation that this feature
> >>>>>>>>>>>>>>>> can’t work with YARN deployment, as users generally do not have access
> >>>>>>>>>>>>>>>> to the temporary “localization” directories where YARN places the
> >>>>>>>>>>>>>>>> Drill code.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Note that the jar distribution race condition issue occurs with the
> >>>>>>>>>>>>>>>> proposed design: I believe I sketched out a scenario in one of the
> >>>>>>>>>>>>>>>> earlier comments. Drillbit A receives the CREATE FUNCTION command. It
> >>>>>>>>>>>>>>>> tells Drillbit B. While A is informing the other Drillbits, Drillbit B
> >>>>>>>>>>>>>>>> plans and launches a query that uses the function. Drillbit Z starts
> >>>>>>>>>>>>>>>> execution of the query before it learns from A about the new function.
> >>>>>>>>>>>>>>>> This will be rare, just rare enough to create very hard-to-reproduce
> >>>>>>>>>>>>>>>> bugs.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The only reliable solution is to do the work in multiple passes:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Pass 1: Ask each node to load the function, but not make it available
> >>>>>>>>>>>>>>>> to the planner. (It would be available to the execution engine.)
> >>>>>>>>>>>>>>>> Pass 2: Await confirmation from each node that this is done.
> >>>>>>>>>>>>>>>> Pass 3: Alert every node that it is now free to plan queries with the
> >>>>>>>>>>>>>>>> function.
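The three passes can be illustrated with this hypothetical Java sketch, in which each Drillbit keeps separate sets for functions loaded for execution and functions visible to its planner. `Drillbit` and `ThreePassCreate` are invented names, not Drill classes. Calls are synchronous here, so pass 2 is simply collecting return values; a real implementation would wait on remote acknowledgements. Cleanup on nodes that loaded the function when another node failed to confirm is left open.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one Drillbit's view of a function.
class Drillbit {
    final Set<String> executable = new HashSet<>(); // loaded for the execution engine
    final Set<String> plannable = new HashSet<>();  // visible to this node's planner
    private final boolean reachable;

    Drillbit(boolean reachable) {
        this.reachable = reachable;
    }

    // Pass 1: load the function for execution only; the return value is the confirmation.
    boolean load(String fn) {
        if (!reachable) {
            return false;
        }
        executable.add(fn);
        return true;
    }

    // Pass 3: let the planner use the function.
    void enablePlanning(String fn) {
        plannable.add(fn);
    }
}

class ThreePassCreate {
    static boolean createFunction(List<Drillbit> bits, String fn) {
        boolean allConfirmed = true;
        for (Drillbit bit : bits) {
            allConfirmed &= bit.load(fn);        // pass 1 on every node
        }
        if (!allConfirmed) {
            return false;                        // pass 2 failed: no planner ever sees fn
        }
        for (Drillbit bit : bits) {
            bit.enablePlanning(fn);              // pass 3: only after every node confirmed
        }
        return true;
    }
}
```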
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Finally, I wonder if we should design the SQL syntax based on a
> >>>>>>>>>>>>>>>> long-term design, even if the feature itself is a short-term
> >>>>>>>>>>>>>>>> work-around. Changing the syntax later might break scripts that users
> >>>>>>>>>>>>>>>> write.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> So, the question for the group is this: is the value of a
> >>>>>>>>>>>>>>>> semi-complete feature sufficient to justify the potential problems?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> - Paul
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Jun 20, 2016, at 6:15 PM, Parth Chandra <pchan...@maprtech.com> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Moving discussion to dev.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I believe the aim is to do a simple implementation without the
> >>>>>>>>>>>>>>>>> complexity of distributing the UDF. I think the document should make
> >>>>>>>>>>>>>>>>> this limitation clear.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Per Paul's point on there being a simpler solution of just having
> >>>>>>>>>>>>>>>>> each drillbit detect whether a UDF is present, I think the problem
> >>>>>>>>>>>>>>>>> is if a UDF gets deployed to some but not all drillbits. A query
> >>>>>>>>>>>>>>>>> can then start executing but not run successfully. The intent of
> >>>>>>>>>>>>>>>>> the create commands would be to ensure that all drillbits have the
> >>>>>>>>>>>>>>>>> UDF or that none do.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I think Jacques' point about ownership conflicts is not addressed
> >>>>>>>>>>>>>>>>> clearly. Also, the unloading is not clear. The delete command
> >>>>>>>>>>>>>>>>> should probably remove the UDF and unload it.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Fri, Jun 17, 2016 at 11:19 AM, Paul Rogers <prog...@maprtech.com>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Reviewed the spec; many comments posted. Three primary comments
> >>>>>>>>>>>>>>>>>> for the community to consider.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> 1. The design conflicts with the Drill-on-YARN project. Is this a
> >>>>>>>>>>>>>>>>>> specific fix for one unique problem, or is it worth expanding the
> >>>>>>>>>>>>>>>>>> solution to work with Drill-on-YARN deployments? It might be hard
> >>>>>>>>>>>>>>>>>> to make the two work together later. See comments in docs for
> >>>>>>>>>>>>>>>>>> details.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> 2. Have we, by chance, looked at how other projects handle code
> >>>>>>>>>>>>>>>>>> distribution? Spark, Storm and others automatically deploy code
> >>>>>>>>>>>>>>>>>> across the cluster; no manual distribution to each node. The key
> >>>>>>>>>>>>>>>>>> difference between Drill and others is that, for Storm, say, code
> >>>>>>>>>>>>>>>>>> is associated with a job (“topology” in Storm terms). But, in
> >>>>>>>>>>>>>>>>>> Drill, functions are global and have no obvious life cycle that
> >>>>>>>>>>>>>>>>>> suggests when the code can be unloaded.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> 3. Have we considered the class loader, dependency and namespace
> >>>>>>>>>>>>>>>>>> isolation issues addressed by such products as Tomcat (web apps)
> >>>>>>>>>>>>>>>>>> or Eclipse (plugins)? Putting user code in the same namespace as
> >>>>>>>>>>>>>>>>>> Drill code is quick & dirty. It turns out, however, that doing so
> >>>>>>>>>>>>>>>>>> leads to problems that require long, frustrating debugging
> >>>>>>>>>>>>>>>>>> sessions to resolve.
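A minimal sketch of per-jar isolation, assuming one `URLClassLoader` per UDF jar with Drill's own loader as the parent. `UdfJarLoaders` and the jar paths are hypothetical. Note that `URLClassLoader` delegates parent-first by default, so this separates UDF jars from one another but does not let a jar override a library already on Drill's classpath; Tomcat's web-app loaders largely reverse that order (child-first) for exactly that reason.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

class IsolationDemo {
    public static void main(String[] args) throws IOException {
        try (UdfJarLoaders loaders = new UdfJarLoaders(IsolationDemo.class.getClassLoader())) {
            ClassLoader a = loaders.open(Paths.get("udfs/a.jar")); // hypothetical paths
            ClassLoader b = loaders.open(Paths.get("udfs/b.jar"));
            System.out.println(a != b);         // true: separate namespaces
            loaders.unload(Paths.get("udfs/a.jar"));
            System.out.println(loaders.size()); // 1
        }
    }
}

// Hypothetical helper: one class loader per UDF jar.
class UdfJarLoaders implements AutoCloseable {
    private final ClassLoader parent;
    private final Map<String, URLClassLoader> loaders = new HashMap<>();

    UdfJarLoaders(ClassLoader parent) { this.parent = parent; }

    // Returns the loader for this jar, creating it on first use.
    ClassLoader open(Path jar) throws IOException {
        String key = jar.toAbsolutePath().toString();
        URLClassLoader existing = loaders.get(key);
        if (existing != null) return existing;
        URLClassLoader loader = new URLClassLoader(new URL[] {jar.toUri().toURL()}, parent);
        loaders.put(key, loader);
        return loader;
    }

    // Closing releases the jar file; the classes themselves are garbage
    // collected only once no instances, Class objects or threads refer to them.
    void unload(Path jar) throws IOException {
        URLClassLoader loader = loaders.remove(jar.toAbsolutePath().toString());
        if (loader != null) loader.close();
    }

    int size() { return loaders.size(); }

    @Override
    public void close() throws IOException {
        for (URLClassLoader l : loaders.values()) l.close();
        loaders.clear();
    }
}
```

Tying each jar to its own loader also gives functions the life cycle the thread says is missing: dropping a jar means closing its loader.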
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Addressing item 1 might expand scope a bit. Addressing items 2 and
> >>>>>>>>>>>>>>>>>> 3 is a big increase in scope, so I won’t be surprised if we leave
> >>>>>>>>>>>>>>>>>> those issues for later. (Though addressing item 2 might be the
> >>>>>>>>>>>>>>>>>> best way to address item 1.)
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> If we want a very simple solution that requires minimal change,
> >>>>>>>>>>>>>>>>>> perhaps we can use an even simpler approach. In the proposed
> >>>>>>>>>>>>>>>>>> design, the user still must distribute code to all the nodes. The
> >>>>>>>>>>>>>>>>>> primary change is to tell Drill to load (or unload) that code.
> >>>>>>>>>>>>>>>>>> Could we accomplish the same result more easily by having Drill
> >>>>>>>>>>>>>>>>>> periodically scan certain directories looking for new (or
> >>>>>>>>>>>>>>>>>> removed) jars? That still won’t work with YARN, or solve the
> >>>>>>>>>>>>>>>>>> namespace issues, but it will work for existing non-YARN Drill
> >>>>>>>>>>>>>>>>>> users without new SQL syntax.
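A minimal sketch of the periodic-scan alternative, assuming UDF jars are dropped into a single local directory. `JarDirectoryScanner` and its `Delta` result are hypothetical names; each call to `scan()` diffs the current `*.jar` listing against the previous one.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class ScanDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("udf-jars");
        JarDirectoryScanner scanner = new JarDirectoryScanner(dir);
        Path jar = Files.createFile(dir.resolve("my-udfs.jar")); // empty placeholder jar
        System.out.println(scanner.scan().added);   // [my-udfs.jar]
        Files.delete(jar);
        System.out.println(scanner.scan().removed); // [my-udfs.jar]
        Files.delete(dir);
    }
}

// Hypothetical scanner: reports jars added or removed since the last scan.
class JarDirectoryScanner {
    static final class Delta {
        final Set<Path> added;
        final Set<Path> removed;
        Delta(Set<Path> added, Set<Path> removed) { this.added = added; this.removed = removed; }
    }

    private final Path dir;
    private Set<Path> known = new HashSet<>();

    JarDirectoryScanner(Path dir) { this.dir = dir; }

    Delta scan() throws IOException {
        Set<Path> current = new HashSet<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path p : jars) current.add(p.getFileName());
        }
        Set<Path> added = new TreeSet<>(current);   // present now, not before
        added.removeAll(known);
        Set<Path> removed = new TreeSet<>(known);   // present before, not now
        removed.removeAll(current);
        known = current;
        return new Delta(added, removed);
    }
}
```

In a Drillbit this would run on a timer (for example via `ScheduledExecutorService.scheduleWithFixedDelay`), and jars should be copied in under a temporary name and then renamed so a scan never sees a half-written file. On its own it does not address the partial-deployment problem raised earlier in the thread: each node still picks up a jar at a different moment.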
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> - Paul
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Jun 16, 2016, at 2:07 PM, Jacques Nadeau <jacq...@dremio.com>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Two quick thoughts:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> - (user) In the design document I didn't see any discussion of
> >>>>>>>>>>>>>>>>>>>   ownership/conflicts or unloading. Would be helpful to see the
> >>>>>>>>>>>>>>>>>>>   thinking there.
> >>>>>>>>>>>>>>>>>>> - (dev) There is a row-oriented facade via the
> >>>>>>>>>>>>>>>>>>>   FieldReader/FieldWriter/ComplexWriter classes. That would be a
> >>>>>>>>>>>>>>>>>>>   good place to start when trying to implement an alternative
> >>>>>>>>>>>>>>>>>>>   interface.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>> Jacques Nadeau
> >>>>>>>>>>>>>>>>>>> CTO and Co-Founder, Dremio
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 11:32 AM, John Omernik <j...@omernik.com>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Honestly, I don't see it as a priority issue. I think some of the
> >>>>>>>>>>>>>>>>>>>> ideas around community Java UDFs could be a better approach. I'd
> >>>>>>>>>>>>>>>>>>>> hate to take away from other work to hack in something like this.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 1:19 PM, Paul Rogers <prog...@maprtech.com>
> >>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Ted refers to source code transformation. Drill gains its speed
> >>>>>>>>>>>>>>>>>>>>> from value vectors. However, VVs are a far cry from the row-based
> >>>>>>>>>>>>>>>>>>>>> interface that most mere mortals are accustomed to using. Since
> >>>>>>>>>>>>>>>>>>>>> VVs are very type-specific, code is typically generated to handle
> >>>>>>>>>>>>>>>>>>>>> the specifics of each type. Accessing VVs in Jython may be a bit
> >>>>>>>>>>>>>>>>>>>>> of a challenge because of the "impedance mismatch" between how VVs
> >>>>>>>>>>>>>>>>>>>>> work and the row-and-column view expected by most (non-Drill)
> >>>>>>>>>>>>>>>>>>>>> developers.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I wonder if we've considered providing a row-oriented "facade"
> >>>>>>>>>>>>>>>>>>>>> that can be used by roll-your-own data sources and user-defined
> >>>>>>>>>>>>>>>>>>>>> row transforms? It might be a hiccup in the fast VV pipeline, but
> >>>>>>>>>>>>>>>>>>>>> might be handy for users willing to trade a bit of speed for
> >>>>>>>>>>>>>>>>>>>>> convenience. With such a facade, the Jython row transforms that
> >>>>>>>>>>>>>>>>>>>>> John mentions could be quite simple.
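A minimal sketch of a row-oriented facade over columnar data, assuming plain Java arrays stand in for value vectors and a `java.util.function.Function` stands in for a Jython row transform. `ColumnarBatch`, `RowView`, and `RowTransforms` are hypothetical and are not Drill's `ValueVector` or `FieldReader` APIs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class FacadeDemo {
    public static void main(String[] args) {
        ColumnarBatch batch = new ColumnarBatch(3)
            .with("name", new String[] {"a", "b", "c"})
            .with("qty", new int[] {1, 2, 3});
        Object[] labels = RowTransforms.map(batch, r -> r.get("name") + ":" + r.get("qty"));
        System.out.println(Arrays.toString(labels)); // [a:1, b:2, c:3]
    }
}

// Hypothetical columnar batch: each column is one typed array.
class ColumnarBatch {
    final int rowCount;
    final Map<String, Object> columns = new LinkedHashMap<>(); // name -> int[], double[] or Object[]

    ColumnarBatch(int rowCount) { this.rowCount = rowCount; }

    ColumnarBatch with(String name, Object array) { columns.put(name, array); return this; }
}

// Row facade: one row at a time, name-based and boxed. The boxing and
// per-value type checks are the speed traded for convenience.
class RowView {
    private final ColumnarBatch batch;
    private int index;

    RowView(ColumnarBatch batch) { this.batch = batch; }

    RowView at(int index) { this.index = index; return this; }

    Object get(String column) {
        Object array = batch.columns.get(column);
        if (array instanceof int[]) return ((int[]) array)[index];
        if (array instanceof double[]) return ((double[]) array)[index];
        if (array instanceof Object[]) return ((Object[]) array)[index];
        throw new IllegalArgumentException("unknown column " + column);
    }
}

class RowTransforms {
    // Applies a user row function to every row and collects one output column.
    static Object[] map(ColumnarBatch batch, Function<RowView, Object> fn) {
        RowView row = new RowView(batch); // reused for every row; fn must not keep it
        Object[] out = new Object[batch.rowCount];
        for (int i = 0; i < batch.rowCount; i++) out[i] = fn.apply(row.at(i));
        return out;
    }
}
```

Reusing a single `RowView` avoids a per-row allocation, at the cost of a rule that user code must not hold on to the row object after the call returns.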
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 10:36 AM, Ted Dunning <ted.dunn...@gmail.com>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Since UDFs use source code transformation, using Jython would be
> >>>>>>>>>>>>>>>>>>>>>> difficult.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 9:42 AM, Arina Yelchiyeva <
> >>>>>>>>>>>>>>>>>>>>>> arina.yelchiy...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Hi Charles,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Not that I am aware of. The proposed solution doesn't invent
> >>>>>>>>>>>>>>>>>>>>>>> anything new; it just adds the ability to add UDFs without a
> >>>>>>>>>>>>>>>>>>>>>>> drillbit restart. But contributions are welcome.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 4:52 PM Charles Givre <cgi...@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Arina,
> >>>>>>>>>>>>>>>>>>>>>>>> Has there been any discussion about making it possible, via
> >>>>>>>>>>>>>>>>>>>>>>>> Jython or something similar, for users to write simple UDFs in
> >>>>>>>>>>>>>>>>>>>>>>>> Python? My ideal would be to have this capability integrated in
> >>>>>>>>>>>>>>>>>>>>>>>> the web GUI such that a user could write their UDF (in Python)
> >>>>>>>>>>>>>>>>>>>>>>>> right there, submit it, and it would be deployed to Drill if it
> >>>>>>>>>>>>>>>>>>>>>>>> passes validation tests.
> >>>>>>>>>>>>>>>>>>>>>>>> -C
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On Jun 16, 2016, at 09:34, Arina Yelchiyeva <
> >>>>>>>>>>>>>>>>>>>>>>>>> arina.yelchiy...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Hi all!
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> I have created a Jira to allow dynamic UDF support in Drill
> >>>>>>>>>>>>>>>>>>>>>>>>> (https://issues.apache.org/jira/browse/DRILL-4726). There is a
> >>>>>>>>>>>>>>>>>>>>>>>>> link to the design document in the Jira description.
> >>>>>>>>>>>>>>>>>>>>>>>>> Comments or suggestions are welcome.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Kind regards
> >>>>>>>>>>>>>>>>>>>>>>>>> Arina