Re: Dynamic UDFs support

Neeraja Rentachintala Thu, 21 Jul 2016 22:38:42 -0700

It seems like we are reaching a conclusion here in terms of starting with a
simpler implementation, i.e., being able to deploy UDFs dynamically, without
Drillbit restarts, based on jars in a DFS location. Dropping functions
dynamically is out of scope for version 1 of this feature (we assume
development of UDFs happens on a user laptop or a dev cluster where it's
OK to have restarts).


-Neeraja

On Thu, Jul 21, 2016 at 11:56 AM, Keys Botzum <kbot...@maprtech.com> wrote:

> Recognize the difficulty. Not suggesting this be addressed in the first
> version. Just suggesting some thought about how a real user will
> work around it. Maybe some docs and/or small changes can make this easier.
>
> Keys
> _______________________________
> Keys Botzum
> Senior Principal Technologist
> kbot...@maprtech.com
> 443-718-0098
> MapR Technologies
> http://www.mapr.com
> On Jul 21, 2016 1:45 PM, "Paul Rogers" <prog...@maprtech.com> wrote:
>
> > Hi All,
> >
> > Adding a dynamic DROP would, of course, be a great addition! The reason
> > for suggesting we skip that was to control project scope.
> >
> > Dynamic DROP requires a synchronization step. Here’s the scenario:
> >
> > * Foreman A starts a query using UDF U.
> > * Foreman B receives a request to drop UDF U, followed by a request to add
> > a new version of U, U’.
> >
> > How do we drop a function that may be in use? There are some tricky bits
> > to work out, which seemed too overwhelming to consider all in one go.
> >
> > Clearly just dropping U and adding a new version of U with the same name
> > leads to issues if not synchronized. If a Drillbit D is running a query
> > with U when it receives notice to drop U, should D complete the query or
> > fail it? If the query completes, then how does D deal with the request to
> > register U’, which has the same name?
> >
> > Do we globally synchronize function deletion? (The foreman B that receives
> > the drop request waits for all queries using U to finish.) But how do we
> > know which queries use U?
> >
> > An eventually consistent approach is to track the age of the oldest
> > running query. Suppose B drops U at time T. Any query received after T that
> > uses U will fail in planning. A new U’ can’t be registered until all
> > queries that started before T complete.
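A minimal, single-process sketch of the watermark rule above. It assumes each Foreman can see the start time of every running query (in a real cluster that view would have to be shared, for example through ZK; here it is an in-memory dict). `UdfRegistry` and its methods are hypothetical names, not Drill APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class UdfRegistry:
    """Hypothetical cluster-wide UDF name space with a drop watermark."""
    functions: Set[str] = field(default_factory=set)
    # query id -> start time, for every query still running in the cluster
    running: Dict[str, int] = field(default_factory=dict)
    # function name -> time T at which it was dropped
    dropped_at: Dict[str, int] = field(default_factory=dict)

    def plan(self, query_id: str, now: int, uses: str) -> None:
        # A query referencing an unknown (e.g. dropped) function fails in planning.
        if uses not in self.functions:
            raise LookupError(f"function {uses} not found")
        self.running[query_id] = now

    def finish(self, query_id: str) -> None:
        self.running.pop(query_id, None)

    def drop(self, name: str, now: int) -> None:
        self.functions.discard(name)
        self.dropped_at[name] = now

    def can_register(self, name: str) -> bool:
        if name in self.functions:
            return False
        t = self.dropped_at.get(name)
        if t is None:
            return True
        # Equivalent to "the oldest running query started at or after T".
        return all(start >= t for start in self.running.values())

    def register(self, name: str) -> None:
        if not self.can_register(name):
            raise RuntimeError(f"{name}: queries older than the drop are still running")
        self.functions.add(name)
        self.dropped_at.pop(name, None)


reg = UdfRegistry(functions={"u"})
reg.plan("q1", now=5, uses="u")   # q1 starts before the drop
reg.drop("u", now=10)             # Foreman B drops U at T = 10
print(reg.can_register("u"))      # False: q1 is still running
reg.finish("q1")
print(reg.can_register("u"))      # True
```

The drop time acts as a watermark: registering U’ is blocked only by queries that started before it, never by newer ones.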
> >
> > The primary challenge we face in both the CREATE and DROP cases is that
> > Drill is distributed with little central coordination. That’s great for
> > scale, but makes it hard to design features that require coordination. Some
> > other tools solve this problem with a data dictionary (or “metastore”).
> > Alas, Drill does not have such a concept. So a seemingly simple feature
> > like dynamic UDFs becomes a major design challenge to get right.
> >
> > Thanks,
> >
> > - Paul
> >
> > > On Jul 21, 2016, at 7:21 AM, Neeraja Rentachintala <nrentachint...@maprtech.com> wrote:
> > >
> > > The whole point of this feature is to avoid Drill cluster restarts, as
> > > the name 'Dynamic' UDFs indicates.
> > > So any design that requires restarts would, I think, defeat the purpose.
> > >
> > > I also think this is an example of a feature where we start with a simple
> > > design that serves the purpose, take feedback on how it is being
> > > deployed/used in real user situations, and improve it in subsequent
> > > releases.
> > >
> > > -thanks
> > > Neeraja
> > >
> > > On Thu, Jul 21, 2016 at 6:32 AM, Keys Botzum <kbot...@maprtech.com> wrote:
> > >
> > >> I think there are a lot of great ideas here. My one concern is the lack
> > >> of unload, and thus presumably replace, functionality. I'm just thinking
> > >> about typical actual usage.
> > >>
> > >> In a typical development cycle someone writes something, tries it, learns,
> > >> changes it, and tries again. Assuming I understand the design, that change
> > >> step requires a full Drill cluster restart. That is going to be very
> > >> disruptive and will make UDF work nearly impossible without a dedicated
> > >> "private" cluster for Drill. I realize that people should have access to
> > >> the data they need and Drill in a development cluster, but even then
> > >> restarts can be hard since development clusters are often shared - and
> > >> that's assuming such a cluster exists. I realize of course Drill can be
> > >> run as a standalone Drillbit, but I'm not convinced that desktops will
> > >> have adequate access to the needed data.
> > >>
> > >> Having dealt with Java classloading over the years, I'm not claiming class
> > >> replacement is an easy thing, so I'll defer to others on the priority of
> > >> that, but I'm wondering if there isn't some way to make UDF
> > >> experimentation a bit easier/more practical.
> > >>
> > >> Given the above, let me toss out some possibly naive ideas that may be
> > >> workable:
> > >> * Can I easily run a standalone Drillbit on a Hadoop cluster node that is
> > >> already running Drill servers? I'm sure this can be done, but is it easy?
> > >> Could we perhaps make this clearer as an explicit kind of thing?
> > >> * Is there a way that when I deploy a UDF I can constrain the number of
> > >> bits it is loaded into, and perhaps even specify the bits?
> > >>  * The obvious corollary is that I'd want my query to run on those bits,
> > >> and a not-too-disruptive way to restart just those bits.
> > >>
> > >> The above may be obvious to Drill experts. If it is, then perhaps the UDF
> > >> docs could just point out how to easily develop UDFs in an iterative
> > >> fashion.
> > >>
> > >> Keys
> > >>> On Jul 21, 2016, at 3:13 AM, Paul Rogers <prog...@maprtech.com> wrote:
> > >>>
> > >>> Always good to have options… Another is to try an eventual consistency
> > >>> model.
> > >>>
> > >>> The invariant here is the one that was mentioned earlier. Whenever a query
> > >>> is submitted with UDF U, that query either fails in planning (because U is
> > >>> unknown) or succeeds on all nodes (at least with respect to U).
> > >>>
> > >>> For this to work, we need a consistent view of the world. We can try to
> > >>> enforce consistency at function registration time (the original design),
> > >>> or via the Foreman (Parth’s design). We can probably also use an eventual
> > >>> consistency model.
> > >>>
> > >>> Suppose we have a global name space of functions. With the global name
> > >>> space, we can establish this invariant: if a function is in that name
> > >>> space, then the Foreman accepts the query. If a Drillbit receives a
> > >>> fragment but does not yet know of U, then the Drillbit A) knows that some
> > >>> foreman must have registered U (or the query would have failed in
> > >>> planning) and B) can download the function if it is not already in place.
> > >>>
> > >>> Folks pointed out that always checking a global name space is expensive,
> > >>> which it is. As it turns out, we can first check the local function
> > >>> registry. If the Drillbit already knows about the function, we’re done
> > >>> checking; no global check is needed. It is only on the first use of a new
> > >>> function, when it is not yet loaded locally, that the global check must be
> > >>> done.
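Paul's local-first lookup can be sketched roughly as follows, assuming the global name space maps function names to jar paths (a plain dict standing in for ZK) and a `download` callback standing in for the DFS copy. The class name and the jar path in the example are hypothetical.

```python
from typing import Callable, Dict, Optional


class DrillbitFunctionLookup:
    """Hypothetical per-Drillbit resolver: local registry first, global on a miss."""

    def __init__(self, global_registry: Dict[str, str],
                 download: Callable[[str], None]):
        self.global_registry = global_registry  # stand-in for the ZK name space
        self.download = download                # stand-in for copying a jar from DFS
        self.local: Dict[str, str] = {}         # function name -> jar loaded locally
        self.global_checks = 0                  # how often the expensive path ran

    def resolve(self, name: str) -> Optional[str]:
        # Fast path: the Drillbit already knows the function; no global check.
        if name in self.local:
            return self.local[name]
        # Slow path: first use of a function that is not yet loaded locally.
        self.global_checks += 1
        jar = self.global_registry.get(name)
        if jar is None:
            return None  # on a Foreman, the query would fail in planning
        self.download(jar)
        self.local[name] = jar
        return jar
```

Only the first `resolve` of a given function touches the global name space; every later call is a local dictionary hit.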
> > >>>
> > >>> For this to work, the foreman that registers UDF U must:
> > >>>
> > >>> 1. From Arina’s proposed staging area, check the jar contents to see if a
> > >>> name conflict exists with the global registry. (Requires some class loader
> > >>> code.)
> > >>> 2. If a conflict exists, refuse to register the function and return an
> > >>> error.
> > >>> 3. If no conflict exists, register the function in the global name space
> > >>> and move the jar to the registered area in DFS.
> > >>>
> > >>> In this model, it is entirely optional whether the foreman that registers
> > >>> U alerts other Drillbits. Instead, Drillbits could poll from time to time,
> > >>> or just wait until they see a query with U and do the download at that
> > >>> time.
> > >>>
> > >>> When a new Drillbit starts, it can load all functions in the registry
> > >>> area, because these have all passed the name collision test and can all
> > >>> be used in queries. Any new registrations will be found and loaded as
> > >>> above. (It is not required to preload functions, but it might help
> > >>> performance.)
> > >>>
> > >>> ZK is the only place we have at present for the global name space, so
> > >>> that seems the logical tool. ZK allows atomic operations, which we need
> > >>> here. Operations 1, 2, and 3 above should be atomic.
> > >>>
> > >>> Unfortunately, we can’t do the DFS move atomically with a ZK name space
> > >>> insertion. So, the global name check & insert should be atomic. If that
> > >>> succeeds, copy the jar into the registered folder. There are a few details
> > >>> to work out to handle special cases, but we can cover those another time.
> > >>> (Hint: what happens if the Foreman crashes after inserting the ZK entry
> > >>> but before moving the jar?)
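A rough sketch of the registration steps with the split Paul describes: the name check and insert form one atomic step, and the jar move happens afterwards. A `threading.Lock` stands in for ZK here; in ZK this might map to a single multi-op of node creations, which fails as a whole if any node already exists. All names are hypothetical, and the crash window from the hint is deliberately left open.

```python
import threading
from typing import Callable, Dict, Iterable, List


class GlobalNameSpace:
    """In-memory stand-in for a ZK-backed global function name space."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._owner: Dict[str, str] = {}  # function name -> jar that registered it

    def check_and_insert(self, jar: str, functions: Iterable[str]) -> List[str]:
        """Insert all of the jar's functions or none; return conflicting names."""
        names = list(functions)
        with self._lock:  # the atomic region ZK would provide
            conflicts = [n for n in names if n in self._owner]
            if conflicts:
                return conflicts
            for n in names:
                self._owner[n] = jar
            return []


def register_jar(ns: GlobalNameSpace, jar: str, functions: Iterable[str],
                 move_jar: Callable[[str], None]) -> None:
    # Steps 1-3: conflict check and name-space insert as one atomic operation.
    conflicts = ns.check_and_insert(jar, functions)
    if conflicts:
        raise ValueError(f"{jar}: name conflict for {sorted(conflicts)}")
    # Not atomic with the insert: a crash here leaves a name-space entry whose
    # jar never reached the registered area (the special case in the hint).
    move_jar(jar)
```

Because the insert is all-or-nothing, a jar rejected for one conflicting name leaves none of its other names behind in the name space.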
> > >>>
> > >>> None of the proposed designs permit graceful unloading of functions. So,
> > >>> deleting functions will require a cluster restart to establish a new
> > >>> stable checkpoint.
> > >>>
> > >>> We can recommend that on each cluster restart, any functions in the DFS
> > >>> registry be copied to each Drillbit (much easier with the coming YARN
> > >>> integration) as a way of keeping the DFS registry a reasonable size.
> > >>>
> > >>> More details to work out, but that’s the gist of the concept.
> > >>>
> > >>> Thanks,
> > >>>
> > >>> - Paul
> > >>>
> > >>>> On Jul 20, 2016, at 2:37 PM, Parth Chandra <pchan...@maprtech.com> wrote:
> > >>>>
> > >>>> My notes from the hangout with Arina and Paul -
> > >>>>
> > >>>> Notes -
> > >>>>
> > >>>> There are two invariants for the registration process -
> > >>>> 1) There is a registration/validated directory in the DFS that contains
> > >>>> UDFs that have been validated by the registering foreman. All drillbits
> > >>>> will have access to this directory, and on startup and/or UDF
> > >>>> registration, the jars in this directory are synced with a local UDF
> > >>>> directory.
> > >>>> 2) During the process of registration, the registering foreman creates a
> > >>>> Zookeeper node that indicates that one or more drillbits have not yet
> > >>>> registered the UDF.
> > >>>>
> > >>>> The basic workflow is that UDF jars are copied from the staging directory
> > >>>> to the registration directory and validated. Once they are validated, the
> > >>>> available drillbits are told to register the UDF. Registering the UDF
> > >>>> consists of copying the jar to a local UDF directory and updating the
> > >>>> local (in-memory) UDF registry. A sentinel node in Zookeeper is used to
> > >>>> track when all the drillbits have registered the UDF.
> > >>>>
> > >>>> There were two main suggestions: immediate registration and lazy
> > >>>> registration.
> > >>>>
> > >>>> Immediate registration -
> > >>>> Foreman tells all drillbits to register and creates a Zookeeper node to
> > >>>> track progress.
> > >>>> Every drillbit makes a local copy and updates the Zookeeper node to show
> > >>>> it is done.
> > >>>> Foreman checks the Zookeeper node and, when all available drillbits have
> > >>>> acknowledged, sends a message to all drillbits to complete registration.
> > >>>> Foreman removes the ZK node.
> > >>>> All Drillbits update their local UDF registry.
> > >>>> Drillbit startup will block if there is a ZK node indicating
> > >>>> registration is in progress.
> > >>>> This approach needs to be validated to see if any race conditions exist.
> > >>>>
> > >>>> Lazy registration -
> > >>>> Once a UDF is copied to the registration folder, the UDF is essentially
> > >>>> registered. On first use, a drillbit may hit a ClassNotFoundException, in
> > >>>> which case it will look for the UDF in the registration directory. If
> > >>>> found, it will copy it to the local directory and add the UDF to its
> > >>>> local registry.
> > >>>> This approach should be investigated to see if it fits in with the
> > >>>> current UDF execution code.
> > >>>>
> > >>>>
> > >>>> On Mon, Jul 18, 2016 at 3:36 PM, Parth Chandra <pchan...@maprtech.com> wrote:
> > >>>>
> > >>>>> +1 on simplifying the design and postponing the items Paul has
> > >>>>> suggested.
> > >>>>>
> > >>>>> Arina, Paul, I think we need to work out some of the design related to
> > >>>>> registering the UDF. Are you guys open to a quick hangout at 10 a.m. PDT
> > >>>>> tomorrow?
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Thu, Jul 14, 2016 at 1:46 PM, Paul Rogers <prog...@maprtech.com> wrote:
> > >>>>>
> > >>>>>> Hi All,
> > >>>>>>
> > >>>>>> We’ve had quite a lively debate in the “comments” section of Arina’s
> > >>>>>> wonderful design doc. Zelaine made a great suggestion: summarize the
> > >>>>>> user experience as a way of making sense of the wealth of detailed
> > >>>>>> comments.
> > >>>>>>
> > >>>>>> IMHO, the most important user experience goals are:
> > >>>>>>
> > >>>>>> 1. When a user submits a CREATE FUNCTION command, the command returns
> > >>>>>> quickly (within a few seconds at most).
> > >>>>>> 2. If the above user then issues a query using that function (to the
> > >>>>>> same Foreman), that query is guaranteed to successfully use the new
> > >>>>>> function on all nodes.
> > >>>>>> 3. Other users, connecting to any Foreman, will see a very clean
> > >>>>>> behavior when submitting a query with the new function. Before some
> > >>>>>> point in time (can be different for each Foreman), a query with the
> > >>>>>> function fails in planning. After that point, queries are guaranteed to
> > >>>>>> successfully use the new function on all nodes.
> > >>>>>>
> > >>>>>> Basically, this says that CREATE FUNCTION can’t (potentially) take a
> > >>>>>> long time. Use of functions can’t result in random failures during the
> > >>>>>> time that the function is propagated across Drillbits.
> > >>>>>>
> > >>>>>> The goals we can perhaps postpone are:
> > >>>>>>
> > >>>>>> 1. Class name space isolation. (Allows two data scientists to define the
> > >>>>>> same class without collisions.)
> > >>>>>> 2. Function name spaces. (Allows me to define “paul.foo” and you to
> > >>>>>> define “bob.foo” without collisions. Needed if many people develop
> > >>>>>> functions independently; else, we need a global name space.)
> > >>>>>> 3. Dynamic DROP FUNCTION operation. (The issues here are messy, and it
> > >>>>>> requires unloading classes and name space cleanup.) (Just let the
> > >>>>>> cleanup happen offline.)
> > >>>>>> 4. Dependency jars (e.g. third party libraries, etc.) (We require those
> > >>>>>> to be statically added to the class path before Drill starts.)
> > >>>>>>
> > >>>>>> We are not creating per-user name spaces, or allowing people to use
> > >>>>>> production clusters to try/revise functions. We’re just simplifying
> > >>>>>> deployment of simple functions.
> > >>>>>>
> > >>>>>> That’s my suggestion, what do others suggest?
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>>
> > >>>>>> - Paul
> > >>>>>>
> > >>>>>>> On Jul 7, 2016, at 12:32 PM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> I also agree on using Zookeeper. I have reworked the dynamic UDF
> > >>>>>>> support document, taking Zookeeper usage into account.
> > >>>>>>>
> > >>>>>>> Link to the document -
> > >>>>>>> https://docs.google.com/document/d/1MluM17EKajvNP_x8U4aymcOihhUm8BMm8t_hM0jEFWk/edit
> > >>>>>>>
> > >>>>>>> Kind regards
> > >>>>>>> Arina
> > >>>>>>>
> > >>>>>>> On Tue, Jun 28, 2016 at 12:55 AM Paul Rogers <prog...@maprtech.com> wrote:
> > >>>>>>>
> > >>>>>>>> Great idea! We already use ZK to track storage plugins. ZK is perhaps
> > >>>>>>>> better suited to register each jar and/or function than using files
> > >>>>>>>> in DFS. We still need to work out the proper sequencing. But you are
> > >>>>>>>> right, this is the kind of thing that ZK is supposed to solve.
> > >>>>>>>>
> > >>>>>>>> - Paul
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>> On Jun 27, 2016, at 2:01 PM, Parth Chandra <par...@apache.org> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Reading through some of Paul's comments on maintaining a consistent
> > >>>>>>>>> state for the registration of the UDF, it looks like we need a
> > >>>>>>>>> consensus protocol for determining that all the Drillbits have the
> > >>>>>>>>> UDF deployed.
> > >>>>>>>>> I believe Zookeeper can provide a stronger guarantee than a 2-phase
> > >>>>>>>>> approach. Should we look into that?
> > >>>>>>>>>
> > >>>>>>>>> On Fri, Jun 24, 2016 at 10:00 AM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hi all!
> > >>>>>>>>>>
> > >>>>>>>>>> I have updated the design document.
> > >>>>>>>>>> Main changes:
> > >>>>>>>>>> 1. Added the staging and registration DFS locations to Drill’s
> > >>>>>>>>>> config.
> > >>>>>>>>>> 2. The user is no longer responsible for copying jars onto drillbit
> > >>>>>>>>>> nodes. Now the user needs to copy jars into the staging DFS
> > >>>>>>>>>> location, from where drillbits will copy them to the local fs.
> > >>>>>>>>>> 3. During UDF registration, jars will be moved to the DFS
> > >>>>>>>>>> registration area.
> > >>>>>>>>>> 4. During startup, a drillbit will copy all jars from the
> > >>>>>>>>>> registration area, so a newly added drillbit will have the same
> > >>>>>>>>>> UDFs as the others.
> > >>>>>>>>>> 5. Security issues - these will probably be added later as an
> > >>>>>>>>>> enhancement.
> > >>>>>>>>>>
> > >>>>>>>>>> More details in the document:
> > >>>>>>>>>> https://docs.google.com/document/d/1MluM17EKajvNP_x8U4aymcOihhUm8BMm8t_hM0jEFWk/edit
> > >>>>>>>>>>
> > >>>>>>>>>> Kind regards
> > >>>>>>>>>> Arina
> > >>>>>>>>>>
> > >>>>>>>>>> On Fri, Jun 17, 2016 at 1:25 AM Paul Rogers <prog...@maprtech.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Hi All,
> > >>>>>>>>>>>
> > >>>>>>>>>>> To answer Arina on item 3: there is actually no good location on
> > >>>>>>>>>>> any local node to put the UDFs. Reason: DoY allows the admin to
> > >>>>>>>>>>> start a Drillbit on any available node. When it starts, a new,
> > >>>>>>>>>>> fresh copy of Drill will be downloaded, and this can happen after
> > >>>>>>>>>>> the user has issued the CREATE command.
> > >>>>>>>>>>>
> > >>>>>>>>>>> What we need is a shared, secure, distributed storage location from
> > >>>>>>>>>>> which Drillbits can download the needed jar files. Something like…
> > >>>>>>>>>>> DFS! Indeed, this is how YARN stores the Drill archive from which
> > >>>>>>>>>>> it creates the Drill install directory on each node. We can’t quite
> > >>>>>>>>>>> use YARN’s mechanism (YARN is aware only of the files uploaded when
> > >>>>>>>>>>> launching an app), but we can do something similar.
> > >>>>>>>>>>>
> > >>>>>>>>>>> So, brainstorming a bit…
> > >>>>>>>>>>>
> > >>>>>>>>>>> 1. Store the UDF jar in a pre-defined DFS location.
> > >>>>>>>>>>>
> > >>>>>>>>>>> 2. The CREATE function 1) uploads the jar to the DFS location, and
> > >>>>>>>>>>> 2) creates some kind of registry entry.
> > >>>>>>>>>>>
> > >>>>>>>>>>> 3. The DELETE function 1) deregisters the jar (and function), but
> > >>>>>>>>>>> 2) does not delete the jar (this allows in-flight queries to
> > >>>>>>>>>>> complete).
> > >>>>>>>>>>>
> > >>>>>>>>>>> 4. Drillbits periodically check DFS for changed registrations,
> > >>>>>>>>>>> downloading any needed jars. (YARN, Spark, Storm and others already
> > >>>>>>>>>>> do something similar.)
> > >>>>>>>>>>>
> > >>>>>>>>>>> 5. A registry check is “forced” when processing a query with a
> > >>>>>>>>>>> function that is not currently registered. (Doing so resolves any
> > >>>>>>>>>>> possible race conditions.)
> > >>>>>>>>>>>
> > >>>>>>>>>>> 6. Some process (perhaps time based) removes old, unregistered jar
> > >>>>>>>>>>> files. (Or, we could get fancy and use reference counts. The
> > >>>>>>>>>>> reference count would be required if the user wants to delete, then
> > >>>>>>>>>>> recreate, the same function and jar, to avoid conflicts with
> > >>>>>>>>>>> in-flight queries.)
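Items 3 and 6 together can be sketched as reference-counted bookkeeping, assuming each query increments a count for the jar it uses and decrements it when done. `JarStore` is hypothetical, and this version simply refuses to recreate a jar while in-flight queries still hold the old copy; a real design might version jars instead.

```python
from collections import Counter


class JarStore:
    """Hypothetical bookkeeping for DELETE plus reference-counted cleanup."""

    def __init__(self):
        self.registered = set()  # jars visible to newly planned queries
        self.refs = Counter()    # jar -> number of in-flight queries using it
        self.on_disk = set()     # jars still present in the DFS location

    def create(self, jar):
        # Refuse to recreate a jar while in-flight queries use the old copy.
        if self.refs[jar] > 0:
            raise RuntimeError(f"{jar} is still in use by running queries")
        self.registered.add(jar)
        self.on_disk.add(jar)

    def start_query(self, jar):
        if jar not in self.registered:
            raise LookupError(f"{jar} is not registered")
        self.refs[jar] += 1

    def end_query(self, jar):
        self.refs[jar] -= 1
        self._collect(jar)

    def delete(self, jar):
        # Deregister only; the file stays so in-flight queries can finish.
        self.registered.discard(jar)
        self._collect(jar)

    def _collect(self, jar):
        # Remove the file once it is unregistered and no query references it.
        if jar not in self.registered and self.refs[jar] <= 0:
            self.on_disk.discard(jar)
            self.refs.pop(jar, None)
```

With a time-based sweeper instead, `_collect` would be replaced by a periodic pass over unregistered jars older than some threshold.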
> > >>>>>>>>>>>
> > >>>>>>>>>>> We can build security on this as follows:
> > >>>>>>>>>>>
> > >>>>>>>>>>> 1. Define permissions for who can write to the DFS location. Or,
> > >>>>>>>>>>> indeed, have subdirectories by user and grant each user permission
> > >>>>>>>>>>> only on their own UDF directory.
> > >>>>>>>>>>>
> > >>>>>>>>>>> 2. Provide separate registries for per-user functions (private) and
> > >>>>>>>>>>> global functions (public). Only the admin can add global functions.
> > >>>>>>>>>>> But only the user that uploads a private function can use it.
> > >>>>>>>>>>>
> > >>>>>>>>>>> 3. Leverage the Java class loader to isolate UDFs in their own name
> > >>>>>>>>>>> space (see Eclipse & Tomcat for examples). That is, Drill can call
> > >>>>>>>>>>> into a UDF, UDFs can call selected Drill code, but UDFs can’t shadow
> > >>>>>>>>>>> Drill classes (accidentally or maliciously). Plus, my function Foo
> > >>>>>>>>>>> won’t clash with your function Foo if both are private.
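A sketch of security item 2, assuming private functions are keyed by (owner, name) and global functions by name alone. The rule that a private function shadows a global one with the same name is an assumption; the message does not specify a lookup order. Class-loader isolation (item 3) is Java-specific and is not modeled here, and all names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


class ScopedFunctionRegistry:
    """Hypothetical registry with per-user (private) and global (public) scopes."""

    def __init__(self, admins: Iterable[str]):
        self.admins = set(admins)
        self.public: Dict[str, str] = {}               # name -> jar
        self.private: Dict[Tuple[str, str], str] = {}  # (owner, name) -> jar

    def add_public(self, user: str, name: str, jar: str) -> None:
        if user not in self.admins:
            raise PermissionError("only an admin can add global functions")
        self.public[name] = jar

    def add_private(self, user: str, name: str, jar: str) -> None:
        # Keyed by owner, so two users' private "foo" functions never clash.
        self.private[(user, name)] = jar

    def resolve(self, user: str, name: str) -> Optional[str]:
        # Assumption: the caller's private function shadows a global one.
        return self.private.get((user, name), self.public.get(name))
```

Only the owner can ever resolve a private entry, which gives the "only the user that uploads a private function can use it" rule for free.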
> > >>>>>>>>>>>
> > >>>>>>>>>>> Sorry that this has wandered a bit far from the original simple
> > >>>>>>>>>>> design, but the above may capture much of what folks expect in
> > >>>>>>>>>>> modern distributed big data systems.
> > >>>>>>>>>>>
> > >>>>>>>>>>> I wonder if a good next step might be to review the notes in the
> > >>>>>>>>>>> design doc, in the JIRA, and in this e-mail chain, and to prepare a
> > >>>>>>>>>>> summary of technical requirements and a proposed design. Postpone,
> > >>>>>>>>>>> at least for now, concerns about the amount of work; we can worry
> > >>>>>>>>>>> about that once folks agree on your revised design.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks,
> > >>>>>>>>>>>
> > >>>>>>>>>>> - Paul
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>> On Jun 21, 2016, at 9:48 AM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> 4. Authorization model mentioned by Julia and John
> > >>>>>>>>>>>> If a user doesn't have rights to copy jars to the UDF classpath
> > >>>>>>>>>>>> (which can be restricted by the file system), he won't be able to
> > >>>>>>>>>>>> do much harm by running the CREATE command. If UDFs from the jar
> > >>>>>>>>>>>> were already registered, the CREATE statement will fail. CREATE OR
> > >>>>>>>>>>>> REPLACE will just re-register the UDFs.
> > >>>>>>>>>>>> But the DELETE command is not safe. If a user knows the jar name,
> > >>>>>>>>>>>> he can delete all the UDFs associated with it, as well as the
> > >>>>>>>>>>>> binary and source jars. That's where we'll probably need to impose
> > >>>>>>>>>>>> restrictions.
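The CREATE / CREATE OR REPLACE / DELETE behavior Arina describes can be modeled roughly like this, assuming registrations are tracked per jar. `JarFunctionRegistry` is a hypothetical name, and the authorization check she mentions is intentionally left out.

```python
from typing import Dict, List


class JarFunctionRegistry:
    """Hypothetical model of CREATE / CREATE OR REPLACE / DELETE semantics."""

    def __init__(self) -> None:
        self.by_jar: Dict[str, List[str]] = {}  # jar name -> UDF names it provides

    def create(self, jar: str, functions: List[str], replace: bool = False) -> None:
        if jar in self.by_jar and not replace:
            # Plain CREATE fails if UDFs from this jar are already registered.
            raise ValueError(f"UDFs from {jar} are already registered")
        # CREATE OR REPLACE (replace=True) just re-registers the jar's UDFs.
        self.by_jar[jar] = list(functions)

    def delete(self, jar: str) -> List[str]:
        # DELETE needs only the jar name and removes every UDF it provides,
        # which is why this is the operation that needs restrictions.
        return self.by_jar.pop(jar)
```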
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Tue, Jun 21, 2016 at 7:34 PM Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> 1. DELETE command - I forgot to indicate it in the document but had
> > >>>>>>>>>>>>> it in mind. When a user issues the DELETE command, all UDFs
> > >>>>>>>>>>>>> associated with the indicated jar are removed from
> > >>>>>>>>>>>>> DrillFunctionRegistry. Then the binary and source files are also
> > >>>>>>>>>>>>> deleted from the UDF classpath.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> 2. Distribution race condition described by Paul
> > >>>>>>>>>>>>> The user issues the CREATE command and gets confirmation that the
> > >>>>>>>>>>>>> UDFs are registered only if all drillbits have confirmed that
> > >>>>>>>>>>>>> registration was successful.
> > >>>>>>>>>>>>> I don't expect a user to start using UDFs in queries before getting
> > >>>>>>>>>>>>> the CREATE command's success / failure result, which is possible
> > >>>>>>>>>>>>> but strange.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> 3. DoY
> > >>>>>>>>>>>>> @Paul
> > >>>>>>>>>>>>> What if, instead of using $DRILL_HOME/jars/3rdparty/udf directly,
> > >>>>>>>>>>>>> we use a $DRILL_UDF environment variable that is set during
> > >>>>>>>>>>>>> drillbit start (like $DRILL_LOG_DIR)? The location stored in this
> > >>>>>>>>>>>>> variable will be added to the Drill classpath during start.
> > >>>>>>>>>>>>> Will it ease DoY integration somehow?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Kind regards
> > >>>>>>>>>>>>> Arina
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Tue, Jun 21, 2016 at 7:15 PM yuliya Feldman <yufeld...@yahoo.com.invalid> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Just thoughts:
> > >>>>>>>>>>>>>> You can try to reuse the distributed cache. Let the Drill AM do
> > >>>>>>>>>>>>>> the needful in terms of orchestrating UDF jar distribution.
> > >>>>>>>>>>>>>> But I would be inclined to have a common path that is independent
> > >>>>>>>>>>>>>> of whether it is Drill on YARN or not, as maintaining two separate
> > >>>>>>>>>>>>>> ways of dealing with loading/unloading UDFs will be painful and
> > >>>>>>>>>>>>>> error prone.
> > >>>>>>>>>>>>>> One more note (I left a comment in the doc) - not sure about the
> > >>>>>>>>>>>>>> authorization model here - we need to have one.
> > >>>>>>>>>>>>>> Just my 2c
> > >>>>>>>>>>>>>> Thanks
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> From: Paul Rogers <prog...@maprtech.com>
> > >>>>>>>>>>>>>> To: "dev@drill.apache.org" <dev@drill.apache.org>
> > >>>>>>>>>>>>>> Sent: Monday, June 20, 2016 7:32 PM
> > >>>>>>>>>>>>>> Subject: Re: Dynamic UDFs support
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Hi Neeraja,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> The proposal calls for the user to copy the jar file to each
> > >>>>>>>>>>>>>> Drillbit node. The jar would go into a new
> > >>>>>>>>>>>>>> $DRILL_HOME/jars/3rdparty/udf directory.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> In Drill-on-YARN (DoY), YARN is responsible for copying Drill
> > >>>>>>>>>>>>>> code to each node (which is good). YARN puts that code in a
> > >>>>>>>>>>>>>> location known only to YARN. Since the location is private to
> > >>>>>>>>>>>>>> YARN, the user can’t easily hunt down the location in order to
> > >>>>>>>>>>>>>> add the udf jar. Even if the user did find the location, the next
> > >>>>>>>>>>>>>> Drillbit to start would create a new copy of the Drill software,
> > >>>>>>>>>>>>>> without the udf jar.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Second, in DoY we have separated user files from Drill software.
> > >>>>>>>>>>>>>> This makes it much easier to distribute the software to each
> > >>>>>>>>>>>>>> node: we give the Drill distribution tar archive to YARN, and
> > >>>>>>>>>>>>>> YARN copies it to each node and untars the Drill files. We make a
> > >>>>>>>>>>>>>> separate copy of the (far smaller) set of user config files.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> If the udf jar goes into a Drill folder
> > >>>>>>>>>>>>>> ($DRILL_HOME/jars/3rdparty/udf), then the user would have to
> > >>>>>>>>>>>>>> rebuild the Drill tar file each time they add a udf jar. When I
> > >>>>>>>>>>>>>> tried this myself when building DoY, I found it to be slow and
> > >>>>>>>>>>>>>> error-prone.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> So, the solution is to place the udf code in the new “site”
> > >>>>>>>>>>>>>> directory: $DRILL_SITE/jars. That’s what that is for. Then, let
> > >>>>>>>>>>>>>> DoY automatically distribute the code to every node. Perfect!
> > >>>>>>>>>>>>>> Except that it does not work to dynamically distribute code after
> > >>>>>>>>>>>>>> Drill starts.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> For DoY, the solution requirements are:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> 1. Distribute code using Drill itself, rather than manually
> > >>>>>>>>>>>>>> copying jars to (unknown) Drill directories.
> > >>>>>>>>>>>>>> 2. Ensure the solution works even if another Drillbit is spun up
> > >>>>>>>>>>>>>> later and uses the original Drill tar file.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I’m thinking we want to leverage DFS: place udf files into a
> > >>>>>>>>>>>>>> well-known DFS directory. Register the udf in, say, ZK. When a
> > >>>>>>>>>>>>>> new Drillbit starts, it looks for new udf jars in ZK, copies the
> > >>>>>>>>>>>>>> files to a temporary location, and launches. An existing Drillbit
> > >>>>>>>>>>>>>> is notified of the change and does the same download process.
> > >>>>>>>>>>>>>> Clean-up is needed at some point to remove ZK entries if the udf
> > >>>>>>>>>>>>>> jar becomes statically available on the next launch. That needs
> > >>>>>>>>>>>>>> more thought.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> We’d still need the phases mentioned earlier to ensure
> > >>>>>>>>>>>>>> consistency.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Suggestions anyone as to how to do this super simply & still get
> > >>>>>>>>>>>>>> it to work with DoY?
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> - Paul
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Jun 20, 2016, at 7:18 PM, Neeraja Rentachintala <nrentachint...@maprtech.com> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> This will need to work with YARN (once Drill is YARN-enabled, I
> > >>>>>>>>>>>>>>> would expect a lot of users to use it in conjunction with YARN).
> > >>>>>>>>>>>>>>> Paul, I am not clear why this wouldn't work with YARN. Can you
> > >>>>>>>>>>>>>>> elaborate?
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> -Neeraja
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Mon, Jun 20, 2016 at 7:01 PM, Paul Rogers <prog...@maprtech.com> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Good enough, as long as we document the limitation that this
> > >>>>>>>>>>>>>>>> feature can’t work with YARN deployment, as users generally do
> > >>>>>>>>>>>>>>>> not have access to the temporary “localization” directories where
> > >>>>>>>>>>>>>>>> the Drill code is placed by YARN.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Note that the jar distribution race condition issue occurs with
> > >>>>>>>>>>>>>>>> the proposed design: I believe I sketched out a scenario in one
> > >>>>>>>>>>>>>>>> of the earlier comments. Drillbit A receives the CREATE FUNCTION
> > >>>>>>>>>>>>>>>> command. It tells Drillbit B. While A is informing the other
> > >>>>>>>>>>>>>>>> Drillbits, Drillbit B plans and launches a query that uses the
> > >>>>>>>>>>>>>>>> function. Drillbit Z starts execution of the query before it
> > >>>>>>>>>>>>>>>> learns from A about the new function. This will be rare: just
> > >>>>>>>>>>>>>>>> rare enough to create very hard-to-reproduce bugs.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> The only reliable solution is to do the work in multiple
> > >>>>>>>>>>>>>>>>> passes:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Pass 1: Ask each node to load the function, but not make it
> > >>>>>>>>>>>>>>>>> available to the planner. (It would be available to the
> > >>>>>>>>>>>>>>>>> execution engine.)
> > >>>>>>>>>>>>>>>>> Pass 2: Await confirmation from each node that this is done.
> > >>>>>>>>>>>>>>>>> Pass 3: Alert every node that it is now free to plan queries
> > >>>>>>>>>>>>>>>>> with the function.
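The three passes above can be sketched in Java as follows. This is an illustration only: `DrillbitFn` and `ThreePass` are hypothetical names, each node's load is simulated by a flag, and the passes run sequentially, whereas a real implementation would presumably issue RPCs in parallel with timeouts.

```java
import java.util.List;

// Hypothetical per-Drillbit state for a single function being registered.
final class DrillbitFn {
  final String name;
  private final boolean loadSucceeds; // simulates success or failure of pass 1
  boolean executable; // pass 1 done: execution engine can run fragments using it
  boolean plannable;  // pass 3 done: planner may emit plans using it

  DrillbitFn(String name, boolean loadSucceeds) {
    this.name = name;
    this.loadSucceeds = loadSucceeds;
  }

  // Pass 1 on this node: load the code for execution only.
  // The return value is the confirmation collected in pass 2.
  boolean loadForExecution() {
    executable = loadSucceeds;
    return executable;
  }
}

final class ThreePass {
  private ThreePass() {}

  static boolean register(List<DrillbitFn> nodes) {
    // Passes 1 and 2: every node loads the function and confirms.
    for (DrillbitFn n : nodes) {
      if (!n.loadForExecution()) {
        return false; // no confirmation: pass 3 never runs, nobody plans with it
      }
    }
    // Pass 3: every node has confirmed, so any plan a foreman now builds
    // can be executed on every node.
    for (DrillbitFn n : nodes) {
      n.plannable = true;
    }
    return true;
  }
}
```

If pass 1 fails on any node, the sketch never reaches pass 3, so no foreman can plan a query that some node cannot execute. Nodes that did load the function would still need clean-up, which the sketch leaves out.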
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Finally, I wonder if we should design the SQL syntax based
> > >>>>>>>>>>>>>>>>> on a long-term design, even if the feature itself is a
> > >>>>>>>>>>>>>>>>> short-term work-around. Changing the syntax later might
> > >>>>>>>>>>>>>>>>> break scripts that users have written.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> So, the question for the group is this: is the value of a
> > >>>>>>>>>>>>>>>>> semi-complete feature sufficient to justify the potential
> > >>>>>>>>>>>>>>>>> problems?
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> - Paul
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On Jun 20, 2016, at 6:15 PM, Parth Chandra <pchan...@maprtech.com> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Moving discussion to dev.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> I believe the aim is to do a simple implementation without
> > >>>>>>>>>>>>>>>>>>> the complexity of distributing the UDF. I think the
> > >>>>>>>>>>>>>>>>>>> document should make this limitation clear.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Per Paul's point about a simpler solution of just having
> > >>>>>>>>>>>>>>>>>>> each drillbit detect whether a UDF is present, I think the
> > >>>>>>>>>>>>>>>>>>> problem is when a UDF gets deployed to some but not all
> > >>>>>>>>>>>>>>>>>>> drillbits. A query can then start executing but not run
> > >>>>>>>>>>>>>>>>>>> successfully. The intent of the create commands would be
> > >>>>>>>>>>>>>>>>>>> to ensure that either all drillbits have the UDF or none
> > >>>>>>>>>>>>>>>>>>> do.
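The all-or-none intent can be sketched as load-with-rollback. `JarHolder`, `FakeDrillbit` and `AllOrNone` are hypothetical names used only for illustration. Note that this guarantees only the end state: between the load on one node and the rollback, that node briefly has the function, which is why it would likely be paired with the planner-hidden loading described earlier in the thread.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical interface for a node that can hold a UDF jar.
interface JarHolder {
  boolean load(String jar);
  void unload(String jar);
}

// Hypothetical in-memory Drillbit used to exercise the protocol.
final class FakeDrillbit implements JarHolder {
  final Set<String> jars = new HashSet<>();
  private final boolean healthy;

  FakeDrillbit(boolean healthy) {
    this.healthy = healthy;
  }

  public boolean load(String jar) {
    if (!healthy) {
      return false; // simulate a node that fails to load the jar
    }
    jars.add(jar);
    return true;
  }

  public void unload(String jar) {
    jars.remove(jar);
  }
}

final class AllOrNone {
  private AllOrNone() {}

  // Loads the jar on every node; on the first failure, unloads it from the
  // nodes that already succeeded, so the end state is "all" or "none".
  static boolean deploy(List<? extends JarHolder> nodes, String jar) {
    List<JarHolder> loaded = new ArrayList<>();
    for (JarHolder node : nodes) {
      if (!node.load(jar)) {
        for (JarHolder done : loaded) {
          done.unload(jar);
        }
        return false;
      }
      loaded.add(node);
    }
    return true;
  }
}
```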
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> I think Jacques' point about ownership conflicts is not
> > >>>>>>>>>>>>>>>>>>> addressed clearly. Also, the unloading is not clear. The
> > >>>>>>>>>>>>>>>>>>> delete command should probably remove the UDF and unload it.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On Fri, Jun 17, 2016 at 11:19 AM, Paul Rogers <prog...@maprtech.com> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Reviewed the spec; many comments posted. Three primary
> > >>>>>>>>>>>>>>>>>>>> comments for the community to consider.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> 1. The design conflicts with the Drill-on-YARN project. Is
> > >>>>>>>>>>>>>>>>>>>> this a specific fix for one unique problem, or is it worth
> > >>>>>>>>>>>>>>>>>>>> expanding the solution to work with Drill-on-YARN
> > >>>>>>>>>>>>>>>>>>>> deployments? Might be hard to make the two work together
> > >>>>>>>>>>>>>>>>>>>> later. See comments in docs for details.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> 2. Have we, by chance, looked at how other projects handle
> > >>>>>>>>>>>>>>>>>>>> code distribution? Spark, Storm and others automatically
> > >>>>>>>>>>>>>>>>>>>> deploy code across the cluster; no manual distribution to
> > >>>>>>>>>>>>>>>>>>>> each node is needed. The key difference between Drill and
> > >>>>>>>>>>>>>>>>>>>> the others is that, for Storm, say, code is associated
> > >>>>>>>>>>>>>>>>>>>> with a job (“topology” in Storm terms). But in Drill,
> > >>>>>>>>>>>>>>>>>>>> functions are global and have no obvious life cycle that
> > >>>>>>>>>>>>>>>>>>>> suggests when the code can be unloaded.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> 3. Have we considered the class loader, dependency and
> > >>>>>>>>>>>>>>>>>>>> name space isolation issues addressed by such products as
> > >>>>>>>>>>>>>>>>>>>> Tomcat (web apps) or Eclipse (plugins)? Putting user code
> > >>>>>>>>>>>>>>>>>>>> in the same namespace as Drill code is quick & dirty. It
> > >>>>>>>>>>>>>>>>>>>> turns out, however, that doing so leads to problems that
> > >>>>>>>>>>>>>>>>>>>> require long, frustrating debugging sessions to resolve.
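One common way to get the isolation item 3 asks about is a separate class loader per UDF jar. Below is a minimal Java sketch using the JDK's `URLClassLoader`; the `UdfLoaders` class is hypothetical, not part of Drill. Each jar gets its own loader, and unloading closes it. The sketch keeps the JDK's default parent-first delegation, so UDF jars are separated from each other but cannot override libraries Drill itself ships; Tomcat-style web-app isolation typically relies on a child-first loader on top of this.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry giving each UDF jar its own class loader.
final class UdfLoaders {
  private final ClassLoader parent; // typically Drill's own class loader
  private final Map<Path, URLClassLoader> byJar = new HashMap<>();

  UdfLoaders(ClassLoader parent) {
    this.parent = parent;
  }

  // Creates a fresh loader for the jar; re-loading a jar closes the old loader.
  URLClassLoader load(Path jar) {
    try {
      URL url = jar.toUri().toURL();
      URLClassLoader loader = new URLClassLoader(new URL[] {url}, parent);
      URLClassLoader previous = byJar.put(jar, loader);
      if (previous != null) {
        previous.close();
      }
      return loader;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Closes the jar's loader. Its classes can be garbage-collected only once
  // nothing (for example, a running query) still references them or the loader.
  boolean unload(Path jar) {
    URLClassLoader loader = byJar.remove(jar);
    if (loader == null) {
      return false;
    }
    try {
      loader.close();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return true;
  }
}
```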
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Addressing item 1 might expand scope a bit. Addressing
> > >>>>>>>>>>>>>>>>>>>> items 2 and 3 is a big increase in scope, so I won’t be
> > >>>>>>>>>>>>>>>>>>>> surprised if we leave those issues for later. (Though
> > >>>>>>>>>>>>>>>>>>>> addressing item 2 might be the best way to address item 1.)
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> If we want a very simple solution that requires minimal
> > >>>>>>>>>>>>>>>>>>>> change, perhaps we can use an even simpler approach. In the
> > >>>>>>>>>>>>>>>>>>>> proposed design, the user still must distribute code to all
> > >>>>>>>>>>>>>>>>>>>> the nodes. The primary change is to tell Drill to load (or
> > >>>>>>>>>>>>>>>>>>>> unload) that code. Could we accomplish the same result more
> > >>>>>>>>>>>>>>>>>>>> easily by simply having Drill periodically scan certain
> > >>>>>>>>>>>>>>>>>>>> directories looking for new (or removed) jars? That still
> > >>>>>>>>>>>>>>>>>>>> won’t work with YARN or solve the name space issues, but it
> > >>>>>>>>>>>>>>>>>>>> will work for existing non-YARN Drill users without new SQL
> > >>>>>>>>>>>>>>>>>>>> syntax.
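The directory-scan alternative can be sketched with `java.nio.file`. `JarDirScanner` is a hypothetical class: each call to `scan()` lists the `*.jar` files in one directory and reports which appeared or disappeared since the previous call. Running it periodically could be done with a `ScheduledExecutorService` (for example, `scheduleWithFixedDelay`).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical scanner that diffs a directory's jar files between calls.
final class JarDirScanner {
  private final Path dir;
  private Set<String> known = new TreeSet<>();

  JarDirScanner(Path dir) {
    this.dir = dir;
  }

  // Result of one scan: jar names added and removed since the last scan.
  static final class Delta {
    final Set<String> added;
    final Set<String> removed;

    Delta(Set<String> added, Set<String> removed) {
      this.added = added;
      this.removed = removed;
    }
  }

  Delta scan() {
    Set<String> current = new TreeSet<>();
    // The glob keeps only *.jar entries; other files are ignored.
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
      for (Path p : stream) {
        current.add(p.getFileName().toString());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    Set<String> added = new TreeSet<>(current);
    added.removeAll(known);
    Set<String> removed = new TreeSet<>(known);
    removed.removeAll(current);
    known = current;
    return new Delta(added, removed);
  }
}
```

One practical caveat: a jar that is still being copied may be picked up half-written, so users would likely need to copy it under a temporary name and then rename it. And, as Parth notes, each Drillbit scanning on its own schedule still allows a UDF to be present on some Drillbits but not others.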
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> - Paul
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> On Jun 16, 2016, at 2:07 PM, Jacques Nadeau <jacq...@dremio.com> wrote:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Two quick thoughts:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> - (user) In the design document I didn't see any
> > >>>>>>>>>>>>>>>>>>>>>> discussion of ownership/conflicts or unloading. Would be
> > >>>>>>>>>>>>>>>>>>>>>> helpful to see the thinking there.
> > >>>>>>>>>>>>>>>>>>>>>> - (dev) There is a row-oriented facade via the
> > >>>>>>>>>>>>>>>>>>>>>> FieldReader/FieldWriter/ComplexWriter classes. That would
> > >>>>>>>>>>>>>>>>>>>>>> be a good place to start when trying to implement an
> > >>>>>>>>>>>>>>>>>>>>>> alternative interface.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>>>>>> Jacques Nadeau
> > >>>>>>>>>>>>>>>>>>> CTO and Co-Founder, Dremio
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 11:32 AM, John Omernik <j...@omernik.com> wrote:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Honestly, I don't see it as a priority issue. I think some
> > >>>>>>>>>>>>>>>>>>>>>>> of the ideas around community Java UDFs could be a better
> > >>>>>>>>>>>>>>>>>>>>>>> approach. I'd hate to take away from other work to hack in
> > >>>>>>>>>>>>>>>>>>>>>>> something like this.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 1:19 PM, Paul Rogers <prog...@maprtech.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Ted refers to source code transformation. Drill gains its
> > >>>>>>>>>>>>>>>>>>>>>>>> speed from value vectors. However, VVs are a far cry from
> > >>>>>>>>>>>>>>>>>>>>>>>> the row-based interface that most mere mortals are
> > >>>>>>>>>>>>>>>>>>>>>>>> accustomed to using. Since VVs are very type-specific,
> > >>>>>>>>>>>>>>>>>>>>>>>> code is typically generated to handle the specifics of
> > >>>>>>>>>>>>>>>>>>>>>>>> each type. Accessing VVs in Jython may be a bit of a
> > >>>>>>>>>>>>>>>>>>>>>>>> challenge because of the "impedance mismatch" between how
> > >>>>>>>>>>>>>>>>>>>>>>>> VVs work and the row-and-column view expected by most
> > >>>>>>>>>>>>>>>>>>>>>>>> (non-Drill) developers.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> I wonder if we've considered providing a row-oriented
> > >>>>>>>>>>>>>>>>>>>>>>>> "facade" that can be used by roll-your-own data sources
> > >>>>>>>>>>>>>>>>>>>>>>>> and user-defined row transforms. It might be a hiccup in
> > >>>>>>>>>>>>>>>>>>>>>>>> the fast VV pipeline, but it might be handy for users
> > >>>>>>>>>>>>>>>>>>>>>>>> willing to trade a bit of speed for convenience. With such
> > >>>>>>>>>>>>>>>>>>>>>>>> a facade, the Jython row transforms that John mentions
> > >>>>>>>>>>>>>>>>>>>>>>>> could be quite simple.
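A row-oriented facade over columnar storage might look roughly like the toy sketch below. It is not Drill's value vector classes or the FieldReader/FieldWriter facade Jacques mentions: `IntColumn`, `RowView` and `RowTransforms` are hypothetical, and each column is a plain `int[]`. The user function sees one row at a time through `getInt`, which is the convenience Paul describes; the indirect per-row call through the facade is the kind of "hiccup" he contrasts with type-specific generated code.

```java
import java.util.function.ToIntFunction;

// Hypothetical stand-in for a value vector: one primitive array per column.
final class IntColumn {
  final int[] values;

  IntColumn(int... values) {
    this.values = values;
  }
}

// Row-oriented facade: a cursor over a batch of columns. User code reads
// "the current row" and never indexes the column arrays itself.
final class RowView {
  private final IntColumn[] columns;
  private int row;

  RowView(IntColumn... columns) {
    this.columns = columns;
  }

  void moveTo(int row) {
    this.row = row;
  }

  int getInt(int column) {
    return columns[column].values[row];
  }
}

final class RowTransforms {
  private RowTransforms() {}

  // Applies a per-row user function to every row and collects the results
  // into a new column.
  static IntColumn apply(int rowCount, RowView view, ToIntFunction<RowView> fn) {
    int[] out = new int[rowCount];
    for (int r = 0; r < rowCount; r++) {
      view.moveTo(r);
      out[r] = fn.applyAsInt(view);
    }
    return new IntColumn(out);
  }
}
```

For example, `RowTransforms.apply(3, new RowView(a, b), r -> r.getInt(0) + r.getInt(1))` with columns `a = {1, 2, 3}` and `b = {10, 20, 30}` produces a column holding `{11, 22, 33}`.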
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 10:36 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Since UDFs use source code transformation, using Jython
> > >>>>>>>>>>>>>>>>>>>>>>>>> would be difficult.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 9:42 AM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Hi Charles,
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> Not that I am aware of. The proposed solution doesn't
> > >>>>>>>>>>>>>>>>>>>>>>>>>> invent anything new; it just adds the possibility of
> > >>>>>>>>>>>>>>>>>>>>>>>>>> adding UDFs without a drillbit restart. But contributions
> > >>>>>>>>>>>>>>>>>>>>>>>>>> are welcome.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 4:52 PM Charles Givre <cgi...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Arina,
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Has there been any discussion about making it possible,
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> via Jython or something, for users to write simple UDFs
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> in Python?
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> My ideal would be to have this capability integrated in
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> the web GUI such that a user could write their UDF (in
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Python) right there, submit it, and it would be deployed
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> to Drill if it passes validation tests.
> > >>>>>>>>>>>>>>>>>>>>>>>> —C
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 16, 2016, at 09:34, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Hi all!
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have created a Jira to allow dynamic UDF support in
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Drill (https://issues.apache.org/jira/browse/DRILL-4726).
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> There is a link to the design document in the Jira
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> description.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Comments or suggestions are welcome.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Kind regards
> > >>>>>>>>>>>>>>>>>>>>>>>>> Arina
> > >>>>>>>>>>>>>>>>>>>>>>>>