Re: Dynamic UDFs support

Paul Rogers Thu, 21 Jul 2016 10:53:09 -0700

Hi All,

By the way, the class loading/unloading issue, while new to us Drillers, is a 
well-understood problem in projects such as Tomcat, which loads & unloads web 
apps, each with its own class name space (implemented by a class loader).


To make the dynamic UDF feature fully usable, we’d need to solve the class 
name space, function name space, and dynamic class loading issues. Ideally, if 
I create a function “foo” and you create a function “foo”, they don’t conflict 
unless we make them global. We could do “paul_foo” and “zelaine_foo”, but that 
gets kind of awkward.
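
For illustration only, here is a minimal sketch of what a function name space might look like, assuming a registry keyed by scope and function name, with a private scope per user and one global scope. Every name in it (FunctionNameSpaces, register, resolve, the com.example classes) is hypothetical; nothing like this exists in Drill today.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these names exist in Drill.
class FunctionNameSpaces {
    static final String GLOBAL = "";   // assumed marker for the global scope

    // Maps "scope.name" to the implementing class name.
    private final Map<String, String> functions = new HashMap<>();

    // Returns false if the name is already taken within that scope.
    boolean register(String scope, String name, String implClass) {
        return functions.putIfAbsent(scope + "." + name, implClass) == null;
    }

    // Look in the caller's private scope first, then fall back to global.
    Optional<String> resolve(String user, String name) {
        String impl = functions.get(user + "." + name);
        if (impl == null) {
            impl = functions.get(GLOBAL + "." + name);
        }
        return Optional.ofNullable(impl);
    }

    public static void main(String[] args) {
        FunctionNameSpaces ns = new FunctionNameSpaces();
        ns.register("paul", "foo", "com.example.PaulFoo");
        ns.register("zelaine", "foo", "com.example.ZelaineFoo");
        System.out.println(ns.resolve("paul", "foo").get());
        System.out.println(ns.resolve("zelaine", "foo").get());
    }
}
```

In this sketch a private "foo" shadows a global "foo"; whether that is the right rule is one of the design questions we would have to settle.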

However, implementing function name spaces and custom class loaders was thought 
to be out of scope for what started as a minor feature, so we looked for 
solutions that work without them. We can squeak by in adding dynamic UDFs 
without (much) class loader work, but I suspect we’d need to go all-in if we 
want to drop and replace functions as well.

As usual, it all comes down to cost. Solutions are available; the question for 
the community is whether we want dynamic add now, or wait for the whole 
enchilada later.

Suggestions?

Thanks,

- Paul

> On Jul 21, 2016, at 8:35 AM, Zelaine Fong <zf...@maprtech.com> wrote:
> 
> Neeraja,
> 
> Can you clarify your "requirements", specifically
> 
> So any design that requires restarts I would think would defeat the purpose.
> 
> Earlier in the thread, Paul had suggested deferring dynamic DROP FUNCTION
> from v1 because of the complexity of class unloading.  However, Keys notes
> that if that is not addressed, when a developer is doing active UDF
> development, they will have to restart the dev cluster if they need to
> replace the UDF during their dev cycle.
> 
> -- Zelaine
> 
> On Thu, Jul 21, 2016 at 7:21 AM, Neeraja Rentachintala <
> nrentachint...@maprtech.com> wrote:
> 
>> The whole point of this feature is to avoid Drill cluster restarts as the
>> name indicates 'Dynamic' UDFs.
>> So any design that requires restarts I would think would defeat the purpose.
>> 
>> I also think this is an example of a feature where we start with a simple design
>> to serve the purpose, take feedback on how it is being deployed/used in
>> real user situations and improve it in subsequent releases.
>> 
>> -thanks
>> Neeraja
>> 
>> On Thu, Jul 21, 2016 at 6:32 AM, Keys Botzum <kbot...@maprtech.com> wrote:
>> 
>>> I think there are a lot of great ideas here. My one concern is the lack
>> of
>>> unload and thus presumably replace functionality. I'm just thinking about
>>> typical actual usage.
>>> 
>>> In a typical development cycle someone writes something, tries it,
>> learns,
>>> changes it, and tries again. Assuming I understand the design that change
>>> step requires a full Drill cluster restart. That is going to be very
>>> disruptive and will make UDF work nearly impossible without a dedicated
>>> "private" cluster for Drill. I realize that people should have access to
>>> the data they need and Drill in a development cluster but even then
>>> restarts can be hard since development clusters are often shared - and
>>> that's assuming such a cluster exists. I realize of course Drill can be
>> run
>>> as a standalone Drillbit but I'm not convinced that desktops will have
>>> adequate access to the needed data.
>>> 
>>> Having dealt with Java classloading over the years, I'm not claiming
>> class
>>> replacement is an easy thing so I'll defer to others on the priority of
>>> that, but I'm wondering if there isn't some way to make UDF
>> experimentation
>>> a bit easier/practical.
>>> 
>>> Given the above, let me toss out some possibly naive ideas that maybe are
>>> workable:
>>> * can I easily run a standalone Drillbit on a Hadoop cluster node that is
>>> already running Drill servers? I'm sure this can be done, but is it easy?
>>> Could we perhaps make this clearer as an explicit kind of thing?
>>> * is there a way that when I deploy a UDF I can constrain the # of bits
>> it
>>> is loaded into and perhaps even specify the bits?
>>>  * Obvious corollary is I'd want my query to run on those bits and a
>>> not too disruptive way to restart just those bits
>>> 
>>> The above may be obvious to Drill experts. If it is then perhaps the UDF
>>> docs could just point out how to easily develop UDFs in an iterative
>>> fashion.
>>> 
>>> Keys
>>> _______________________________
>>> Keys Botzum
>>> Senior Principal Technologist
>>> kbot...@maprtech.com <mailto:kbot...@maprtech.com>
>>> 443-718-0098
>>> MapR Technologies
>>> http://www.mapr.com <http://www.mapr.com/>
>>>> On Jul 21, 2016, at 3:13 AM, Paul Rogers <prog...@maprtech.com> wrote:
>>>> 
>>>> Always good to have options… Another is to try an eventual consistency
>>> model.
>>>> 
>>>> The invariant here is the one that was mentioned earlier. Whenever a
>>> query is submitted with UDF U, that query either fails in planning
>> (because
>>> U is unknown) or succeeds on all nodes (at least with respect to U.)
>>>> 
>>>> For this to work, we need a constant view of the world. We can try to
>>> enforce consistency at function registration time (the original design),
>> or
>>> via the Foreman (Parth’s design.) We can probably also use an eventual
>>> consistency model.
>>>> 
>>>> Suppose we have a global name space of functions. With the global name
>>> space, we can establish this invariant: If a function is in that name
>>> space, then the Foreman accepts the query. If a Drillbit receives a
>>> fragment, but does not yet know of U, then the Drillbit A) knows that
>> some
>>> foreman must have registered U (or the query would have failed in
>> planning)
>>> and B) the Drillbit can download the function if not already in place.
>>>> 
>>>> Folks pointed out that always checking a global name space is
>> expensive,
>>> which it is. As it turns out, we can first check the local function
>>> registry. If the Drillbit already knows about the function, we’re done
>>> checking, no global check needed. It is only on the first use of a new
>>> function, when it is not yet loaded locally, that the global check must
>> be
>>> done.
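
A minimal sketch of the local-first lookup described above, assuming an in-memory Set as a stand-in for the global name space (ZK in this thread) and a string as a stand-in for the downloaded, loaded function. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a Set stands in for the global registry (ZK in the thread).
class LazyFunctionLookup {
    private final Set<String> globalRegistry;   // shared name space
    private final Map<String, String> localRegistry = new ConcurrentHashMap<>();
    int globalChecks = 0;                       // counts trips down the expensive path

    LazyFunctionLookup(Set<String> globalRegistry) {
        this.globalRegistry = globalRegistry;
    }

    // Returns true if the function is usable on this Drillbit.
    boolean ensureLoaded(String function) {
        if (localRegistry.containsKey(function)) {
            return true;                        // fast path: no global check
        }
        globalChecks++;
        if (!globalRegistry.contains(function)) {
            return false;                       // unknown everywhere
        }
        // Stand-in for downloading the jar from DFS and loading it.
        localRegistry.put(function, "loaded:" + function);
        return true;
    }

    public static void main(String[] args) {
        LazyFunctionLookup bit = new LazyFunctionLookup(Set.of("U"));
        System.out.println(bit.ensureLoaded("U"));
        System.out.println(bit.ensureLoaded("U"));
        System.out.println(bit.globalChecks);
    }
}
```

Only the first use of U on a given Drillbit pays for the global check; every later use is answered from the local registry.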
>>>> 
>>>> For this to work the foreman that registers UDF U must:
>>>> 
>>>> 1. From Arina’s proposed staging area, check the jar contents to see if
>>> a name conflict exists with the global registry. (Requires some class
>>> loader code.)
>>>> 2. If a conflict exists, refuse to register the function and return an
>>> error.
>>>> 3. If no conflict exists, register the function in the global name
>> space
>>> and move the jar to the registered area in DFS.
>>>> 
>>>> In this model, it is entirely optional whether the foreman that
>>> registers U alerts other Drillbits. Instead, Drillbits could poll from
>> time
>>> to time, or just wait until they see a query with U and do the download
>> at
>>> that time.
>>>> 
>>>> When a new Drillbit starts, it can load all functions in the registry
>>> area because these have all passed the name collision test and can all be
>>> used in queries. Any new registrations will be found and loaded as above.
>>> (It is not required to preload functions, but it might help performance.)
>>>> 
>>>> ZK is the only place we have at present for the global name space, so
>>> that seems the logical tool. ZK allows atomic operations, which we need
>>> here. Operations 1, 2, and 3 above should be atomic.
>>>> 
>>>> Unfortunately, we can’t do the DFS move atomically with a ZK name space
>>> insertion. So, the global name check & insert should be atomic. If that
>>> succeeds, copy the jar into the registered folder. There are a few
>> details
>>> to work out to handle special cases, but we can cover those another time.
>>> (Hint: what happens if the Foreman crashes after inserting the ZK entry
>> but
>>> before moving the jar?)
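
A minimal sketch of the check-and-insert ordering, assuming a ConcurrentMap as a stand-in for the ZK name space. The names are hypothetical, and the jar move is only a placeholder; the sketch does not attempt to answer the crash question in the hint.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: a ConcurrentMap stands in for the ZK name space.
class AtomicRegistration {
    private final ConcurrentMap<String, String> globalNames = new ConcurrentHashMap<>();

    // Name check and insert collapsed into one atomic operation.
    // Returns true if this caller won the registration.
    boolean register(String function, String jar) {
        // putIfAbsent is atomic: exactly one of several racing callers sees null.
        boolean inserted = globalNames.putIfAbsent(function, jar) == null;
        if (inserted) {
            moveJarToRegisteredArea(jar);   // not atomic with the insert above
        }
        return inserted;
    }

    // Placeholder for the DFS move from the staging area to the registered area.
    private void moveJarToRegisteredArea(String jar) {
        System.out.println("moving " + jar);
    }

    public static void main(String[] args) {
        AtomicRegistration registry = new AtomicRegistration();
        System.out.println(registry.register("foo", "paul-udfs.jar"));
        System.out.println(registry.register("foo", "other-udfs.jar"));
    }
}
```

The second call is refused because the name is already taken, which corresponds to step 2 in the list above.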
>>>> 
>>>> None of the proposed designs permit graceful unloading of functions.
>> So,
>>> deleting functions will require a cluster restart to establish a new
>> stable
>>> checkpoint.
>>>> 
>>>> We can recommend that on each cluster restart, any functions in the DFS
>>> registry be copied to each Drillbit (much easier with the coming YARN
>>> integration) as a way of keeping the DFS registry a reasonable size.
>>>> 
>>>> More details to work out, but that’s the gist of the concept.
>>>> 
>>>> Thanks,
>>>> 
>>>> - Paul
>>>> 
>>>>> On Jul 20, 2016, at 2:37 PM, Parth Chandra <pchan...@maprtech.com>
>>> wrote:
>>>>> 
>>>>> My notes from the hangout with Arina and Paul -
>>>>> 
>>>>> Notes -
>>>>> 
>>>>> There are two invariants for the registration process -
>>>>> 1) There is a registration/validated directory in the DFS that
>> contains
>>>>> UDFs that have been validated by the registering foreman. All
>> drillbits
>>>>> will have access to this directory and on startup and/or UDF
>>> registration,
>>>>> the jars in this directory are sync'd up with a local UDF directory
>>>>> 2) During the process of registration, the registering foreman
>> creates a
>>>>> Zookeeper node that indicates that one or more drillbits has not yet
>>>>> registered the UDF.
>>>>> 
>>>>> The basic workflow is that UDF jars are copied from the staging
>>> directory
>>>>> to the registration directory and validated. Once they are validated,
>>> the
>>>>> available drillbits are told to register the UDF. Registering the UDF
>>>>> consists of copying the jar to a local UDF directory and updating the
>>>>> local (in-memory) udf registry. A sentinel node in zookeeper is used
>> to
>>>>> track when all the drillbits have registered the UDF.
>>>>> 
>>>>> There were two main suggestions: immediate registration and lazy
>>>>> registration.
>>>>> 
>>>>> Immediate registration -
>>>>> Foreman tells all drillbits to register. Creates a Zookeeper node to
>>>>> track.
>>>>> Every drillbit makes a local copy and updates zookeeper node to show
>> it
>>>>> is done.
>>>>> Foreman checks the zookeeper node and when all available drillbits
>> have
>>>>> acknowledged, sends a message to all drillbits to complete
>> registration.
>>>>> Foreman removes ZK node.
>>>>> All Drillbits update their local UDF registry
>>>>> Drillbit startup will block if there is a ZK node indicating
>>>>> registration is in progress.
>>>>> This approach needs to be validated to see if any race conditions
>>> exist.
>>>>> 
>>>>> Lazy registration
>>>>> Once a UDF is copied to the registration folder, the UDF is
>> essentially
>>>>> registered. On first use, a drillbit may hit a ClassNotFoundException
>>> in
>>>>> which case it will look for the UDF in the registration directory. If
>>>>> found, it will copy to the local directory and add the UDF to its
>> local
>>>>> registry.
>>>>> This approach should be investigated to see if it fits in with the
>>>>> current UDF execution code.
>>>>> 
>>>>> 
>>>>> On Mon, Jul 18, 2016 at 3:36 PM, Parth Chandra <pchan...@maprtech.com
>>> 
>>>>> wrote:
>>>>> 
>>>>>> +1 on simplifying the design and postpone the items Paul has
>> suggested.
>>>>>> 
>>>>>> Arina, Paul, I think we need to work out some of the design related
>> to
>>>>>> registering the UDF. Are you guys open for a quick hangout @10 a.m
>> PDT
>>>>>> tomorrow?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Thu, Jul 14, 2016 at 1:46 PM, Paul Rogers <prog...@maprtech.com>
>>> wrote:
>>>>>> 
>>>>>>> Hi All,
>>>>>>> 
>>>>>>> We’ve had quite a lively debate in the “comments” section of Arina’s
>>>>>>> wonderful design doc. Zelaine made a great suggestion: summarize the
>>> user
>>>>>>> experience as a way of making sense of the wealth of detailed
>>> comments.
>>>>>>> 
>>>>>>> IMHO, the most important user experience goals are:
>>>>>>> 
>>>>>>> 1. When a user submits a CREATE FUNCTION command, the command
>> returns
>>>>>>> quickly (within a few seconds at most.)
>>>>>>> 2. If the above user then issues a query using that function (to the
>>> same
>>>>>>> Foreman), that query is guaranteed to successfully use the new
>>> function on
>>>>>>> all nodes.
>>>>>>> 3. Other users, connecting to any Foreman will see a very clean
>>> behavior
>>>>>>> when submitting a query with the new function. Before some point in
>>> time
>>>>>>> (can be different for each Foreman), a query with the function fails
>>> in
>>>>>>> planning. After that point, queries are guaranteed to successfully
>>> use the
>>>>>>> new function on all nodes.
>>>>>>> 
>>>>>>> Basically, this says that CREATE FUNCTION can’t (potentially) take a
>>> long
>>>>>>> time. Use of functions can’t result in random failures during the
>>> time that
>>>>>>> the function is propagated across Drillbits.
>>>>>>> 
>>>>>>> The goals we can perhaps postpone are:
>>>>>>> 
>>>>>>> 1. Class name space isolation. (Allows two data scientists to define
>>> the
>>>>>>> same class without collisions.)
>>>>>>> 2. Function name spaces. (Allows me to define “paul.foo” and you to
>>>>>>> define “bob.foo” without collisions.) (Needed if many people develop
>>>>>>> functions independently. Else, we need a global name space.)
>>>>>>> 3. Dynamic DROP FUNCTION operation. (The issues here are messy, and
>> it
>>>>>>> requires unloading classes and name space cleanup.) (Just let the
>>> cleanup
>>>>>>> happen offline.)
>>>>>>> 4. Dependency jars (e.g. third party libraries, etc.) (We require
>>> those
>>>>>>> to be statically added to the class path before Drill starts.)
>>>>>>> 
>>>>>>> We are not creating per-user name spaces, or allowing people to use
>>>>>>> production clusters to try/revise functions. We’re just sampling
>>> deployment
>>>>>>> of simple functions.
>>>>>>> 
>>>>>>> That’s my suggestion, what do others suggest?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> - Paul
>>>>>>> 
>>>>>>>> On Jul 7, 2016, at 12:32 PM, Arina Yelchiyeva <
>>>>>>> arina.yelchiy...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> I also agree on using Zookeeper. I have re-worked dynamic UDF
>> support
>>>>>>>> document taking into account Zookeeper usage.
>>>>>>>> 
>>>>>>>> Link to the document -
>>>>>>>> 
>>>>>>> 
>>> 
>> https://docs.google.com/document/d/1MluM17EKajvNP_x8U4aymcOihhUm8BMm8t_hM0jEFWk/edit
>>>>>>>> 
>>>>>>>> Kind regards
>>>>>>>> Arina
>>>>>>>> 
>>>>>>>> On Tue, Jun 28, 2016 at 12:55 AM Paul Rogers <prog...@maprtech.com
>>> 
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Great idea! We already use ZK to track storage plugins. ZK is
>>> perhaps
>>>>>>>>> better suited to register each jar and/or function than using
>> files
>>> in
>>>>>>> DFS.
>>>>>>>>> Still need to work out the proper sequencing. But you are right,
>>> this
>>>>>>> is
>>>>>>>>> the kind of thing that ZK is supposed to solve.
>>>>>>>>> 
>>>>>>>>> - Paul
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Jun 27, 2016, at 2:01 PM, Parth Chandra <par...@apache.org>
>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Reading thru some of Paul's comments on maintaining a consistent
>>> state
>>>>>>>>> for
>>>>>>>>>> the registration of the UDF, it looks like we need a consensus
>>>>>>> protocol
>>>>>>>>> for
>>>>>>>>>> determining that all the Drillbits have the UDF deployed.
>>>>>>>>>> I believe Zookeeper can provide a stronger guarantee than a 2
>> phase
>>>>>>>>>> approach. Should we look into that?
>>>>>>>>>> 
>>>>>>>>>> On Fri, Jun 24, 2016 at 10:00 AM, Arina Yelchiyeva <
>>>>>>>>>> arina.yelchiy...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi all!
>>>>>>>>>>> 
>>>>>>>>>>> I have updated design document.
>>>>>>>>>>> Main changes:
>>>>>>>>>>> 1. Add to Drill’s config the staging and registration DFS
>>>>>>>>> locations.
>>>>>>>>>>> 2. User is no longer responsible for copying jars into
>> drillbit
>>>>>>>>> nodes.
>>>>>>>>>>> Now user needs to copy jars into staging DFS location from where
>>>>>>>>> drillbits
>>>>>>>>>>> will copy them to local fs.
>>>>>>>>>>> 2. During UDFs registration jars will be moved to DFS
>> registration
>>>>>>> area.
>>>>>>>>>>> 3. During start up drillbit will copy all jars from registration
>>>>>>> area,
>>>>>>>>> so
>>>>>>>>>>> newly added drillbit will have all UDFs as others.
>>>>>>>>>>> 4. Security issues - probably they will be added later as
>>>>>>> enhancement.
>>>>>>>>>>> 
>>>>>>>>>>> More details in the document:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>> 
>> https://docs.google.com/document/d/1MluM17EKajvNP_x8U4aymcOihhUm8BMm8t_hM0jEFWk/edit
>>>>>>>>>>> 
>>>>>>>>>>> Kind regards
>>>>>>>>>>> Arina
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Jun 17, 2016 at 1:25 AM Paul Rogers <
>> prog...@maprtech.com
>>>> 
>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>> 
>>>>>>>>>>>> To answer Arina on item 3: there is actually no good location
>> on
>>> any
>>>>>>>>>>> local
>>>>>>>>>>>> node to put the UDFs. Reason: DoY allows the admin to start a
>>>>>>> Drillbit
>>>>>>>>> on
>>>>>>>>>>>> any available node. When it starts, a new, fresh copy of Drill
>>> will
>>>>>>> be
>>>>>>>>>>>> downloaded, and this can happen after the user issued the
>> CREATE
>>>>>>>>> command.
>>>>>>>>>>>> 
>>>>>>>>>>>> What we need is a shared, secure distributed storage location
>>> from
>>>>>>>>> which
>>>>>>>>>>>> Drillbits can download the needed jar files. Something like…
>> DFS!
>>>>>>>>> Indeed,
>>>>>>>>>>>> this is how YARN stores the Drill archive from which it creates
>>> the
>>>>>>>>> Drill
>>>>>>>>>>>> install directory on each node. We can’t quite use YARN’s
>>> mechanism
>>>>>>>>> (YARN
>>>>>>>>>>>> is aware only of the files uploaded when launching an app), but
>>> we
>>>>>>> can
>>>>>>>>> do
>>>>>>>>>>>> something similar.
>>>>>>>>>>>> 
>>>>>>>>>>>> So, brainstorming a bit…
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. Store the UDF jar in a pre-defined DFS location.
>>>>>>>>>>>> 
>>>>>>>>>>>> 2. The CREATE function 1) uploads the jar to the DFS location,
>>> and
>>>>>>> 2)
>>>>>>>>>>>> creates some kind of registry entry.
>>>>>>>>>>>> 
>>>>>>>>>>>> 3. The DELETE function 1) deregisters the jar (and function),
>>> but 2)
>>>>>>>>> does
>>>>>>>>>>>> not delete the jar (this allows in-flight queries to complete.)
>>>>>>>>>>>> 
>>>>>>>>>>>> 3. Drillbits periodically check DFS for changed registrations,
>>>>>>>>>>> downloading
>>>>>>>>>>>> any needed jars. (YARN, Spark, Storm and others already do
>>> something
>>>>>>>>>>>> similar.)
>>>>>>>>>>>> 
>>>>>>>>>>>> 4. Registry check is “forced” when processing a query with a
>>>>>>> function
>>>>>>>>>>> that
>>>>>>>>>>>> is not currently registered. (Doing so resolves any possible
>> race
>>>>>>>>>>>> conditions.)
>>>>>>>>>>>> 
>>>>>>>>>>>> 5. Some process (perhaps time based) removes old, unregistered
>>> jar
>>>>>>>>> files.
>>>>>>>>>>>> (Or, we could get fancy and use reference counts. The reference
>>>>>>> count
>>>>>>>>>>> would
>>>>>>>>>>>> be required if the user wants to delete, then recreate, the
>> same
>>>>>>>>> function
>>>>>>>>>>>> and jar to avoid conflict with in-flight queries.)
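
A minimal sketch of the reference-count idea in item 5, assuming one counter per jar that tracks in-flight queries. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: tracks which unregistered jars are safe to remove.
class JarRefCounts {
    private final Map<String, Integer> inFlight = new HashMap<>();  // jar -> running queries
    private final Set<String> unregistered = new HashSet<>();       // jars already deregistered

    synchronized void queryStarted(String jar) {
        inFlight.merge(jar, 1, Integer::sum);
    }

    synchronized void queryFinished(String jar) {
        // Returning null from the remapping function removes the entry at zero.
        inFlight.computeIfPresent(jar, (name, count) -> count > 1 ? count - 1 : null);
    }

    synchronized void unregister(String jar) {
        unregistered.add(jar);
    }

    // A jar may be deleted only once it is deregistered and no query still uses it.
    synchronized boolean canDelete(String jar) {
        return unregistered.contains(jar) && !inFlight.containsKey(jar);
    }

    public static void main(String[] args) {
        JarRefCounts jars = new JarRefCounts();
        jars.queryStarted("udfs.jar");
        jars.unregister("udfs.jar");
        System.out.println(jars.canDelete("udfs.jar"));
        jars.queryFinished("udfs.jar");
        System.out.println(jars.canDelete("udfs.jar"));
    }
}
```

A time-based cleaner would be simpler, but it cannot tell a long-running query from an abandoned jar; the counter can.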
>>>>>>>>>>>> 
>>>>>>>>>>>> We can build security on this as follows:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. Define permissions for who can write to the DFS location.
>> Or,
>>>>>>>>> indeed,
>>>>>>>>>>>> have subdirectories by user and grant each user permission only
>>> on
>>>>>>>>> their
>>>>>>>>>>>> own UDF directory.
>>>>>>>>>>>> 
>>>>>>>>>>>> 2. Provide separate registries for per-user functions (private)
>>> and
>>>>>>>>>>> global
>>>>>>>>>>>> functions (public). Only the admin can add global functions.
>> But,
>>>>>>> only
>>>>>>>>>>> the
>>>>>>>>>>>> user that uploads a private function can use it.
>>>>>>>>>>>> 
>>>>>>>>>>>> 3. Leverage the Java class loader to isolate UDFs in their own
>>> name
>>>>>>>>> space
>>>>>>>>>>>> (see Eclipse & Tomcat for examples). That is, Drill can call
>>> into a
>>>>>>>>> UDF,
>>>>>>>>>>>> UDFs can call selected Drill code, but UDFs can’t shadow Drill
>>>>>>> classes
>>>>>>>>>>>> (accidentally or maliciously.) Plus, my function Foo won’t
>> clash
>>>>>>> with
>>>>>>>>>>> your
>>>>>>>>>>>> function Foo if both are private.
>>>>>>>>>>>> 
>>>>>>>>>>>> Sorry that this has wandered a bit far from the original simple
>>>>>>> design,
>>>>>>>>>>>> but the above may capture much of what folks expect in modern
>>>>>>>>> distributed
>>>>>>>>>>>> big data systems.
>>>>>>>>>>>> 
>>>>>>>>>>>> I wonder if a good next step might be to review the notes in
>> the
>>>>>>> design
>>>>>>>>>>>> doc, in the JIRA, and in this e-mail chain and to prepare a
>>> summary
>>>>>>> of
>>>>>>>>>>>> technical requirements, and a proposed design. Postpone, at
>> least
>>>>>>> for
>>>>>>>>>>> now,
>>>>>>>>>>>> concerns about the amount of work; we can worry about that once
>>>>>>> folks
>>>>>>>>>>> agree
>>>>>>>>>>>> on your revised design.
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> 
>>>>>>>>>>>> - Paul
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jun 21, 2016, at 9:48 AM, Arina Yelchiyeva <
>>>>>>>>>>>> arina.yelchiy...@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 4. Authorization model mentioned by Julia and John
>>>>>>>>>>>>> If user won't have rights to copy jars to UDF classpath, which
>>> can
>>>>>>> be
>>>>>>>>>>>>> restricted by file system, he won't be able to do much harm by
>>>>>>> running
>>>>>>>>>>>>> CREATE command. If UDFs from jar were already registered,
>> CREATE
>>>>>>>>>>>> statement
>>>>>>>>>>>>> will fail. CREATE OR REPLACE will just re-register UDFs.
>>>>>>>>>>>>> But DELETE command is not safe. If user knows jar name, he can
>>>>>>> delete
>>>>>>>>>>> all
>>>>>>>>>>>>> associated with it UDFs, as well as the binary and source
>> jars.
>>>>>>> That's
>>>>>>>>>>>>> where we'll probably need to impose restrictions.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, Jun 21, 2016 at 7:34 PM Arina Yelchiyeva <
>>>>>>>>>>>> arina.yelchiy...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 1. DELETE command - I neglected to mention it in the document but had
>> it
>>>>>>> in my
>>>>>>>>>>>>>> mind. When user issues DELETE command, all UDF associated
>> with
>>>>>>>>>>> indicated
>>>>>>>>>>>>>> jar is removed from DrillFunctionRegistry. And then binary
>> and
>>>>>>> source
>>>>>>>>>>>>>> files are also deleted from UDF classpath.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 2. Distribution race condition described by Paul
>>>>>>>>>>>>>> User issues CREATE command and gets confirmation that UDFs are
>>>>>>>>>>> registered
>>>>>>>>>>>>>> only if all drillbits have confirmed that registration was
>>>>>>>>>>> successful.
>>>>>>>>>>>>>> I don't expect user to start using UDFs in queries prior to
>>> CREATE
>>>>>>>>>>>> command
>>>>>>>>>>>>>> success / failure result, which is possible but strange.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 3. DoY
>>>>>>>>>>>>>> @Paul
>>>>>>>>>>>>>> If instead of using $DRILL_HOME/jars/3rdparty/udf directly we
>>> use
>>>>>>>>>>>>>> $DRILL_UDF environment variable which will be set during
>>> drillbit
>>>>>>>>>>> start
>>>>>>>>>>>>>> (like $DRILL_LOG_DIR). Location stored in this variable will
>> be
>>>>>>> added
>>>>>>>>>>> to
>>>>>>>>>>>>>> Drill classpath during start.
>>>>>>>>>>>>>> Will it ease DoY integration somehow?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Kind regards
>>>>>>>>>>>>>> Arina
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, Jun 21, 2016 at 7:15 PM yuliya Feldman
>>>>>>>>>>>> <yufeld...@yahoo.com.invalid>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Just thoughts:
>>>>>>>>>>>>>>> You can try to reuse distributed cache. Let Drill AM do the
>>>>>>> needful
>>>>>>>>> in
>>>>>>>>>>>>>>> terms of orchestrating UDF jars distribution.
>>>>>>>>>>>>>>> But
>>>>>>>>>>>>>>> I would be inclined to have a common path that is
>> independent
>>> of
>>>>>>> the
>>>>>>>>>>>> fact
>>>>>>>>>>>>>>> that it is Drill on YARN or not, as maintaining two separate
>>>>>>> ways of
>>>>>>>>>>>>>>> dealing with loading/unloading UDFs will be painful and
>> error
>>>>>>> prone.
>>>>>>>>>>>>>>> One more note (I left a comment in the doc) - not sure about
>>>>>>>>>>>>>>> authorization model here - we need to have some.
>>>>>>>>>>>>>>> Just my 2c
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> From: Paul Rogers <prog...@maprtech.com>
>>>>>>>>>>>>>>> To: "dev@drill.apache.org" <dev@drill.apache.org>
>>>>>>>>>>>>>>> Sent: Monday, June 20, 2016 7:32 PM
>>>>>>>>>>>>>>> Subject: Re: Dynamic UDFs support
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi Neeraja,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The proposal calls for the user to copy the jar file to each
>>>>>>>>> Drillbit
>>>>>>>>>>>>>>> node. The jar would go into a new
>>> $DRILL_HOME/jars/3rdparty/udf
>>>>>>>>>>>> directory.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> In Drill-on-YARN (DoY), YARN is responsible for copying
>> Drill
>>>>>>> code
>>>>>>>>> to
>>>>>>>>>>>>>>> each node (which is good.) YARN puts that code in a location
>>>>>>> known
>>>>>>>>>>>> only to
>>>>>>>>>>>>>>> YARN. Since the location is private to YARN, the user can’t
>>>>>>> easily
>>>>>>>>>>> hunt
>>>>>>>>>>>>>>> down the location in order to add the udf jar. Even if the
>>> user
>>>>>>> did
>>>>>>>>>>>> find
>>>>>>>>>>>>>>> the location, the next Drillbit to start would create a new
>>> copy
>>>>>>> of
>>>>>>>>>>> the
>>>>>>>>>>>>>>> Drill software, without the udf jar.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Second, in DoY we have separated user files from Drill
>>> software.
>>>>>>>>> This
>>>>>>>>>>>>>>> makes it much easier to distribute the software to each
>> node:
>>> we
>>>>>>>>> give
>>>>>>>>>>>> the
>>>>>>>>>>>>>>> Drill distribution tar archive to YARN, and YARN copies it
>> to
>>>>>>> each
>>>>>>>>>>>> node and
>>>>>>>>>>>>>>> untars the Drill files. We make a separate copy of the (far
>>>>>>> smaller)
>>>>>>>>>>>> set of
>>>>>>>>>>>>>>> user config files.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> If the udf jar goes into a Drill folder
>>>>>>>>>>>> ($DRILL_HOME/jars/3rdparty/udf),
>>>>>>>>>>>>>>> then the user would have to rebuild the Drill tar file each
>>> time
>>>>>>>>> they
>>>>>>>>>>>> add a
>>>>>>>>>>>>>>> udf jar. When I tried this myself when building DoY, I found
>>> it
>>>>>>> to
>>>>>>>>> be
>>>>>>>>>>>> slow
>>>>>>>>>>>>>>> and error-prone.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> So, the solution is to place the udf code in the new “site”
>>>>>>>>>>> directory:
>>>>>>>>>>>>>>> $DRILL_SITE/jars. That’s what that is for. Then, let DoY
>>>>>>>>>>> automatically
>>>>>>>>>>>>>>> distribute the code to every node. Perfect! Except that it
>>> does
>>>>>>> not
>>>>>>>>>>>> work to
>>>>>>>>>>>>>>> dynamically distribute code after Drill starts.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> For DoY, the solution requirements are:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 1. Distribute code using Drill itself, rather than manually
>>>>>>> copying
>>>>>>>>>>>> jars
>>>>>>>>>>>>>>> to (unknown) Drill directories.
>>>>>>>>>>>>>>> 2. Ensure the solution works even if another Drillbit is
>> spun
>>> up
>>>>>>>>>>> later,
>>>>>>>>>>>>>>> and uses the original Drill tar file.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I’m thinking we want to leverage DFS: place udf files into a
>>>>>>>>>>> well-known
>>>>>>>>>>>>>>> DFS directory. Register the udf into, say, ZK. When a new
>>>>>>> Drillbit
>>>>>>>>>>>> starts,
>>>>>>>>>>>>>>> it looks for new udf jars in ZK, copies the file to a
>>> temporary
>>>>>>>>>>>> location,
>>>>>>>>>>>>>>> and launches. An existing Drill is notified of the change
>> and
>>>>>>> does
>>>>>>>>>>> the
>>>>>>>>>>>> same
>>>>>>>>>>>>>>> download process. Clean-up is needed at some point to remove
>>> ZK
>>>>>>>>>>>> entries if
>>>>>>>>>>>>>>> the udf jar becomes statically available on the next launch.
>>> That
>>>>>>>>>>> needs
>>>>>>>>>>>>>>> more thought.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> We’d still need the phases mentioned earlier to ensure
>>>>>>> consistency.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Suggestions anyone as to how to do this super simply & still
>>> get
>>>>>>> it
>>>>>>>>>>> to
>>>>>>>>>>>>>>> work with DoY?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - Paul
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Jun 20, 2016, at 7:18 PM, Neeraja Rentachintala <
>>>>>>>>>>>>>>> nrentachint...@maprtech.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> This will need to work with YARN (Once Drill is YARN
>>> enabled, I
>>>>>>>>>>> would
>>>>>>>>>>>>>>>> expect a lot of users using it in conjunction with YARN).
>>>>>>>>>>>>>>>> Paul, I am not clear why this wouldn't work with YARN. Can
>>> you
>>>>>>>>>>>>>>> elaborate.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> -Neeraja
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Mon, Jun 20, 2016 at 7:01 PM, Paul Rogers <
>>>>>>> prog...@maprtech.com
>>>>>>>>>> 
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Good enough, as long as we document the limitation that
>> this
>>>>>>>>>>> feature
>>>>>>>>>>>>>>> can’t
>>>>>>>>>>>>>>>>> work with YARN deployment as users generally do not have
>>>>>>> access to
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> temporary “localization” directories where the Drill code
>> is
>>>>>>>>> placed
>>>>>>>>>>>> by
>>>>>>>>>>>>>>> YARN.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Note that the jar distribution race condition issue occurs
>>> with
>>>>>>>>> the
>>>>>>>>>>>>>>>>> proposed design: I believe I sketched out a scenario in
>> one
>>> of
>>>>>>> the
>>>>>>>>>>>>>>> earlier
>>>>>>>>>>>>>>>>> comments. Drillbit A receives the CREATE FUNCTION command.
>>> It
>>>>>>>>> tells
>>>>>>>>>>>>>>>>> Drillbit B. While informing the other Drillbits, Drillbit
>> B
>>>>>>> plans
>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> launches a query that uses the function. Drillbit Z starts
>>>>>>>>>>> execution
>>>>>>>>>>>>>>> of the
>>>>>>>>>>>>>>>>> query before it learns from A about the new function. This
>>>>>>> will be
>>>>>>>>>>>>>>> rare —
>>>>>>>>>>>>>>>>> just rare enough to create very hard to reproduce bugs.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> The only reliable solution is to do the work in multiple
>>>>>>> passes:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Pass 1: Ask each node to load the function, but not make
>> it
>>>>>>>>>>> available
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> the planner. (it would be available to the execution
>>> engine.)
>>>>>>>>>>>>>>>>> Pass 2: Await confirmation from each node that this is
>> done.
>>>>>>>>>>>>>>>>> Pass 3: Alert every node that it is now free to plan
>> queries
>>>>>>> with
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> function.
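
A minimal sketch of the three passes, assuming each node keeps separate "known to execution" and "known to the planner" sets and that direct method calls stand in for the RPCs between Drillbits. All names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: Node models a Drillbit; direct method calls stand in for RPC.
class MultiPassRegistration {
    static class Node {
        private final Set<String> executable = new HashSet<>();  // known to execution
        private final Set<String> plannable = new HashSet<>();   // known to the planner

        // Pass 1: load for execution only; the return value is the acknowledgment.
        boolean load(String function) {
            executable.add(function);
            return true;
        }

        // Pass 3: expose the function to the planner.
        void enablePlanning(String function) {
            plannable.add(function);
        }

        boolean canExecute(String function) { return executable.contains(function); }
        boolean canPlan(String function) { return plannable.contains(function); }
    }

    // Returns true only if every node acknowledged the load.
    static boolean register(List<Node> nodes, String function) {
        for (Node node : nodes) {            // passes 1 and 2
            if (!node.load(function)) {
                return false;                // no node may plan with the function yet
            }
        }
        for (Node node : nodes) {            // pass 3
            node.enablePlanning(function);
        }
        return true;
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(new Node(), new Node(), new Node());
        System.out.println(register(cluster, "foo"));
        System.out.println(cluster.get(2).canExecute("foo"));
    }
}
```

The point of the ordering is that no node can plan a query with the function until every node is able to execute it.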
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Finally, I wonder if we should design the SQL syntax based
>>> on a
>>>>>>>>>>>>>>> long-term
>>>>>>>>>>>>>>>>> design, even if the feature itself is a short-term
>>> work-around.
>>>>>>>>>>>>>>> Changing
>>>>>>>>>>>>>>>>> the syntax later might break scripts that users might
>> write.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> So, the question for the group is this: is the value of
>>>>>>>>>>> semi-complete
>>>>>>>>>>>>>>>>> feature sufficient to justify the potential problems?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> - Paul
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Jun 20, 2016, at 6:15 PM, Parth Chandra <
>>>>>>>>> pchan...@maprtech.com
>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Moving discussion to dev.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I believe the aim is to do a simple implementation
>> without
>>> the
>>>>>>>>>>>>>>> complexity
>>>>>>>>>>>>>>>>>> of distributing the UDF. I think the document should make
>>> this
>>>>>>>>>>>>>>> limitation
>>>>>>>>>>>>>>>>>> clear.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Per Paul's point on there being a simpler solution of
>> just
>>>>>>> having
>>>>>>>>>>>> each
>>>>>>>>>>>>>>>>>> drillbit detect if a UDF is present, I think the
>>> problem
>>>>>>> is
>>>>>>>>>>> if a
>>>>>>>>>>>>>>> UDF
>>>>>>>>>>>>>>>>>> gets deployed to some but not all drillbits. A query can
>>> then
>>>>>>>>>>> start
>>>>>>>>>>>>>>>>>> executing but not run successfully. The intent of the
>>> create
>>>>>>>>>>>> commands
>>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>> be to ensure that all drillbits have the UDF or none
>> would.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I think Jacques' point about ownership conflicts is not
>>>>>>> addressed
>>>>>>>>>>>>>>>>> clearly.
>>>>>>>>>>>>>>>>>> Also, the unloading is not clear. The delete command
>> should
>>>>>>>>>>> probably
>>>>>>>>>>>>>>>>> remove
>>>>>>>>>>>>>>>>>> the UDF and unload it.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Fri, Jun 17, 2016 at 11:19 AM, Paul Rogers <prog...@maprtech.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Reviewed the spec; many comments posted. Three primary comments for the
>>>>>>>>>>>>>>>>>>> community to consider.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 1. The design conflicts with the Drill-on-YARN project. Is this a specific
>>>>>>>>>>>>>>>>>>> fix for one unique problem, or is it worth expanding the solution to work
>>>>>>>>>>>>>>>>>>> with Drill-on-YARN deployments? Might be hard to make the two work together
>>>>>>>>>>>>>>>>>>> later. See comments in docs for details.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 2. Have we, by chance, looked at how other projects handle code
>>>>>>>>>>>>>>>>>>> distribution? Spark, Storm and others automatically deploy code across the
>>>>>>>>>>>>>>>>>>> cluster; no manual distribution to each node. The key difference between
>>>>>>>>>>>>>>>>>>> Drill and others is that, for Storm, say, code is associated with a job
>>>>>>>>>>>>>>>>>>> (“topology” in Storm terms.) But, in Drill, functions are global and have
>>>>>>>>>>>>>>>>>>> no obvious life cycle that suggests when the code can be unloaded.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 3. Have we considered the class loader, dependency and name space isolation
>>>>>>>>>>>>>>>>>>> issues addressed by such products as Tomcat (web apps) or Eclipse
>>>>>>>>>>>>>>>>>>> (plugins)? Putting user code in the same namespace as Drill code is quick
>>>>>>>>>>>>>>>>>>> & dirty. It turns out, however, that doing so leads to problems that
>>>>>>>>>>>>>>>>>>> require long, frustrating debugging sessions to resolve.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Addressing item 1 might expand scope a bit. Addressing items 2 and 3 is a
>>>>>>>>>>>>>>>>>>> big increase in scope, so I won’t be surprised if we leave those issues for
>>>>>>>>>>>>>>>>>>> later. (Though, addressing item 2 might be the best way to address item 1.)
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> If we want a very simple solution that requires minimal change, perhaps we
>>>>>>>>>>>>>>>>>>> can use an even simpler solution. In the proposed design, the user still
>>>>>>>>>>>>>>>>>>> must distribute code to all the nodes. The primary change is to tell Drill
>>>>>>>>>>>>>>>>>>> to load (or unload) that code. Can we accomplish the same result more
>>>>>>>>>>>>>>>>>>> easily by having Drill periodically scan certain directories looking for
>>>>>>>>>>>>>>>>>>> new (or removed) jars? That still won’t work with YARN, or solve the name
>>>>>>>>>>>>>>>>>>> space issues, but it will work for existing non-YARN Drill users without
>>>>>>>>>>>>>>>>>>> new SQL syntax.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> - Paul
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Jun 16, 2016, at 2:07 PM, Jacques Nadeau <jacq...@dremio.com> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Two quick thoughts:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> - (user) In the design document I didn't see any discussion of
>>>>>>>>>>>>>>>>>>>> ownership/conflicts or unloading. Would be helpful to see the thinking
>>>>>>>>>>>>>>>>>>>> there.
>>>>>>>>>>>>>>>>>>>> - (dev) There is a row-oriented facade via the
>>>>>>>>>>>>>>>>>>>> FieldReader/FieldWriter/ComplexWriter classes. That would be a good place
>>>>>>>>>>>>>>>>>>>> to start when trying to implement an alternative interface.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> Jacques Nadeau
>>>>>>>>>>>>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 11:32 AM, John Omernik <j...@omernik.com> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Honestly, I don't see it as a priority issue. I think some of the ideas
>>>>>>>>>>>>>>>>>>>>> around community Java UDFs could be a better approach. I'd hate to take
>>>>>>>>>>>>>>>>>>>>> away from other work to hack in something like this.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 1:19 PM, Paul Rogers <prog...@maprtech.com> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Ted refers to source code transformation. Drill gains its speed from value
>>>>>>>>>>>>>>>>>>>>>> vectors. However, VVs are a far cry from the row-based interface that most
>>>>>>>>>>>>>>>>>>>>>> mere mortals are accustomed to using. Since VVs are very type specific,
>>>>>>>>>>>>>>>>>>>>>> code is typically generated to handle the specifics of each type. Accessing
>>>>>>>>>>>>>>>>>>>>>> VVs in Jython may be a bit of a challenge because of the "impedance
>>>>>>>>>>>>>>>>>>>>>> mismatch" between how VVs work and the row-and-column view expected by most
>>>>>>>>>>>>>>>>>>>>>> (non-Drill) developers.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> I wonder if we've considered providing a row-oriented "facade" that can be
>>>>>>>>>>>>>>>>>>>>>> used by roll-your-own data sources and user-defined row transforms? Might
>>>>>>>>>>>>>>>>>>>>>> be a hiccup in the fast VV pipeline, but might be handy for users willing
>>>>>>>>>>>>>>>>>>>>>> to trade a bit of speed for convenience. With such a facade, the Jython row
>>>>>>>>>>>>>>>>>>>>>> transforms that John mentions could be quite simple.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 10:36 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Since UDFs use source code transformation, using Jython would be
>>>>>>>>>>>>>>>>>>>>>>> difficult.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 9:42 AM, Arina Yelchiyeva <
>>>>>>>>>>>>>>>>>>>>>>> arina.yelchiy...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Hi Charles,
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Not that I am aware of. The proposed solution doesn't invent anything new;
>>>>>>>>>>>>>>>>>>>>>>>> it just adds the possibility to add UDFs without a drillbit restart. But
>>>>>>>>>>>>>>>>>>>>>>>> contributions are welcome.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 4:52 PM Charles Givre <cgi...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Arina,
>>>>>>>>>>>>>>>>>>>>>>>>> Has there been any discussion about making it possible via Jython or
>>>>>>>>>>>>>>>>>>>>>>>>> something for users to write simple UDFs in Python?
>>>>>>>>>>>>>>>>>>>>>>>>> My ideal would be to have this capability integrated in the web GUI such
>>>>>>>>>>>>>>>>>>>>>>>>> that a user could write their UDF (in Python) right there, submit it and it
>>>>>>>>>>>>>>>>>>>>>>>>> would be deployed to Drill if it passes validation tests.
>>>>>>>>>>>>>>>>>>>>>>>>> —C
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 16, 2016, at 09:34, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi all!
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> I have created a Jira to allow dynamic UDF support in Drill
>>>>>>>>>>>>>>>>>>>>>>>>>> (https://issues.apache.org/jira/browse/DRILL-4726). There is a link to
>>>>>>>>>>>>>>>>>>>>>>>>>> the design document in the Jira description.
>>>>>>>>>>>>>>>>>>>>>>>>>> Comments or suggestions are welcome.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Kind regards
>>>>>>>>>>>>>>>>>>>>>>>>>> Arina
