Re: Dynamic UDFs support

Parth Chandra Mon, 18 Jul 2016 15:37:31 -0700

+1 on simplifying the design and postponing the items Paul has suggested.

Arina, Paul, I think we need to work out some of the design related to
registering the UDF. Are you guys open for a quick hangout at 10 a.m. PDT
tomorrow?




On Thu, Jul 14, 2016 at 1:46 PM, Paul Rogers <[email protected]> wrote:

> Hi All,
>
> We’ve had quite a lively debate in the “comments” section of Arina’s
> wonderful design doc. Zelaine made a great suggestion: summarize the user
> experience as a way of making sense of the wealth of detailed comments.
>
> IMHO, the most important user experience goals are:
>
> 1. When a user submits a CREATE FUNCTION command, the command returns
> quickly (within a few seconds at most.)
> 2. If the above user then issues a query using that function (to the same
> Foreman), that query is guaranteed to successfully use the new function on
> all nodes.
> 3. Other users, connecting to any Foreman will see a very clean behavior
> when submitting a query with the new function. Before some point in time
> (can be different for each Foreman), a query with the function fails in
> planning. After that point, queries are guaranteed to successfully use the
> new function on all nodes.
>
> Basically, this says that CREATE FUNCTION can’t (potentially) take a long
> time. Use of functions can’t result in random failures during the time that
> the function is propagated across Drillbits.
>
> The goals we can perhaps postpone are:
>
> 1. Class name space isolation. (Allows two data scientists to define the
> same class without collisions.)
> 2. Function name spaces. (Allows me to define “paul.foo” and you to define
> “bob.foo” without collisions. Needed if many people develop functions
> independently. Else, we need a global name space.)
> 3. Dynamic DROP FUNCTION operation. (The issues here are messy, and it
> requires unloading classes and name space cleanup.) (Just let the cleanup
> happen offline.)
> 4. Dependency jars (e.g. third party libraries, etc.) (We require those to
> be statically added to the class path before Drill starts.)
>
> We are not creating per-user name spaces, or allowing people to use
> production clusters to try/revise functions. We’re just sampling deployment
> of simple functions.
>
> That’s my suggestion, what do others suggest?
>
> Thanks,
>
> - Paul
>
> > On Jul 7, 2016, at 12:32 PM, Arina Yelchiyeva <
> [email protected]> wrote:
> >
> > I also agree on using Zookeeper. I have re-worked dynamic UDF support
> > document taking into account Zookeeper usage.
> >
> > Link to the document -
> >
> https://docs.google.com/document/d/1MluM17EKajvNP_x8U4aymcOihhUm8BMm8t_hM0jEFWk/edit
> >
> > Kind regards
> > Arina
> >
> > On Tue, Jun 28, 2016 at 12:55 AM Paul Rogers <[email protected]>
> wrote:
> >
> >> Great idea! We already use ZK to track storage plugins. ZK is perhaps
> >> better suited to register each jar and/or function than using files in
> DFS.
> >> Still need to work out the proper sequencing. But you are right, this is
> >> the kind of thing that ZK is supposed to solve.
> >>
> >> - Paul
> >>
> >>
> >>> On Jun 27, 2016, at 2:01 PM, Parth Chandra <[email protected]> wrote:
> >>>
> >>> Reading thru some of Paul's comments on maintaining a consistent state
> >> for
> >>> the registration of the UDF, it looks like we need a consensus protocol
> >> for
> >>> determining that all the Drillbits have the UDF deployed.
> >>> I believe Zookeeper can provide a stronger guarantee than a 2 phase
> >>> approach. Should we look into that?
> >>>
> >>> On Fri, Jun 24, 2016 at 10:00 AM, Arina Yelchiyeva <
> >>> [email protected]> wrote:
> >>>
> >>>> Hi all!
> >>>>
> >>>> I have updated design document.
> >>>> Main changes:
> >>>> 1. Add to Drill’s config the staging and registration DFS
> >> locations.
> >>>> 2. User is no longer responsible for copying jars into drillbit
> >> nodes.
> >>>> Now user needs to copy jars into staging DFS location from where
> >> drillbits
> >>>> will copy them to local fs.
> >>>> 3. During UDFs registration jars will be moved to DFS registration
> area.
> >>>> 4. During start up drillbit will copy all jars from registration area,
> >> so
> >>>> newly added drillbit will have all UDFs as others.
> >>>> 5. Security issues - probably they will be added later as enhancement.
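A minimal sketch of the staging-to-registration flow listed above, using local temporary directories as stand-ins for the DFS staging area, the DFS registration area, and a drillbit's local file system. The function names `register_jar` and `sync_local` and the jar name are hypothetical and are not taken from the design document.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Tuple


def register_jar(staging: Path, registration: Path, jar_name: str) -> Path:
    """Move a jar from the staging area into the registration area."""
    source = staging / jar_name
    if not source.is_file():
        raise FileNotFoundError(f"{jar_name} is not in the staging area")
    target = registration / jar_name
    if target.exists():
        raise FileExistsError(f"{jar_name} is already registered")
    shutil.move(str(source), str(target))
    return target


def sync_local(registration: Path, local: Path) -> List[str]:
    """Copy every registered jar that is missing locally; return the new names."""
    copied = []
    for jar in sorted(registration.glob("*.jar")):
        if not (local / jar.name).exists():
            shutil.copy2(jar, local / jar.name)
            copied.append(jar.name)
    return copied


def demo() -> Tuple[List[str], List[str], bool]:
    """Run the flow once with temporary directories standing in for DFS."""
    with tempfile.TemporaryDirectory() as root:
        staging, registration, local = (
            Path(root) / name for name in ("staging", "registry", "local")
        )
        for directory in (staging, registration, local):
            directory.mkdir()
        (staging / "my_udfs.jar").write_bytes(b"fake jar contents")
        register_jar(staging, registration, "my_udfs.jar")
        first = sync_local(registration, local)   # a drillbit starting up
        second = sync_local(registration, local)  # nothing new the second time
        still_staged = (staging / "my_udfs.jar").exists()
        return first, second, still_staged
```

A newly added drillbit would run only the `sync_local` step, which is how it ends up with the same jars as the others.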
> >>>>
> >>>> More details in the document:
> >>>>
> >>>>
> >>
> https://docs.google.com/document/d/1MluM17EKajvNP_x8U4aymcOihhUm8BMm8t_hM0jEFWk/edit
> >>>>
> >>>> Kind regards
> >>>> Arina
> >>>>
> >>>> On Fri, Jun 17, 2016 at 1:25 AM Paul Rogers <[email protected]>
> >> wrote:
> >>>>
> >>>>> Hi All,
> >>>>>
> >>>>> To answer Arina on item 3: there is actually no good location on any
> >>>> local
> >>>>> node to put the UDFs. Reason: DoY allows the admin to start a
> Drillbit
> >> on
> >>>>> any available node. When it starts, a new, fresh copy of Drill will
> be
> >>>>> downloaded, and this can happen after the user issued the CREATE
> >> command.
> >>>>>
> >>>>> What we need is a shared, secure distributed storage location from
> >> which
> >>>>> Drillbits can download the needed jar files. Something like… DFS!
> >> Indeed,
> >>>>> this is how YARN stores the Drill archive from which it creates the
> >> Drill
> >>>>> install directory on each node. We can’t quite use YARN’s mechanism
> >> (YARN
> >>>>> is aware only of the files uploaded when launching an app), but we
> can
> >> do
> >>>>> something similar.
> >>>>>
> >>>>> So, brainstorming a bit…
> >>>>>
> >>>>> 1. Store the UDF jar in a pre-defined DFS location.
> >>>>>
> >>>>> 2. The CREATE function 1) uploads the jar to the DFS location, and 2)
> >>>>> creates some kind of registry entry.
> >>>>>
> >>>>> 3. The DELETE function 1) deregisters the jar (and function), but 2)
> >> does
> >>>>> not delete the jar (this allows in-flight queries to complete.)
> >>>>>
> >>>>> 4. Drillbits periodically check DFS for changed registrations,
> >>>> downloading
> >>>>> any needed jars. (YARN, Spark, Storm and others already do something
> >>>>> similar.)
> >>>>>
> >>>>> 5. Registry check is “forced” when processing a query with a function
> >>>> that
> >>>>> is not currently registered. (Doing so resolves any possible race
> >>>>> conditions.)
> >>>>>
> >>>>> 6. Some process (perhaps time based) removes old, unregistered jar
> >> files.
> >>>>> (Or, we could get fancy and use reference counts. The reference count
> >>>> would
> >>>>> be required if the user wants to delete, then recreate, the same
> >> function
> >>>>> and jar to avoid conflict with in-flight queries.)
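The reference-count idea in the last item can be sketched as follows. This is only an illustration under simple assumptions: one counter per jar name, no jar versions, and everything held in memory. The class and method names are hypothetical.

```python
class JarRefCounts:
    """Tracks registered jars and how many in-flight queries use each one."""

    def __init__(self):
        self.files = set()       # jar files present in the shared location
        self.registered = set()  # jars whose functions are currently usable
        self.in_flight = {}      # jar name -> running queries that use it

    def register(self, jar):
        self.files.add(jar)
        self.registered.add(jar)

    def deregister(self, jar):
        # Only the registry entry goes away; the file stays for running queries.
        self.registered.discard(jar)

    def query_started(self, jar):
        if jar not in self.registered:
            raise LookupError(f"{jar} is not registered")
        self.in_flight[jar] = self.in_flight.get(jar, 0) + 1

    def query_finished(self, jar):
        remaining = self.in_flight[jar] - 1
        if remaining:
            self.in_flight[jar] = remaining
        else:
            del self.in_flight[jar]

    def cleanup(self):
        """Remove and return jars that are deregistered and no longer in use."""
        removable = sorted(self.files - self.registered - set(self.in_flight))
        self.files.difference_update(removable)
        return removable
```

Handling a delete followed by a re-create of the same jar name would additionally need some form of versioning, which this sketch leaves out.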
> >>>>>
> >>>>> We can build security on this as follows:
> >>>>>
> >>>>> 1. Define permissions for who can write to the DFS location. Or,
> >> indeed,
> >>>>> have subdirectories by user and grant each user permission only on
> >> their
> >>>>> own UDF directory.
> >>>>>
> >>>>> 2. Provide separate registries for per-user functions (private) and
> >>>> global
> >>>>> functions (public). Only the admin can add global functions. But,
> only
> >>>> the
> >>>>> user that uploads a private function can use it.
> >>>>>
> >>>>> 3. Leverage the Java class loader to isolate UDFs in their own name
> >> space
> >>>>> (see Eclipse & Tomcat for examples). That is, Drill can call into a
> >> UDF,
> >>>>> UDFs can call selected Drill code, but UDFs can’t shadow Drill
> classes
> >>>>> (accidentally or maliciously.) Plus, my function Foo won’t clash with
> >>>> your
> >>>>> function Foo if both are private.
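The public/private registry split in item 2 might look roughly like this. It illustrates only the lookup rules, not class loader isolation. The choice to check a user's private functions before the public ones is an assumption, and all names are hypothetical.

```python
class FunctionRegistries:
    """Separate registries for global (public) and per-user (private) functions."""

    def __init__(self, admins):
        self.admins = set(admins)
        self.public = {}   # function name -> implementation
        self.private = {}  # user -> {function name -> implementation}

    def add_public(self, user, name, impl):
        if user not in self.admins:
            raise PermissionError("only an admin can add global functions")
        self.public[name] = impl

    def add_private(self, user, name, impl):
        self.private.setdefault(user, {})[name] = impl

    def resolve(self, user, name):
        # Assumption: a user's own private function wins over a public one.
        own = self.private.get(user, {})
        if name in own:
            return own[name]
        if name in self.public:
            return self.public[name]
        raise LookupError(f"function {name} is not visible to {user}")
```

With this layout, two users can each register a private `foo` without clashing, and neither can see the other's.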
> >>>>>
> >>>>> Sorry that this has wandered a bit far from the original simple
> design,
> >>>>> but the above may capture much of what folks expect in modern
> >> distributed
> >>>>> big data systems.
> >>>>>
> >>>>> I wonder if a good next step might be to review the notes in the
> design
> >>>>> doc, in the JIRA, and in this e-mail chain and to prepare a summary
> of
> >>>>> technical requirements, and a proposed design. Postpone, at least for
> >>>> now,
> >>>>> concerns about the amount of work; we can worry about that once folks
> >>>> agree
> >>>>> on your revised design.
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> - Paul
> >>>>>
> >>>>>
> >>>>>> On Jun 21, 2016, at 9:48 AM, Arina Yelchiyeva <
> >>>>> [email protected]> wrote:
> >>>>>>
> >>>>>> 4. Authorization model mentioned by Julia and John
> >>>>>> If user won't have rights to copy jars to UDF classpath, which can
> be
> >>>>>> restricted by file system, he won't be able to do much harm by
> running
> >>>>>> CREATE command. If UDFs from jar were already registered, CREATE
> >>>>> statement
> >>>>>> will fail. CREATE OR REPLACE will just re-register UDFs.
> >>>>>> But DELETE command is not safe. If user knows jar name, he can
> delete
> >>>> all
> >>>>>> UDFs associated with it, as well as the binary and source jars.
> That's
> >>>>>> where we'll probably need to impose restrictions.
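One possible shape for the restriction discussed above, as a sketch only. The rule that a jar may be replaced or deleted only by the user who registered it, or by an admin, is an assumption made for illustration and is not something the thread has agreed on. All names are hypothetical.

```python
class JarRegistry:
    """CREATE fails on an existing jar unless replace is requested; DELETE is restricted."""

    def __init__(self, admins=()):
        self.admins = set(admins)
        self.jars = {}  # jar name -> (owner, tuple of function names)

    def _may_modify(self, user, jar):
        owner, _ = self.jars[jar]
        return user == owner or user in self.admins

    def create(self, user, jar, functions, replace=False):
        if jar in self.jars:
            if not replace:
                raise ValueError(f"{jar} is already registered")
            if not self._may_modify(user, jar):
                raise PermissionError(f"{user} may not replace {jar}")
        self.jars[jar] = (user, tuple(functions))

    def delete(self, user, jar):
        if jar not in self.jars:
            raise LookupError(f"{jar} is not registered")
        if not self._may_modify(user, jar):
            raise PermissionError(f"{user} may not delete {jar}")
        _, functions = self.jars.pop(jar)
        return functions
```

Here `create(..., replace=True)` stands in for CREATE OR REPLACE, and `delete` returns the function names that were deregistered along with the jar.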
> >>>>>>
> >>>>>> On Tue, Jun 21, 2016 at 7:34 PM Arina Yelchiyeva <
> >>>>> [email protected]>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> 1. DELETE command - I forgot to mention it in the document but had it in
> my
> >>>>>>> mind. When user issues DELETE command, all UDFs associated with
> >>>> indicated
> >>>>>>> jar are removed from DrillFunctionRegistry. And then binary and
> source
> >>>>>>> files are also deleted from UDF classpath.
> >>>>>>>
> >>>>>>> 2. Distribution race condition described by Paul
> >>>>>>> User issues CREATE command and gets confirmation that UDFs are
> >>>> registered
> >>>>>>> only if all drillbits have confirmed that registration was
> >>>> successful.
> >>>>>>> I don't expect user to start using UDFs in queries prior to CREATE
> >>>>> command
> >>>>>>> success / failure result, which is possible but strange.
> >>>>>>>
> >>>>>>> 3. DoY
> >>>>>>> @Paul
> >>>>>>> What if, instead of using $DRILL_HOME/jars/3rdparty/udf directly, we use
> >>>>>>> a $DRILL_UDF environment variable which will be set during drillbit
> >>>> start
> >>>>>>> (like $DRILL_LOG_DIR)? Location stored in this variable will be
> added
> >>>> to
> >>>>>>> Drill classpath during start.
> >>>>>>> Will it ease DoY integration somehow?
> >>>>>>>
> >>>>>>> Kind regards
> >>>>>>> Arina
> >>>>>>>
> >>>>>>> On Tue, Jun 21, 2016 at 7:15 PM yuliya Feldman
> >>>>> <[email protected]>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Just thoughts:
> >>>>>>>> You can try to reuse distributed cache. Let Drill AM do the needful
> >> in
> >>>>>>>> terms of orchestrating UDF jars distribution.
> >>>>>>>> But
> >>>>>>>> I would be inclined to have a common path that is independent of
> the
> >>>>> fact
> >>>>>>>> that it is Drill on YARN or not, as maintaining two separate ways
> of
> >>>>>>>> dealing with loading/unloading UDFs will be painful and error
> prone.
> >>>>>>>> One more note (I left a comment in the doc) - not sure about
> >>>>>>>> authorization model here - we need to have some.
> >>>>>>>> Just my 2c. Thanks
> >>>>>>>>
> >>>>>>>>    From: Paul Rogers <[email protected]>
> >>>>>>>> To: "[email protected]" <[email protected]>
> >>>>>>>> Sent: Monday, June 20, 2016 7:32 PM
> >>>>>>>> Subject: Re: Dynamic UDFs support
> >>>>>>>>
> >>>>>>>> Hi Neeraja,
> >>>>>>>>
> >>>>>>>> The proposal calls for the user to copy the jar file to each
> >> Drillbit
> >>>>>>>> node. The jar would go into a new $DRILL_HOME/jars/3rdparty/udf
> >>>>> directory.
> >>>>>>>>
> >>>>>>>> In Drill-on-YARN (DoY), YARN is responsible for copying Drill code
> >> to
> >>>>>>>> each node (which is good.) YARN puts that code in a location known
> >>>>> only to
> >>>>>>>> YARN. Since the location is private to YARN, the user can’t easily
> >>>> hunt
> >>>>>>>> down the location in order to add the udf jar. Even if the user
> did
> >>>>> find
> >>>>>>>> the location, the next Drillbit to start would create a new copy
> of
> >>>> the
> >>>>>>>> Drill software, without the udf jar.
> >>>>>>>>
> >>>>>>>> Second, in DoY we have separated user files from Drill software.
> >> This
> >>>>>>>> makes it much easier to distribute the software to each node: we
> >> give
> >>>>> the
> >>>>>>>> Drill distribution tar archive to YARN, and YARN copies it to each
> >>>>> node and
> >>>>>>>> untars the Drill files. We make a separate copy of the (far
> smaller)
> >>>>> set of
> >>>>>>>> user config files.
> >>>>>>>>
> >>>>>>>> If the udf jar goes into a Drill folder
> >>>>> ($DRILL_HOME/jars/3rdparty/udf),
> >>>>>>>> then the user would have to rebuild the Drill tar file each time
> >> they
> >>>>> add a
> >>>>>>>> udf jar. When I tried this myself when building DoY, I found it to
> >> be
> >>>>> slow
> >>>>>>>> and error-prone.
> >>>>>>>>
> >>>>>>>> So, the solution is to place the udf code in the new “site”
> >>>> directory:
> >>>>>>>> $DRILL_SITE/jars. That’s what that is for. Then, let DoY
> >>>> automatically
> >>>>>>>> distribute the code to every node. Perfect! Except that it does
> not
> >>>>> work to
> >>>>>>>> dynamically distribute code after Drill starts.
> >>>>>>>>
> >>>>>>>> For DoY, the solution requirements are:
> >>>>>>>>
> >>>>>>>> 1. Distribute code using Drill itself, rather than manually
> copying
> >>>>> jars
> >>>>>>>> to (unknown) Drill directories.
> >>>>>>>> 2. Ensure the solution works even if another Drillbit is spun up
> >>>> later,
> >>>>>>>> and uses the original Drill tar file.
> >>>>>>>>
> >>>>>>>> I’m thinking we want to leverage DFS: place udf files into a
> >>>> well-known
> >>>>>>>> DFS directory. Register the udf into, say, ZK. When a new Drillbit
> >>>>> starts,
> >>>>>>>> it looks for new udf jars in ZK, copies the file to a temporary
> >>>>> location,
> >>>>>>>> and launches. An existing Drill is notified of the change and does
> >>>> the
> >>>>> same
> >>>>>>>> download process. Clean-up is needed at some point to remove ZK
> >>>>> entries if
> >>>>>>>> the udf jar becomes statically available on the next launch. That
> >>>> needs
> >>>>>>>> more thought.
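The startup and notification behavior described above can be sketched with an in-memory stand-in for the shared registry. This does not use any real ZooKeeper client API. `SharedRegistry`, `Drillbit`, and the `static_jars` parameter are hypothetical names, and "downloading" is reduced to recording the jar name.

```python
class SharedRegistry:
    """In-memory stand-in for the shared registry (ZK in the proposal)."""

    def __init__(self):
        self.jars = set()
        self.listeners = []

    def subscribe(self, callback):
        self.listeners.append(callback)

    def register(self, jar):
        self.jars.add(jar)
        # Notify every running drillbit of the change.
        for callback in list(self.listeners):
            callback(jar)


class Drillbit:
    def __init__(self, name, registry, static_jars=()):
        self.name = name
        self.loaded = set(static_jars)  # jars already on the class path
        self.downloaded = []            # jars fetched from DFS, in order
        # On start, catch up on everything registered so far.
        for jar in sorted(registry.jars):
            self._fetch(jar)
        # Afterwards, react to new registrations.
        registry.subscribe(self._fetch)

    def _fetch(self, jar):
        if jar not in self.loaded:
            # Stands in for copying the jar from DFS to a temporary location.
            self.downloaded.append(jar)
            self.loaded.add(jar)
```

A drillbit that already has a jar statically available skips the download, which is the case the clean-up note above is concerned with.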
> >>>>>>>>
> >>>>>>>> We’d still need the phases mentioned earlier to ensure
> consistency.
> >>>>>>>>
> >>>>>>>> Suggestions anyone as to how to do this super simply & still get
> it
> >>>> to
> >>>>>>>> work with DoY?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> - Paul
> >>>>>>>>
> >>>>>>>>> On Jun 20, 2016, at 7:18 PM, Neeraja Rentachintala <
> >>>>>>>> [email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> This will need to work with YARN (Once Drill is YARN enabled, I
> >>>> would
> >>>>>>>>> expect a lot of users using it in conjunction with YARN).
> >>>>>>>>> Paul, I am not clear why this wouldn't work with YARN. Can you
> >>>>>>>> elaborate.
> >>>>>>>>>
> >>>>>>>>> -Neeraja
> >>>>>>>>>
> >>>>>>>>> On Mon, Jun 20, 2016 at 7:01 PM, Paul Rogers <
> [email protected]
> >>>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Good enough, as long as we document the limitation that this
> >>>> feature
> >>>>>>>> can’t
> >>>>>>>>>> work with YARN deployment as users generally do not have access
> to
> >>>>> the
> >>>>>>>>>> temporary “localization” directories where the Drill code is
> >> placed
> >>>>> by
> >>>>>>>> YARN.
> >>>>>>>>>>
> >>>>>>>>>> Note that the jar distribution race condition issue occurs with
> >> the
> >>>>>>>>>> proposed design: I believe I sketched out a scenario in one of
> the
> >>>>>>>> earlier
> >>>>>>>>>> comments. Drillbit A receives the CREATE FUNCTION command. It
> >> tells
> >>>>>>>>>> Drillbit B. While informing the other Drillbits, Drillbit B
> plans
> >>>> and
> >>>>>>>>>> launches a query that uses the function. Drillbit Z starts
> >>>> execution
> >>>>>>>> of the
> >>>>>>>>>> query before it learns from A about the new function. This will
> be
> >>>>>>>> rare —
> >>>>>>>>>> just rare enough to create very hard to reproduce bugs.
> >>>>>>>>>>
> >>>>>>>>>> The only reliable solution is to do the work in multiple passes:
> >>>>>>>>>>
> >>>>>>>>>> Pass 1: Ask each node to load the function, but not make it
> >>>> available
> >>>>>>>> to
> >>>>>>>>>> the planner. (it would be available to the execution engine.)
> >>>>>>>>>> Pass 2: Await confirmation from each node that this is done.
> >>>>>>>>>> Pass 3: Alert every node that it is now free to plan queries
> with
> >>>> the
> >>>>>>>>>> function.
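The three passes above can be sketched as follows, under the assumption that a single coordinator talks to every node synchronously. The `Node` class, its `can_load` flag, and `create_function` are hypothetical names used only for illustration.

```python
class Node:
    def __init__(self, name, can_load=True):
        self.name = name
        self.can_load = can_load  # lets the example simulate a failed load
        self.executable = set()   # functions the execution engine can run
        self.plannable = set()    # functions the planner will accept

    def load(self, function):
        """Pass 1 on this node: load for execution only; return a confirmation."""
        if not self.can_load:
            return False
        self.executable.add(function)
        return True

    def enable_planning(self, function):
        """Pass 3 on this node: allow the planner to use the function."""
        if function not in self.executable:
            raise RuntimeError(f"{function} was never loaded on {self.name}")
        self.plannable.add(function)

    def plan(self, function):
        if function not in self.plannable:
            raise LookupError(f"{function} is unknown to the planner on {self.name}")
        return function


def create_function(nodes, function):
    # Pass 1: ask each node to load the function without exposing it to planning.
    confirmations = [node.load(function) for node in nodes]
    # Pass 2: proceed only once every node has confirmed.
    if not all(confirmations):
        return False
    # Pass 3: every node may now plan queries with the function.
    for node in nodes:
        node.enable_planning(function)
    return True
```

The property this preserves is that a node can plan a query with the function only after every node is able to execute it, which rules out the Drillbit Z scenario.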
> >>>>>>>>>>
> >>>>>>>>>> Finally, I wonder if we should design the SQL syntax based on a
> >>>>>>>> long-term
> >>>>>>>>>> design, even if the feature itself is a short-term work-around.
> >>>>>>>> Changing
> >>>>>>>>>> the syntax later might break scripts that users might write.
> >>>>>>>>>>
> >>>>>>>>>> So, the question for the group is this: is the value of
> >>>> semi-complete
> >>>>>>>>>> feature sufficient to justify the potential problems?
> >>>>>>>>>>
> >>>>>>>>>> - Paul
> >>>>>>>>>>
> >>>>>>>>>>> On Jun 20, 2016, at 6:15 PM, Parth Chandra <
> >> [email protected]
> >>>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Moving discussion to dev.
> >>>>>>>>>>>
> >>>>>>>>>>> I believe the aim is to do a simple implementation without the
> >>>>>>>> complexity
> >>>>>>>>>>> of distributing the UDF. I think the document should make this
> >>>>>>>> limitation
> >>>>>>>>>>> clear.
> >>>>>>>>>>>
> >>>>>>>>>>> Per Paul's point on there being a simpler solution of just
> having
> >>>>> each
> >>>>>>>>>>> drillbit detect if a UDF is present, I think the problem is
> >>>> if a
> >>>>>>>> UDF
> >>>>>>>>>>> gets deployed to some but not all drillbits. A query can then
> >>>> start
> >>>>>>>>>>> executing but not run successfully. The intent of the create
> >>>>> commands
> >>>>>>>>>> would
> >>>>>>>>>>> be to ensure that all drillbits have the UDF or none would.
> >>>>>>>>>>>
> >>>>>>>>>>> I think Jacques' point about ownership conflicts is not
> addressed
> >>>>>>>>>> clearly.
> >>>>>>>>>>> Also, the unloading is not clear. The delete command should
> >>>> probably
> >>>>>>>>>> remove
> >>>>>>>>>>> the UDF and unload it.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Jun 17, 2016 at 11:19 AM, Paul Rogers <
> >>>> [email protected]
> >>>>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Reviewed the spec; many comments posted. Three primary
> comments
> >>>> for
> >>>>>>>> the
> >>>>>>>>>>>> community to consider.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1. The design conflicts with the Drill-on-YARN project. Is
> this
> >> a
> >>>>>>>>>> specific
> >>>>>>>>>>>> fix for one unique problem, or is it worth expanding the
> >> solution
> >>>>> to
> >>>>>>>>>> work
> >>>>>>>>>>>> with Drill-on-YARN deployments? Might be hard to make the two
> >>>> work
> >>>>>>>>>> together
> >>>>>>>>>>>> later. See comments in docs for details.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2. Have we, by chance, looked at how other projects handle
> code
> >>>>>>>>>>>> distribution? Spark, Storm and others automatically deploy
> code
> >>>>>>>> across
> >>>>>>>>>> the
> >>>>>>>>>>>> cluster; no manual distribution to each node. The key
> difference
> >>>>>>>> between
> >>>>>>>>>>>> Drill and others is that, for Storm, say, code is associated
> >>>> with a
> >>>>>>>> job
> >>>>>>>>>>>> (“topology” in Storm terms.) But, in Drill, functions are
> global
> >>>>> and
> >>>>>>>>>> have
> >>>>>>>>>>>> no obvious life cycle that suggests when the code can be
> >>>> unloaded.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 3. Have we considered the class loader, dependency and name space
> >>>>>>>> isolation
> >>>>>>>>>>>> issues addressed by such products as Tomcat (web apps) or
> >> Eclipse
> >>>>>>>>>>>> (plugins)? Putting user code in the same namespace as Drill
> code
> >>>>> is
> >>>>>>>>>> quick
> >>>>>>>>>>>> & dirty. It turns out, however, that doing so leads to
> problems
> >>>>> that
> >>>>>>>>>>>> require long, frustrating debugging sessions to resolve.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Addressing item 1 might expand scope a bit. Addressing items 2
> >>>> and
> >>>>> 3
> >>>>>>>>>> is a
> >>>>>>>>>>>> big increase in scope, so I won’t be surprised if we leave
> those
> >>>>>>>> issues
> >>>>>>>>>> for
> >>>>>>>>>>>> later. (Though, addressing item 2 might be the best way to
> >>>> address
> >>>>>>>> item
> >>>>>>>>>> 1.)
> >>>>>>>>>>>>
> >>>>>>>>>>>> If we want a very simple solution that requires minimal
> change,
> >>>>>>>> perhaps
> >>>>>>>>>> we
> >>>>>>>>>>>> can use an even simpler solution. In the proposed design, the
> >>>> user
> >>>>>>>> still
> >>>>>>>>>>>> must distribute code to all the nodes. The primary change is
> to
> >>>>> tell
> >>>>>>>>>> Drill
> >>>>>>>>>>>> to load (or unload) that code. Can we accomplish the same result
> >>>>> more easily
> >>>>>>>> simply
> >>>>>>>>>>>> by having Drill periodically scan certain directories looking
> >> for
> >>>>> new
> >>>>>>>>>> (or
> >>>>>>>>>>>> removed) jars? Still won’t work with YARN, or solve the name
> >>>> space
> >>>>>>>>>> issues,
> >>>>>>>>>>>> but will work for existing non-YARN Drill users without new
> SQL
> >>>>>>>> syntax.
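The periodic directory scan suggested above amounts to diffing the set of jar names between two scans. A minimal sketch, assuming jars are identified by file name alone and that a caller invokes `scan` on a timer. The function names and jar names are hypothetical.

```python
import tempfile
from pathlib import Path


def scan(directory, known):
    """Compare the jars on disk with the previously known set.

    Returns (current, added, removed), where added and removed are sorted lists.
    """
    current = {path.name for path in Path(directory).glob("*.jar")}
    added = sorted(current - known)
    removed = sorted(known - current)
    return current, added, removed


def demo():
    """Simulate two scans: one jar appears, then it is swapped for another."""
    with tempfile.TemporaryDirectory() as directory:
        known = set()
        (Path(directory) / "a.jar").touch()
        known, added1, removed1 = scan(directory, known)
        (Path(directory) / "a.jar").unlink()
        (Path(directory) / "b.jar").touch()
        known, added2, removed2 = scan(directory, known)
        return (added1, removed1), (added2, removed2)
```

Each drillbit would load the `added` jars and unload the `removed` ones after every scan. As noted above, this does nothing for YARN deployments or name space isolation.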
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>
> >>>>>>>>>>>> - Paul
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Jun 16, 2016, at 2:07 PM, Jacques Nadeau <
> >> [email protected]
> >>>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Two quick thoughts:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> - (user) In the design document I didn't see any discussion
> of
> >>>>>>>>>>>>> ownership/conflicts or unloading. Would be helpful to see the
> >>>>>>>> thinking
> >>>>>>>>>>>> there
> >>>>>>>>>>>>> - (dev) There is a row oriented facade via the
> >>>>>>>>>>>>> FieldReader/FieldWriter/ComplexWriter classes. That would be
> a
> >>>>> good
> >>>>>>>>>> place
> >>>>>>>>>>>>> to start when trying to implement an alternative interface.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> Jacques Nadeau
> >>>>>>>>>>>>> CTO and Co-Founder, Dremio
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, Jun 16, 2016 at 11:32 AM, John Omernik <
> >>>> [email protected]>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Honestly, I don't see it as a priority issue. I think some
> of
> >>>> the
> >>>>>>>>>> ideas
> >>>>>>>>>>>>>> around community java UDFs could be a better approach. I'd
> >> hate
> >>>>> to
> >>>>>>>>>> take
> >>>>>>>>>>>>>> away from other work to hack in something like this.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 1:19 PM, Paul Rogers <
> >>>>> [email protected]
> >>>>>>>>>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Ted refers to source code transformation. Drill gains its
> >>>> speed
> >>>>>>>> from
> >>>>>>>>>>>>>> value
> >>>>>>>>>>>>>>> vectors. However, VVs are a far cry from the row-based
> >>>> interface
> >>>>>>>> that
> >>>>>>>>>>>>>> most
> >>>>>>>>>>>>>>> mere mortals are accustomed to using. Since VVs are very
> type
> >>>>>>>>>> specific,
> >>>>>>>>>>>>>>> code is typically generated to handle the specifics of each
> >>>>> type.
> >>>>>>>>>>>>>> Accessing
> >>>>>>>>>>>>>>> VVs in Jython may be a bit of a challenge because of the
> >>>>>>>> "impedance
> >>>>>>>>>>>>>>> mismatch" between how VVs work and the row-and-column view
> >>>>>>>> expected
> >>>>>>>>>> by
> >>>>>>>>>>>>>> most
> >>>>>>>>>>>>>>> (non-Drill) developers.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I wonder if we've considered providing a row-oriented
> >> "facade"
> >>>>>>>> that
> >>>>>>>>>> can
> >>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>> used by roll-your own data sources and user-defined row
> >>>>>>>> transforms?
> >>>>>>>>>>>> Might
> >>>>>>>>>>>>>>> be a hiccup in the fast VV pipeline, but might be handy for
> >>>>> users
> >>>>>>>>>>>> willing
> >>>>>>>>>>>>>>> to trade a bit of speed for convenience. With such a
> facade,
> >>>> the
> >>>>>>>>>> Jython
> >>>>>>>>>>>>>> row
> >>>>>>>>>>>>>>> transforms that John mentions could be quite simple.
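To make the facade idea concrete, here is a sketch in which plain Python lists stand in for value vectors. It does not use Drill's actual vector or FieldReader/FieldWriter classes. `RowFacade` and its methods are hypothetical names.

```python
class RowFacade:
    """Presents column-oriented data as rows, trading some speed for convenience."""

    def __init__(self, columns):
        # columns: dict of column name -> list of values (one list per "vector")
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same number of values")
        self.columns = columns
        self.row_count = lengths.pop() if lengths else 0

    def rows(self):
        """Yield each row as a dict of column name -> value."""
        names = list(self.columns)
        for index in range(self.row_count):
            yield {name: self.columns[name][index] for name in names}

    def transform(self, function, out_name):
        """Apply a per-row function and store the results as a new column."""
        self.columns[out_name] = [function(row) for row in self.rows()]
```

A user-defined row transform then becomes an ordinary function over a dict, for example `batch.transform(lambda row: row["a"] + row["b"], "total")`.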
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 10:36 AM, Ted Dunning <
> >>>>>>>> [email protected]
> >>>>>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Since UDFs use source code transformation, using Jython
> >>>> would
> >>>>> be
> >>>>>>>>>>>>>>>> difficult.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 9:42 AM, Arina Yelchiyeva <
> >>>>>>>>>>>>>>>> [email protected]> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi Charles,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> not that I am aware of. Proposed solution doesn't invent
> >>>>>>>> anything
> >>>>>>>>>>>>>> new,
> >>>>>>>>>>>>>>>> just
> >>>>>>>>>>>>>>>>> adds possibility to add UDFs without drillbit restart.
> But
> >>>>>>>>>>>>>>> contributions
> >>>>>>>>>>>>>>>>> are welcomed.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 4:52 PM Charles Givre <
> >>>>> [email protected]
> >>>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Arina,
> >>>>>>>>>>>>>>>>>> Has there been any discussion about making it possible
> via
> >>>>>>>> Jython
> >>>>>>>>>>>>>> or
> >>>>>>>>>>>>>>>>>> something for users to write simple UDFs in Python?
> >>>>>>>>>>>>>>>>>> My ideal would be to have this capability integrated in
> >> the
> >>>>> web
> >>>>>>>>>> GUI
> >>>>>>>>>>>>>>>> such
> >>>>>>>>>>>>>>>>>> that a user could write their UDF (in Python) right
> there,
> >>>>>>>> submit
> >>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>> would be deployed to Drill if it passes validation
> tests.
> >>>>>>>>>>>>>>>>>> —C
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Jun 16, 2016, at 09:34, Arina Yelchiyeva <
> >>>>>>>>>>>>>>>>> [email protected]>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hi all!
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I have created Jira to allow dynamic UDFs support in
> >>>> Drill (
> >>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/DRILL-4726).
> There
> >>>>> is a
> >>>>>>>>>>>>>> link
> >>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> design document in Jira description.
> >>>>>>>>>>>>>>>>>>> Comments or suggestions are welcomed.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Kind regards
> >>>>>>>>>>>>>>>>>>> Arina
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>>
> >>
> >>
>
>
