+1 on simplifying the design and postponing the items Paul has suggested. Arina, Paul, I think we need to work out some of the design related to registering the UDF. Are you guys open to a quick hangout at 10 a.m. PDT tomorrow?
On Thu, Jul 14, 2016 at 1:46 PM, Paul Rogers <prog...@maprtech.com> wrote:

> Hi All,
>
> We’ve had quite a lively debate in the “comments” section of Arina’s wonderful design doc. Zelaine made a great suggestion: summarize the user experience as a way of making sense of the wealth of detailed comments.
>
> IMHO, the most important user experience goals are:
>
> 1. When a user submits a CREATE FUNCTION command, the command returns quickly (within a few seconds at most).
> 2. If the above user then issues a query using that function (to the same Foreman), that query is guaranteed to successfully use the new function on all nodes.
> 3. Other users, connecting to any Foreman, will see a very clean behavior when submitting a query with the new function. Before some point in time (can be different for each Foreman), a query with the function fails in planning. After that point, queries are guaranteed to successfully use the new function on all nodes.
>
> Basically, this says that CREATE FUNCTION can’t (potentially) take a long time. Use of functions can’t result in random failures during the time that the function is propagated across Drillbits.
>
> The goals we can perhaps postpone are:
>
> 1. Class name space isolation. (Allows two data scientists to define the same class without collisions.)
> 2. Function name spaces. (Allows me to define “paul.foo” and you to define “bob.foo” without collisions. Needed if many people develop functions independently. Else, we need a global name space.)
> 3. Dynamic DROP FUNCTION operation. (The issues here are messy, and it requires unloading classes and name space cleanup. Just let the cleanup happen offline.)
> 4. Dependency jars (e.g. third-party libraries, etc.). (We require those to be statically added to the class path before Drill starts.)
>
> We are not creating per-user name spaces, or allowing people to use production clusters to try/revise functions.
> We’re just sampling deployment of simple functions.
>
> That’s my suggestion, what do others suggest?
>
> Thanks,
>
> - Paul
>
> > On Jul 7, 2016, at 12:32 PM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> >
> > I also agree on using Zookeeper. I have re-worked the dynamic UDF support document, taking into account Zookeeper usage.
> >
> > Link to the document -
> > https://docs.google.com/document/d/1MluM17EKajvNP_x8U4aymcOihhUm8BMm8t_hM0jEFWk/edit
> >
> > Kind regards
> > Arina
> >
> > On Tue, Jun 28, 2016 at 12:55 AM Paul Rogers <prog...@maprtech.com> wrote:
> >
> >> Great idea! We already use ZK to track storage plugins. ZK is perhaps better suited to register each jar and/or function than using files in DFS. Still need to work out the proper sequencing. But you are right, this is the kind of thing that ZK is supposed to solve.
> >>
> >> - Paul
> >>
> >>> On Jun 27, 2016, at 2:01 PM, Parth Chandra <par...@apache.org> wrote:
> >>>
> >>> Reading thru some of Paul's comments on maintaining a consistent state for the registration of the UDF, it looks like we need a consensus protocol for determining that all the Drillbits have the UDF deployed.
> >>> I believe Zookeeper can provide a stronger guarantee than a 2 phase approach. Should we look into that?
> >>>
> >>> On Fri, Jun 24, 2016 at 10:00 AM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> >>>
> >>>> Hi all!
> >>>>
> >>>> I have updated the design document.
> >>>> Main changes:
> >>>> 1. Add the staging and registration DFS locations to Drill’s config.
> >>>> 2. User is no longer responsible for copying jars onto drillbit nodes. Now the user needs to copy jars into the staging DFS location, from where drillbits will copy them to the local fs.
> >>>> 2. During UDF registration, jars will be moved to the DFS registration area.
> >>>> 3.
> >>>> During start up, a drillbit will copy all jars from the registration area, so a newly added drillbit will have the same UDFs as the others.
> >>>> 4. Security issues - probably they will be added later as an enhancement.
> >>>>
> >>>> More details in the document:
> >>>>
> >>>> https://docs.google.com/document/d/1MluM17EKajvNP_x8U4aymcOihhUm8BMm8t_hM0jEFWk/edit
> >>>>
> >>>> Kind regards
> >>>> Arina
> >>>>
> >>>> On Fri, Jun 17, 2016 at 1:25 AM Paul Rogers <prog...@maprtech.com> wrote:
> >>>>
> >>>>> Hi All,
> >>>>>
> >>>>> To answer Arina on item 3: there is actually no good location on any local node to put the UDFs. Reason: DoY allows the admin to start a Drillbit on any available node. When it starts, a new, fresh copy of Drill will be downloaded, and this can happen after the user issued the CREATE command.
> >>>>>
> >>>>> What we need is a shared, secure distributed storage location from which Drillbits can download the needed jar files. Something like… DFS! Indeed, this is how YARN stores the Drill archive from which it creates the Drill install directory on each node. We can’t quite use YARN’s mechanism (YARN is aware only of the files uploaded when launching an app), but we can do something similar.
> >>>>>
> >>>>> So, brainstorming a bit…
> >>>>>
> >>>>> 1. Store the UDF jar in a pre-defined DFS location.
> >>>>>
> >>>>> 2. The CREATE function 1) uploads the jar to the DFS location, and 2) creates some kind of registry entry.
> >>>>>
> >>>>> 3. The DELETE function 1) deregisters the jar (and function), but 2) does not delete the jar (this allows in-flight queries to complete).
> >>>>>
> >>>>> 3. Drillbits periodically check DFS for changed registrations, downloading any needed jars. (YARN, Spark, Storm and others already do something similar.)
> >>>>>
> >>>>> 4.
> >>>>> Registry check is “forced” when processing a query with a function that is not currently registered. (Doing so resolves any possible race conditions.)
> >>>>>
> >>>>> 5. Some process (perhaps time based) removes old, unregistered jar files. (Or, we could get fancy and use reference counts. The reference count would be required if the user wants to delete, then recreate, the same function and jar to avoid conflict with in-flight queries.)
> >>>>>
> >>>>> We can build security on this as follows:
> >>>>>
> >>>>> 1. Define permissions for who can write to the DFS location. Or, indeed, have subdirectories by user and grant each user permission only on their own UDF directory.
> >>>>>
> >>>>> 2. Provide separate registries for per-user functions (private) and global functions (public). Only the admin can add global functions. But, only the user that uploads a private function can use it.
> >>>>>
> >>>>> 3. Leverage the Java class loader to isolate UDFs in their own name space (see Eclipse & Tomcat for examples). That is, Drill can call into a UDF, UDFs can call selected Drill code, but UDFs can’t shadow Drill classes (accidentally or maliciously). Plus, my function Foo won’t clash with your function Foo if both are private.
> >>>>>
> >>>>> Sorry that this has wandered a bit far from the original simple design, but the above may capture much of what folks expect in modern distributed big data systems.
> >>>>>
> >>>>> I wonder if a good next step might be to review the notes in the design doc, in the JIRA, and in this e-mail chain and to prepare a summary of technical requirements, and a proposed design. Postpone, at least for now, concerns about the amount of work; we can worry about that once folks agree on your revised design.
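[Editor's sketch] The brainstormed registry behaviour (deregister without deleting the jar, a "forced" registry check for an unknown function, and time-based cleanup) might look roughly like the Python below. This is only an illustration under assumptions: `SharedRegistry` and `Drillbit` are hypothetical in-memory stand-ins, not Drill classes, and a real registry would live in DFS or ZK rather than in a dict.

```python
import time


class SharedRegistry:
    """Hypothetical stand-in for the shared registry (DFS file or ZK node)."""

    def __init__(self):
        self.functions = {}  # function name -> jar name
        self.jars = {}       # jar name -> deregistration time (None while in use)

    def create(self, func, jar):
        # "Upload" the jar, then add the registry entry.
        self.jars[jar] = None
        self.functions[func] = jar

    def delete(self, func, now=None):
        # Deregister the function but keep the jar so in-flight queries finish.
        jar = self.functions.pop(func)
        if jar not in self.functions.values():
            self.jars[jar] = time.time() if now is None else now

    def purge(self, max_age_seconds, now=None):
        # Time-based cleanup of jars that have been unregistered long enough.
        now = time.time() if now is None else now
        stale = [jar for jar, since in self.jars.items()
                 if since is not None and now - since >= max_age_seconds]
        for jar in stale:
            del self.jars[jar]
        return stale


class Drillbit:
    """Hypothetical node that keeps a local copy of the registry."""

    def __init__(self, registry):
        self.registry = registry
        self.known = {}

    def refresh(self):
        # The periodic check; also called on demand by plan().
        self.known = dict(self.registry.functions)

    def plan(self, func):
        if func not in self.known:
            self.refresh()  # "forced" registry check for an unknown function
        if func not in self.known:
            raise LookupError(f"function {func!r} is not registered")
        return self.known[func]
```

A Drillbit that has never heard of a function finds it through the forced check, and a deleted function's jar survives until `purge` decides it is old enough. Reference counting, mentioned as the fancier option, is not modelled here.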
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> - Paul
> >>>>>
> >>>>>> On Jun 21, 2016, at 9:48 AM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> >>>>>>
> >>>>>> 4. Authorization model mentioned by Julia and John
> >>>>>> If the user doesn't have rights to copy jars to the UDF classpath, which can be restricted by the file system, he won't be able to do much harm by running the CREATE command. If UDFs from the jar were already registered, the CREATE statement will fail. CREATE OR REPLACE will just re-register the UDFs.
> >>>>>> But the DELETE command is not safe. If the user knows the jar name, he can delete all UDFs associated with it, as well as the binary and source jars. That's where we'll probably need to impose restrictions.
> >>>>>>
> >>>>>> On Tue, Jun 21, 2016 at 7:34 PM Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> >>>>>>
> >>>>>>> 1. DELETE command - I forgot to mention it in the document but had it in mind. When the user issues the DELETE command, all UDFs associated with the indicated jar are removed from DrillFunctionRegistry. Then the binary and source files are also deleted from the UDF classpath.
> >>>>>>>
> >>>>>>> 2. Distribution race condition described by Paul
> >>>>>>> User issues the CREATE command and gets confirmation that the UDFs are registered only if all drillbits have confirmed that registration was successful.
> >>>>>>> I don't expect the user to start using UDFs in queries prior to the CREATE command success / failure result, which is possible but strange.
> >>>>>>>
> >>>>>>> 3. DoY
> >>>>>>> @Paul
> >>>>>>> What if, instead of using $DRILL_HOME/jars/3rdparty/udf directly, we use a $DRILL_UDF environment variable which will be set during drillbit start (like $DRILL_LOG_DIR)? The location stored in this variable will be added to the Drill classpath during start.
> >>>>>>> Will it ease DoY integration somehow?
> >>>>>>>
> >>>>>>> Kind regards
> >>>>>>> Arina
> >>>>>>>
> >>>>>>> On Tue, Jun 21, 2016 at 7:15 PM yuliya Feldman <yufeld...@yahoo.com.invalid> wrote:
> >>>>>>>
> >>>>>>>> Just thoughts:
> >>>>>>>> You can try to reuse the distributed cache. Let the Drill AM do the needful in terms of orchestrating UDF jar distribution.
> >>>>>>>> But I would be inclined to have a common path that is independent of whether it is Drill on YARN or not, as maintaining two separate ways of dealing with loading/unloading UDFs will be painful and error prone.
> >>>>>>>> One more note (I left a comment in the doc) - not sure about the authorization model here - we need to have some.
> >>>>>>>> Just my 2c
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>> From: Paul Rogers <prog...@maprtech.com>
> >>>>>>>> To: "dev@drill.apache.org" <dev@drill.apache.org>
> >>>>>>>> Sent: Monday, June 20, 2016 7:32 PM
> >>>>>>>> Subject: Re: Dynamic UDFs support
> >>>>>>>>
> >>>>>>>> Hi Neeraja,
> >>>>>>>>
> >>>>>>>> The proposal calls for the user to copy the jar file to each Drillbit node. The jar would go into a new $DRILL_HOME/jars/3rdparty/udf directory.
> >>>>>>>>
> >>>>>>>> In Drill-on-YARN (DoY), YARN is responsible for copying Drill code to each node (which is good). YARN puts that code in a location known only to YARN. Since the location is private to YARN, the user can’t easily hunt down the location in order to add the udf jar. Even if the user did find the location, the next Drillbit to start would create a new copy of the Drill software, without the udf jar.
> >>>>>>>>
> >>>>>>>> Second, in DoY we have separated user files from Drill software.
> >>>>>>>> This makes it much easier to distribute the software to each node: we give the Drill distribution tar archive to YARN, and YARN copies it to each node and untars the Drill files. We make a separate copy of the (far smaller) set of user config files.
> >>>>>>>>
> >>>>>>>> If the udf jar goes into a Drill folder ($DRILL_HOME/jars/3rdparty/udf), then the user would have to rebuild the Drill tar file each time they add a udf jar. When I tried this myself when building DoY, I found it to be slow and error-prone.
> >>>>>>>>
> >>>>>>>> So, the solution is to place the udf code in the new “site” directory: $DRILL_SITE/jars. That’s what that is for. Then, let DoY automatically distribute the code to every node. Perfect! Except that it does not work to dynamically distribute code after Drill starts.
> >>>>>>>>
> >>>>>>>> For DoY, the solution requirements are:
> >>>>>>>>
> >>>>>>>> 1. Distribute code using Drill itself, rather than manually copying jars to (unknown) Drill directories.
> >>>>>>>> 2. Ensure the solution works even if another Drillbit is spun up later, and uses the original Drill tar file.
> >>>>>>>>
> >>>>>>>> I’m thinking we want to leverage DFS: place udf files into a well-known DFS directory. Register the udf into, say, ZK. When a new Drillbit starts, it looks for new udf jars in ZK, copies the file to a temporary location, and launches. An existing Drill is notified of the change and does the same download process. Clean-up is needed at some point to remove ZK entries if the udf jar becomes statically available on the next launch. That needs more thought.
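[Editor's sketch] The download step described above, where a starting or notified Drillbit copies registered jars from the shared DFS directory into a temporary local location, could be sketched as follows. The function name `sync_udf_jars` is hypothetical, plain local directories stand in for DFS, and a set of jar names stands in for the ZK registrations; none of this is Drill's actual API.

```python
import shutil
import tempfile
from pathlib import Path


def sync_udf_jars(registered_jars, dfs_dir, local_dir):
    """Copy each registered jar that is missing locally; return the names copied.

    registered_jars: jar names read from the registry (ZK in the proposal).
    dfs_dir:         the well-known shared directory (a local path in this sketch).
    local_dir:       this Drillbit's temporary jar location.
    """
    local_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in sorted(registered_jars):
        target = local_dir / name
        if not target.exists():
            shutil.copy2(dfs_dir / name, target)
            copied.append(name)
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dfs = Path(tmp) / "dfs"
        dfs.mkdir()
        (dfs / "my-udfs.jar").write_bytes(b"fake jar contents")
        local = Path(tmp) / "drillbit-1"
        # A new Drillbit calls this at startup; an existing one on notification.
        print(sync_udf_jars({"my-udfs.jar"}, dfs, local))  # first call copies the jar
        print(sync_udf_jars({"my-udfs.jar"}, dfs, local))  # second call has nothing to do
```

Because the same routine serves both the startup path and the change-notification path, a Drillbit spun up later from the original tar file ends up with the same jars as its peers. Clean-up of registry entries is left out, as in the message.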
> >>>>>>>>
> >>>>>>>> We’d still need the phases mentioned earlier to ensure consistency.
> >>>>>>>>
> >>>>>>>> Suggestions anyone as to how to do this super simply & still get it to work with DoY?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> - Paul
> >>>>>>>>
> >>>>>>>>> On Jun 20, 2016, at 7:18 PM, Neeraja Rentachintala <nrentachint...@maprtech.com> wrote:
> >>>>>>>>>
> >>>>>>>>> This will need to work with YARN (once Drill is YARN enabled, I would expect a lot of users to use it in conjunction with YARN).
> >>>>>>>>> Paul, I am not clear why this wouldn't work with YARN. Can you elaborate?
> >>>>>>>>>
> >>>>>>>>> -Neeraja
> >>>>>>>>>
> >>>>>>>>> On Mon, Jun 20, 2016 at 7:01 PM, Paul Rogers <prog...@maprtech.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Good enough, as long as we document the limitation that this feature can’t work with YARN deployment as users generally do not have access to the temporary “localization” directories where the Drill code is placed by YARN.
> >>>>>>>>>>
> >>>>>>>>>> Note that the jar distribution race condition issue occurs with the proposed design: I believe I sketched out a scenario in one of the earlier comments. Drillbit A receives the CREATE FUNCTION command. It tells Drillbit B. While A is still informing the other Drillbits, Drillbit B plans and launches a query that uses the function. Drillbit Z starts execution of the query before it learns from A about the new function. This will be rare; just rare enough to create very hard-to-reproduce bugs.
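[Editor's sketch] The race above goes away if a function becomes visible to the planner only after every node has loaded it, which is the multi-pass idea spelled out in the next paragraph of this message. A minimal Python illustration follows, under assumptions: `Drillbit` and `register` are hypothetical names, the nodes are in-memory objects, and failure handling (a node that never confirms, rollback) is deliberately left out.

```python
class Drillbit:
    """Hypothetical node with separate 'loaded' and 'plannable' function sets."""

    def __init__(self, name):
        self.name = name
        self.loaded = set()     # functions the execution engine can run
        self.plannable = set()  # functions the planner is allowed to use

    def load(self, func):
        # Pass 1: load the function for execution only; return a confirmation.
        self.loaded.add(func)
        return True

    def enable_planning(self, func):
        # Pass 3: the function may now appear in query plans.
        self.plannable.add(func)

    def plan(self, func):
        if func not in self.plannable:
            raise LookupError(f"{func} is not registered on {self.name}")
        return func

    def execute(self, func):
        if func not in self.loaded:
            raise RuntimeError(f"{func} is missing on {self.name}")
        return f"{self.name} ran {func}"


def register(func, drillbits):
    """Run the three passes; return True once every node may plan with func."""
    acks = [bit.load(func) for bit in drillbits]  # pass 1: load everywhere
    if not all(acks):                             # pass 2: await confirmations
        return False
    for bit in drillbits:                         # pass 3: open up planning
        bit.enable_planning(func)
    return True
```

With this ordering, Drillbit B cannot plan a query with the function until Drillbit Z has already loaded it, so a fragment sent to Z never hits a missing function. Queries planned during the window simply fail in planning, which matches the "clean behavior" goal stated at the top of the thread.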
> >>>>>>>>>>
> >>>>>>>>>> The only reliable solution is to do the work in multiple passes:
> >>>>>>>>>>
> >>>>>>>>>> Pass 1: Ask each node to load the function, but not make it available to the planner. (It would be available to the execution engine.)
> >>>>>>>>>> Pass 2: Await confirmation from each node that this is done.
> >>>>>>>>>> Pass 3: Alert every node that it is now free to plan queries with the function.
> >>>>>>>>>>
> >>>>>>>>>> Finally, I wonder if we should design the SQL syntax based on a long-term design, even if the feature itself is a short-term work-around. Changing the syntax later might break scripts that users might write.
> >>>>>>>>>>
> >>>>>>>>>> So, the question for the group is this: is the value of a semi-complete feature sufficient to justify the potential problems?
> >>>>>>>>>>
> >>>>>>>>>> - Paul
> >>>>>>>>>>
> >>>>>>>>>>> On Jun 20, 2016, at 6:15 PM, Parth Chandra <pchan...@maprtech.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Moving discussion to dev.
> >>>>>>>>>>>
> >>>>>>>>>>> I believe the aim is to do a simple implementation without the complexity of distributing the UDF. I think the document should make this limitation clear.
> >>>>>>>>>>>
> >>>>>>>>>>> Per Paul's point on there being a simpler solution of just having each drillbit detect whether a UDF is present, I think the problem is if a UDF gets deployed to some but not all drillbits. A query can then start executing but not run successfully. The intent of the create commands would be to ensure that either all drillbits have the UDF or none do.
> >>>>>>>>>>>
> >>>>>>>>>>> I think Jacques' point about ownership conflicts is not addressed clearly.
> >>>>>>>>>>> Also, the unloading is not clear. The delete command should probably remove the UDF and unload it.
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Jun 17, 2016 at 11:19 AM, Paul Rogers <prog...@maprtech.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Reviewed the spec; many comments posted. Three primary comments for the community to consider.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1. The design conflicts with the Drill-on-YARN project. Is this a specific fix for one unique problem, or is it worth expanding the solution to work with Drill-on-YARN deployments? Might be hard to make the two work together later. See comments in docs for details.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2. Have we, by chance, looked at how other projects handle code distribution? Spark, Storm and others automatically deploy code across the cluster; no manual distribution to each node. The key difference between Drill and others is that, for Storm, say, code is associated with a job (“topology” in Storm terms). But, in Drill, functions are global and have no obvious life cycle that suggests when the code can be unloaded.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 3. Have we considered the class loader, dependency and name space isolation issues addressed by such products as Tomcat (web apps) or Eclipse (plugins)? Putting user code in the same namespace as Drill code is quick & dirty. It turns out, however, that doing so leads to problems that require long, frustrating debugging sessions to resolve.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Addressing item 1 might expand scope a bit. Addressing items 2 and 3 is a big increase in scope, so I won’t be surprised if we leave those issues for later. (Though, addressing item 2 might be the best way to address item 1.)
> >>>>>>>>>>>>
> >>>>>>>>>>>> If we want a very simple solution that requires minimal change, perhaps we can use an even simpler solution. In the proposed design, the user still must distribute code to all the nodes. The primary change is to tell Drill to load (or unload) that code. Can we accomplish the same result more easily, simply by having Drill periodically scan certain directories looking for new (or removed) jars? Still won’t work with YARN, or solve the name space issues, but will work for existing non-YARN Drill users without new SQL syntax.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>
> >>>>>>>>>>>> - Paul
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Jun 16, 2016, at 2:07 PM, Jacques Nadeau <jacq...@dremio.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Two quick thoughts:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> - (user) In the design document I didn't see any discussion of ownership/conflicts or unloading. Would be helpful to see the thinking there.
> >>>>>>>>>>>>> - (dev) There is a row oriented facade via the FieldReader/FieldWriter/ComplexWriter classes. That would be a good place to start when trying to implement an alternative interface.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> Jacques Nadeau
> >>>>>>>>>>>>> CTO and Co-Founder, Dremio
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, Jun 16, 2016 at 11:32 AM, John Omernik <j...@omernik.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Honestly, I don't see it as a priority issue. I think some of the ideas around community java UDFs could be a better approach. I'd hate to take away from other work to hack in something like this.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 1:19 PM, Paul Rogers <prog...@maprtech.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Ted refers to source code transformation. Drill gains its speed from value vectors. However, VVs are a far cry from the row-based interface that most mere mortals are accustomed to using. Since VVs are very type specific, code is typically generated to handle the specifics of each type. Accessing VVs in Jython may be a bit of a challenge because of the "impedance mismatch" between how VVs work and the row-and-column view expected by most (non-Drill) developers.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I wonder if we've considered providing a row-oriented "facade" that can be used by roll-your-own data sources and user-defined row transforms? Might be a hiccup in the fast VV pipeline, but might be handy for users willing to trade a bit of speed for convenience.
> >>>>>>>>>>>>>>> With such a facade, the Jython row transforms that John mentions could be quite simple.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 10:36 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Since UDFs use source code transformation, using Jython would be difficult.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 9:42 AM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi Charles,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> not that I am aware of. The proposed solution doesn't invent anything new, it just adds the possibility to add UDFs without a drillbit restart. But contributions are welcome.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Thu, Jun 16, 2016 at 4:52 PM Charles Givre <cgi...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Arina,
> >>>>>>>>>>>>>>>>>> Has there been any discussion about making it possible via Jython or something for users to write simple UDFs in Python?
> >>>>>>>>>>>>>>>>>> My ideal would be to have this capability integrated in the web GUI such that a user could write their UDF (in Python) right there, submit it and it would be deployed to Drill if it passes validation tests.
> >>>>>>>>>>>>>>>>>> -C
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Jun 16, 2016, at 09:34, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hi all!
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I have created a Jira to allow dynamic UDFs support in Drill (https://issues.apache.org/jira/browse/DRILL-4726). There is a link to the design document in the Jira description.
> >>>>>>>>>>>>>>>>>>> Comments or suggestions are welcome.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Kind regards
> >>>>>>>>>>>>>>>>>>> Arina