This will need to work with YARN (once Drill is YARN-enabled, I would expect a lot of users to run it in conjunction with YARN). Paul, I am not clear why this wouldn't work with YARN. Can you elaborate?
-Neeraja

On Mon, Jun 20, 2016 at 7:01 PM, Paul Rogers <prog...@maprtech.com> wrote:

> Good enough, as long as we document the limitation that this feature can’t
> work with YARN deployment as users generally do not have access to the
> temporary “localization” directories where the Drill code is placed by YARN.
>
> Note that the jar distribution race condition issue occurs with the
> proposed design: I believe I sketched out a scenario in one of the earlier
> comments. Drillbit A receives the CREATE FUNCTION command. It tells
> Drillbit B. While A is still informing the other Drillbits, Drillbit B
> plans and launches a query that uses the function. Drillbit Z starts
> execution of the query before it learns from A about the new function.
> This will be rare, just rare enough to create very hard to reproduce bugs.
>
> The only reliable solution is to do the work in multiple passes:
>
> Pass 1: Ask each node to load the function, but not make it available to
> the planner. (It would be available to the execution engine.)
> Pass 2: Await confirmation from each node that this is done.
> Pass 3: Alert every node that it is now free to plan queries with the
> function.
>
> Finally, I wonder if we should design the SQL syntax based on a long-term
> design, even if the feature itself is a short-term work-around. Changing
> the syntax later might break scripts that users might write.
>
> So, the question for the group is this: is the value of a semi-complete
> feature sufficient to justify the potential problems?
>
> - Paul
>
> > On Jun 20, 2016, at 6:15 PM, Parth Chandra <pchan...@maprtech.com> wrote:
> >
> > Moving discussion to dev.
> >
> > I believe the aim is to do a simple implementation without the complexity
> > of distributing the UDF. I think the document should make this limitation
> > clear.
> >
> > Per Paul's point on there being a simpler solution of just having each
> > drillbit detect if a UDF is present, I think the problem is if a UDF
> > gets deployed to some but not all drillbits. A query can then start
> > executing but not run successfully. The intent of the create commands
> > would be to ensure that either all drillbits have the UDF or none do.
> >
> > I think Jacques' point about ownership conflicts is not addressed clearly.
> > Also, the unloading is not clear. The delete command should probably
> > remove the UDF and unload it.
> >
> >
> > On Fri, Jun 17, 2016 at 11:19 AM, Paul Rogers <prog...@maprtech.com> wrote:
> >
> >> Reviewed the spec; many comments posted. Three primary comments for the
> >> community to consider.
> >>
> >> 1. The design conflicts with the Drill-on-YARN project. Is this a
> >> specific fix for one unique problem, or is it worth expanding the
> >> solution to work with Drill-on-YARN deployments? Might be hard to make
> >> the two work together later. See comments in docs for details.
> >>
> >> 2. Have we, by chance, looked at how other projects handle code
> >> distribution? Spark, Storm and others automatically deploy code across
> >> the cluster; no manual distribution to each node. The key difference
> >> between Drill and others is that, for Storm, say, code is associated
> >> with a job (“topology” in Storm terms). But, in Drill, functions are
> >> global and have no obvious life cycle that suggests when the code can be
> >> unloaded.
> >>
> >> 3. Have we considered the class loader, dependency and name space
> >> isolation issues addressed by such products as Tomcat (web apps) or
> >> Eclipse (plugins)? Putting user code in the same namespace as Drill code
> >> is quick & dirty. It turns out, however, that doing so leads to problems
> >> that require long, frustrating debugging sessions to resolve.
> >>
> >> Addressing item 1 might expand scope a bit.
> >> Addressing items 2 and 3 is a big increase in scope, so I won’t be
> >> surprised if we leave those issues for later. (Though, addressing item 2
> >> might be the best way to address item 1.)
> >>
> >> If we want a very simple solution that requires minimal change, perhaps
> >> we can use an even simpler solution. In the proposed design, the user
> >> still must distribute code to all the nodes. The primary change is to
> >> tell Drill to load (or unload) that code. Can we accomplish the same
> >> result more easily, simply by having Drill periodically scan certain
> >> directories looking for new (or removed) jars? Still won’t work with
> >> YARN, or solve the name space issues, but will work for existing
> >> non-YARN Drill users without new SQL syntax.
> >>
> >> Thanks,
> >>
> >> - Paul
> >>
> >>> On Jun 16, 2016, at 2:07 PM, Jacques Nadeau <jacq...@dremio.com> wrote:
> >>>
> >>> Two quick thoughts:
> >>>
> >>> - (user) In the design document I didn't see any discussion of
> >>> ownership/conflicts or unloading. Would be helpful to see the thinking
> >>> there.
> >>> - (dev) There is a row oriented facade via the
> >>> FieldReader/FieldWriter/ComplexWriter classes. That would be a good
> >>> place to start when trying to implement an alternative interface.
> >>>
> >>>
> >>> --
> >>> Jacques Nadeau
> >>> CTO and Co-Founder, Dremio
> >>>
> >>> On Thu, Jun 16, 2016 at 11:32 AM, John Omernik <j...@omernik.com> wrote:
> >>>
> >>>> Honestly, I don't see it as a priority issue. I think some of the ideas
> >>>> around community Java UDFs could be a better approach. I'd hate to take
> >>>> away from other work to hack in something like this.
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Jun 16, 2016 at 1:19 PM, Paul Rogers <prog...@maprtech.com> wrote:
> >>>>
> >>>>> Ted refers to source code transformation. Drill gains its speed from
> >>>>> value vectors.
> >>>>> However, VVs are a far cry from the row-based interface that most
> >>>>> mere mortals are accustomed to using. Since VVs are very type
> >>>>> specific, code is typically generated to handle the specifics of each
> >>>>> type. Accessing VVs in Jython may be a bit of a challenge because of
> >>>>> the "impedance mismatch" between how VVs work and the row-and-column
> >>>>> view expected by most (non-Drill) developers.
> >>>>>
> >>>>> I wonder if we've considered providing a row-oriented "facade" that
> >>>>> can be used by roll-your-own data sources and user-defined row
> >>>>> transforms? Might be a hiccup in the fast VV pipeline, but might be
> >>>>> handy for users willing to trade a bit of speed for convenience. With
> >>>>> such a facade, the Jython row transforms that John mentions could be
> >>>>> quite simple.
> >>>>>
> >>>>> On Thu, Jun 16, 2016 at 10:36 AM, Ted Dunning <ted.dunn...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Since UDFs use source code transformation, using Jython would be
> >>>>>> difficult.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Jun 16, 2016 at 9:42 AM, Arina Yelchiyeva
> >>>>>> <arina.yelchiy...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi Charles,
> >>>>>>>
> >>>>>>> Not that I am aware of. The proposed solution doesn't invent
> >>>>>>> anything new; it just adds the possibility to add UDFs without a
> >>>>>>> drillbit restart. But contributions are welcome.
> >>>>>>>
> >>>>>>> On Thu, Jun 16, 2016 at 4:52 PM Charles Givre <cgi...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Arina,
> >>>>>>>> Has there been any discussion about making it possible via Jython
> >>>>>>>> or something for users to write simple UDFs in Python?
> >>>>>>>> My ideal would be to have this capability integrated in the web
> >>>>>>>> GUI such that a user could write their UDF (in Python) right
> >>>>>>>> there, submit it and it would be deployed to Drill if it passes
> >>>>>>>> validation tests.
> >>>>>>>> -C
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Jun 16, 2016, at 09:34, Arina Yelchiyeva
> >>>>>>>>> <arina.yelchiy...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi all!
> >>>>>>>>>
> >>>>>>>>> I have created Jira to allow dynamic UDFs support in Drill (
> >>>>>>>>> https://issues.apache.org/jira/browse/DRILL-4726). There is a
> >>>>>>>>> link to design document in Jira description.
> >>>>>>>>> Comments or suggestions are welcomed.
> >>>>>>>>>
> >>>>>>>>> Kind regards
> >>>>>>>>> Arina
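
For anyone reading the archive later: the three-pass registration Paul describes near the top of the thread (load for execution only, await confirmation from every node, then open the function to the planner) might look roughly like the sketch below. It is a toy in-process simulation in Python, not Drill code. `Drillbit`, `load`, `enable_planning`, `unload`, `register_function` and the `healthy` flag are all hypothetical names made up for illustration, and the direct method calls stand in for whatever RPC or coordination mechanism a real cluster would use. Rolling back when an acknowledgement is missing is also an assumption of the sketch; the thread does not say what should happen in that case.

```python
from dataclasses import dataclass, field


@dataclass
class Drillbit:
    """Toy stand-in for one node; hypothetical, not a real Drill class."""
    name: str
    healthy: bool = True                           # simulates a node that fails to load
    executable: set = field(default_factory=set)   # functions the execution engine can run
    plannable: set = field(default_factory=set)    # functions the planner may use

    def load(self, fn: str) -> bool:
        # Pass 1: make fn available to the execution engine only.
        if not self.healthy:
            return False
        self.executable.add(fn)
        return True                                # acknowledgement checked in pass 2

    def enable_planning(self, fn: str) -> None:
        # Pass 3: expose fn to the planner; only legal once it is loaded locally.
        if fn not in self.executable:
            raise RuntimeError(f"{self.name} has not loaded {fn}")
        self.plannable.add(fn)

    def unload(self, fn: str) -> None:
        # Rollback helper (an assumption of this sketch).
        self.plannable.discard(fn)
        self.executable.discard(fn)


def register_function(nodes, fn):
    """Return True if fn became plannable on every node, else roll back and return False."""
    acks = [node.load(fn) for node in nodes]       # pass 1: load everywhere
    if not all(acks):                              # pass 2: await every confirmation
        for node in nodes:
            node.unload(fn)
        return False
    for node in nodes:                             # pass 3: now safe to plan with fn
        node.enable_planning(fn)
    return True
```

The property the split is meant to buy is that no planner can see a function until every execution engine already has it, so a query planned on Drillbit B cannot reach a Drillbit Z that has never heard of the function.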