Re: Dynamic UDFs support

Paul Rogers Mon, 20 Jun 2016 19:02:47 -0700

Good enough, as long as we document the limitation that this feature can’t work 
with a YARN deployment, as users generally do not have access to the temporary 
“localization” directories where the Drill code is placed by YARN.


Note that the jar distribution race condition issue occurs with the proposed 
design: I believe I sketched out a scenario in one of the earlier comments. 
Drillbit A receives the CREATE FUNCTION command. It tells Drillbit B. While A 
is still informing the other Drillbits, Drillbit B plans and launches a query 
that uses the function. Drillbit Z starts execution of the query before it 
learns from A about the new function. This will be rare: just rare enough to 
create very hard-to-reproduce bugs.
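
To make the interleaving concrete, here is a toy, single-process replay of that scenario. It is only an illustration under stated assumptions: the Drillbit class, its methods, and the function name my_udf are all hypothetical and are not Drill code, and network messages are modeled as plain method calls.

```python
# Toy replay of the single-pass race. Every name here is hypothetical.
class Drillbit:
    def __init__(self, name):
        self.name = name
        self.functions = set()  # functions this node has loaded

    def load(self, fn):
        self.functions.add(fn)

    def can_plan(self, fn):
        # Single-pass design: a function is plannable as soon as it is loaded.
        return fn in self.functions

    def execute(self, fn):
        if fn not in self.functions:
            raise LookupError(f"{self.name}: unknown function {fn}")
        return f"{self.name}: ran {fn}"


def replay_race():
    a, b, z = Drillbit("A"), Drillbit("B"), Drillbit("Z")
    a.load("my_udf")  # A receives CREATE FUNCTION
    b.load("my_udf")  # A tells B first
    # A has not reached Z yet, but B can already plan with the function.
    assert b.can_plan("my_udf")
    try:
        z.execute("my_udf")  # a fragment of B's query lands on Z
    except LookupError as err:
        outcome = str(err)
    else:
        outcome = "ok"
    z.load("my_udf")  # A's notice reaches Z, too late for that query
    return outcome
```

Calling replay_race() returns Z's lookup failure rather than "ok", which is the failure mode described above.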

The only reliable solution is to do the work in multiple passes:

Pass 1: Ask each node to load the function, but not make it available to the 
planner. (It would be available to the execution engine.)
Pass 2: Await confirmation from each node that this is done.
Pass 3: Alert every node that it is now free to plan queries with the function.
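
A minimal sketch of the three passes, assuming a coordinator that can call each node directly. The Drillbit class, the loaded and plannable sets, and register_function are hypothetical names for illustration, not Drill APIs; in this synchronous toy model the acknowledgement of pass 2 is simply the return value of the pass 1 call.

```python
# Toy model of multi-pass registration. Every name here is hypothetical.
class Drillbit:
    def __init__(self, name):
        self.name = name
        self.loaded = set()     # usable by the execution engine
        self.plannable = set()  # visible to the planner

    def load(self, fn):
        """Pass 1: load the function for execution only and acknowledge."""
        self.loaded.add(fn)
        return True

    def enable_planning(self, fn):
        """Pass 3: let the planner use a function that is already loaded."""
        if fn not in self.loaded:
            raise RuntimeError(f"{self.name}: {fn} is not loaded")
        self.plannable.add(fn)


def register_function(fn, drillbits):
    # Pass 1: ask every node to load the function.
    # Pass 2: collect the confirmation from each node.
    acks = [bit.load(fn) for bit in drillbits]
    if not all(acks):
        return False  # a node did not confirm, so no node may plan with fn
    # Pass 3: every node can execute fn, so planning is now safe everywhere.
    for bit in drillbits:
        bit.enable_planning(fn)
    return True
```

With this ordering, no node can plan a query with the function until every node is able to execute it, which closes the window in the race scenario. The sketch does not cover cleanup of nodes that already loaded the function when another node fails to confirm.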

Finally, I wonder if we should design the SQL syntax based on a long-term 
design, even if the feature itself is a short-term work-around. Changing the 
syntax later might break scripts that users might write.

So, the question for the group is this: is the value of a semi-complete feature 
sufficient to justify the potential problems?

- Paul

> On Jun 20, 2016, at 6:15 PM, Parth Chandra <[email protected]> wrote:
> 
> Moving discussion to dev.
> 
> I believe the aim is to do a simple implementation without the complexity
> of distributing the UDF. I think the document should make this limitation
> clear.
> 
> Per Paul's point on there being a simpler solution of just having each
> drillbit detect if a UDF is present, I think the problem is if a UDF
> gets deployed to some but not all drillbits. A query can then start
> executing but not run successfully. The intent of the create commands would
> be to ensure that either all drillbits have the UDF or none do.
> 
> I think Jacques' point about ownership conflicts is not addressed clearly.
> Also, the unloading is not clear. The delete command should probably remove
> the UDF and unload it.
> 
> 
> On Fri, Jun 17, 2016 at 11:19 AM, Paul Rogers <[email protected]> wrote:
> 
>> Reviewed the spec; many comments posted. Three primary comments for the
>> community to consider.
>> 
>> 1. The design conflicts with the Drill-on-YARN project. Is this a specific
>> fix for one unique problem, or is it worth expanding the solution to work
>> with Drill-on-YARN deployments? Might be hard to make the two work together
>> later. See comments in docs for details.
>> 
>> 2. Have we, by chance, looked at how other projects handle code
>> distribution? Spark, Storm and others automatically deploy code across the
>> cluster; no manual distribution to each node. The key difference between
>> Drill and others is that, for Storm, say, code is associated with a job
>> (“topology” in Storm terms.) But, in Drill, functions are global and have
>> no obvious life cycle that suggests when the code can be unloaded.
>> 
>> 3. Have we considered the class loader, dependency and name space isolation
>> issues addressed by such products as Tomcat (web apps) or Eclipse
>> (plugins)? Putting user code in the same namespace as Drill code is quick
>> & dirty. It turns out, however, that doing so leads to problems that
>> require long, frustrating debugging sessions to resolve.
>> 
>> Addressing item 1 might expand scope a bit. Addressing items 2 and 3 is a
>> big increase in scope, so I won’t be surprised if we leave those issues for
>> later. (Though, addressing item 2 might be the best way to address item 1.)
>> 
>> If we want a very simple solution that requires minimal change, perhaps we
>> can use an even simpler solution. In the proposed design, the user still
>> must distribute code to all the nodes. The primary change is to tell Drill
>> to load (or unload) that code. Can we accomplish the same result more easily
>> by having Drill periodically scan certain directories looking for new (or
>> removed) jars? Still won’t work with YARN, or solve the name space issues,
>> but will work for existing non-YARN Drill users without new SQL syntax.
>> 
>> Thanks,
>> 
>> - Paul
>> 
>>> On Jun 16, 2016, at 2:07 PM, Jacques Nadeau <[email protected]> wrote:
>>> 
>>> Two quick thoughts:
>>> 
>>> - (user) In the design document I didn't see any discussion of
>>> ownership/conflicts or unloading. Would be helpful to see the thinking
>>> there.
>>> - (dev) There is a row oriented facade via the
>>> FieldReader/FieldWriter/ComplexWriter classes. That would be a good place
>>> to start when trying to implement an alternative interface.
>>> 
>>> 
>>> --
>>> Jacques Nadeau
>>> CTO and Co-Founder, Dremio
>>> 
>>> On Thu, Jun 16, 2016 at 11:32 AM, John Omernik <[email protected]> wrote:
>>> 
>>>> Honestly, I don't see it as a priority issue. I think some of the ideas
>>>> around community Java UDFs could be a better approach. I'd hate to take
>>>> away from other work to hack in something like this.
>>>> 
>>>> 
>>>> 
>>>> On Thu, Jun 16, 2016 at 1:19 PM, Paul Rogers <[email protected]> wrote:
>>>> 
>>>>> Ted refers to source code transformation. Drill gains its speed from value
>>>>> vectors. However, VVs are a far cry from the row-based interface that most
>>>>> mere mortals are accustomed to using. Since VVs are very type specific,
>>>>> code is typically generated to handle the specifics of each type. Accessing
>>>>> VVs in Jython may be a bit of a challenge because of the "impedance
>>>>> mismatch" between how VVs work and the row-and-column view expected by most
>>>>> (non-Drill) developers.
>>>>> 
>>>>> I wonder if we've considered providing a row-oriented "facade" that can be
>>>>> used by roll-your-own data sources and user-defined row transforms? Might
>>>>> be a hiccup in the fast VV pipeline, but might be handy for users willing
>>>>> to trade a bit of speed for convenience. With such a facade, the Jython row
>>>>> transforms that John mentions could be quite simple.
>>>>> 
>>>>> On Thu, Jun 16, 2016 at 10:36 AM, Ted Dunning <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> Since UDFs use source code transformation, using Jython would be
>>>>>> difficult.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Thu, Jun 16, 2016 at 9:42 AM, Arina Yelchiyeva <
>>>>>> [email protected]> wrote:
>>>>>> 
>>>>>>> Hi Charles,
>>>>>>> 
>>>>>>> Not that I am aware of. The proposed solution doesn't invent anything new;
>>>>>>> it just adds the ability to add UDFs without a drillbit restart. But
>>>>>>> contributions are welcome.
>>>>>>> 
>>>>>>> On Thu, Jun 16, 2016 at 4:52 PM Charles Givre <[email protected]> wrote:
>>>>>>> 
>>>>>>>> Arina,
>>>>>>>> Has there been any discussion about making it possible via Jython or
>>>>>>>> something for users to write simple UDFs in Python?
>>>>>>>> My ideal would be to have this capability integrated in the web GUI such
>>>>>>>> that a user could write their UDF (in Python) right there, submit it and
>>>>>>>> it would be deployed to Drill if it passes validation tests.
>>>>>>>> —C
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Jun 16, 2016, at 09:34, Arina Yelchiyeva <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> Hi all!
>>>>>>>>> 
>>>>>>>>> I have created a Jira to allow dynamic UDF support in Drill
>>>>>>>>> (https://issues.apache.org/jira/browse/DRILL-4726). There is a link to
>>>>>>>>> the design document in the Jira description.
>>>>>>>>> Comments or suggestions are welcome.
>>>>>>>>> 
>>>>>>>>> Kind regards
>>>>>>>>> Arina
