Hi Drillers,
I am starting this thread to discuss potential approaches for supporting
Python UDFs in Drill.

I have a few open questions and would welcome suggestions.

I have been looking into how the Pig developers currently do this, but our
scenario is clearly a lot more complicated.

This is how Pig achieves scripting support:
- A separate ScriptEngine for each supported scripting language
- A Java UDF (acting as a template) that overrides the necessary methods and
internally invokes the Python functions
- Conversion of Python datatypes to Pig datatypes before the results are
returned to Pig execution
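
To make the Pig-style flow concrete, here is a minimal Java sketch of the
template idea. It is an illustration under assumptions, not Pig's actual
code: `ScriptedFunction` and `LongResultTemplate` are hypothetical names, and
a plain Java lambda stands in for the Python callable that a script engine
(for example Jython through `javax.script`) would normally supply.

```java
import java.util.function.Function;

// Hypothetical sketch of a Pig-style template: a fixed Java wrapper that
// delegates to a scripted function and converts its dynamically typed result.
public class ScriptTemplateSketch {

    // Stand-in for a Python callable obtained from a script engine
    // (assumption: a real setup would get this from e.g. Jython).
    interface ScriptedFunction extends Function<Object[], Object> {}

    static final class LongResultTemplate {
        private final ScriptedFunction fn;

        LongResultTemplate(ScriptedFunction fn) { this.fn = fn; }

        // Invoke the script function and convert its result to a Java long,
        // analogous to Pig converting Python values into Pig datatypes.
        long exec(Object... args) {
            Object result = fn.apply(args);
            if (result instanceof Number) {
                return ((Number) result).longValue();
            }
            throw new IllegalArgumentException("Unsupported return type: " + result);
        }
    }

    public static void main(String[] args) {
        // This lambda plays the role of "def add(a, b): return a + b" in Python.
        ScriptedFunction add = a -> ((Number) a[0]).longValue() + ((Number) a[1]).longValue();
        LongResultTemplate t = new LongResultTemplate(add);
        System.out.println(t.exec(2, 3)); // prints 5
    }
}
```

The template itself is fixed Java code; only the scripted function changes,
which is what lets one wrapper serve many user scripts.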


Creating templates for Drill UDFs would be simple (something similar to
[1]), but working with Holders is one of the challenges we would face.
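
The holder problem can be sketched in Java as follows. This is a minimal
sketch under assumptions: `NullableIntHolder` and `SimpleFunc` are simplified
stand-ins loosely modelled on Drill's nullable holders and simple-function
interface (not the real classes), `ScriptedIntFunc` is a hypothetical
template name, and a lambda stands in for the Python function. The template
has to unpack input holders into plain values, call the script, and copy the
returned value (or None, as null) back into the output holder.

```java
import java.util.function.Function;

// Hypothetical sketch: bridging a value-returning (Python-style) function to a
// Drill-style UDF that writes its result into an output holder.
public class HolderAdapterSketch {

    // Simplified stand-in loosely modelled on a Drill nullable holder:
    // an isSet flag plus a value field. Not the real Drill class.
    static final class NullableIntHolder {
        public int isSet;
        public int value;
    }

    // Simplified stand-in for the simple-function contract:
    // setup() once, eval() per row, and no return value.
    interface SimpleFunc {
        void setup();
        void eval();
    }

    // Template UDF: reads input holders, calls the scripted function, and
    // copies the returned value (or null) into the output holder.
    static final class ScriptedIntFunc implements SimpleFunc {
        NullableIntHolder left = new NullableIntHolder();
        NullableIntHolder right = new NullableIntHolder();
        NullableIntHolder out = new NullableIntHolder();
        private final Function<Object[], Object> script;

        ScriptedIntFunc(Function<Object[], Object> script) { this.script = script; }

        @Override public void setup() { /* e.g. load the script once */ }

        @Override public void eval() {
            Object a = left.isSet == 1 ? (Object) left.value : null;
            Object b = right.isSet == 1 ? (Object) right.value : null;
            Object result = script.apply(new Object[] {a, b});
            if (result == null) {
                out.isSet = 0;            // Python None -> SQL NULL
            } else {
                out.isSet = 1;
                out.value = ((Number) result).intValue();
            }
        }
    }

    public static void main(String[] args) {
        // Stands in for: def add(a, b): return None if a is None or b is None else a + b
        ScriptedIntFunc f = new ScriptedIntFunc(xs ->
            xs[0] == null || xs[1] == null ? null : (Integer) xs[0] + (Integer) xs[1]);
        f.setup();
        f.left.isSet = 1; f.left.value = 40;
        f.right.isSet = 1; f.right.value = 2;
        f.eval();
        System.out.println(f.out.isSet + " " + f.out.value); // prints 1 42
    }
}
```
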
Challenges in Drill:
- Drill UDFs do not return data; instead, all manipulation happens on the
value holders.
- A UDF/UDAF has multiple methods that need to be overridden, so a single
Python function is not enough; we need something like a Python class [2].
The annotation info can be added to the class itself.
- The number of @Workspace/@Output/@Input variables is difficult to track
unless we force users to follow some rule for declaring, up front, every
variable they plan to use.
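
One possible way to handle both the class-based UDAF and the
variable-tracking problem is for the scripted class to declare its workspace
variables up front, so the template can allocate a fixed set of slots. The
sketch below assumes that design; `Workspace`, `ScriptedAgg`, `AggTemplate`,
and the `workspaceVars()` declaration are hypothetical names, and a Java
anonymous class stands in for the Python class.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a template aggregate that delegates to a class-style
// scripted aggregate whose workspace variables are all declared up front.
public class ScriptedAggSketch {

    // Fixed set of named workspace slots; touching an undeclared name fails.
    static final class Workspace {
        private final Map<String, Object> slots = new LinkedHashMap<>();

        Workspace(List<String> names) {
            for (String n : names) slots.put(n, null);
        }

        Object get(String name) { check(name); return slots.get(name); }

        void set(String name, Object value) { check(name); slots.put(name, value); }

        private void check(String name) {
            if (!slots.containsKey(name)) {
                throw new IllegalStateException("Undeclared workspace variable: " + name);
            }
        }
    }

    // Contract a scripted (e.g. Python) aggregate class is assumed to satisfy,
    // mirroring a reset/add/output aggregate lifecycle.
    interface ScriptedAgg {
        List<String> workspaceVars();
        void reset(Workspace ws);
        void add(Workspace ws, Object input);
        Object output(Workspace ws);
    }

    // Template side: allocates the declared slots once, then forwards calls.
    static final class AggTemplate {
        private final ScriptedAgg agg;
        private final Workspace ws;

        AggTemplate(ScriptedAgg agg) {
            this.agg = agg;
            this.ws = new Workspace(agg.workspaceVars());
            agg.reset(ws);
        }

        void add(Object input) { agg.add(ws, input); }
        Object output() { return agg.output(ws); }
        void reset() { agg.reset(ws); }
    }

    // Java stand-in for a Python class computing an average over "sum" and "count".
    static ScriptedAgg average() {
        return new ScriptedAgg() {
            public List<String> workspaceVars() { return Arrays.asList("sum", "count"); }
            public void reset(Workspace ws) { ws.set("sum", 0L); ws.set("count", 0L); }
            public void add(Workspace ws, Object input) {
                ws.set("sum", (Long) ws.get("sum") + ((Number) input).longValue());
                ws.set("count", (Long) ws.get("count") + 1);
            }
            public Object output(Workspace ws) {
                long count = (Long) ws.get("count");
                if (count == 0) return null;
                return ((Long) ws.get("sum")).doubleValue() / count;
            }
        };
    }

    public static void main(String[] args) {
        AggTemplate agg = new AggTemplate(average());
        for (int v : new int[] {1, 2, 3, 4}) agg.add(v);
        System.out.println(agg.output()); // prints 2.5
        agg.reset();
        System.out.println(agg.output()); // prints null
    }
}
```

Rejecting undeclared names is meant to mirror the idea that a generated Java
class needs every @Workspace field to be known before code generation, which
is why some declaration rule seems unavoidable.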

Please let me know your thoughts on this and how we might address these
challenges.

Thanks

[1]: https://gist.github.com/yssharma/bded704f4e5c992a4e66
[2]: https://gist.github.com/yssharma/5ee8e4ab9437c807c601
