Hi Drillers,

I am starting this thread to discuss potential approaches for supporting Python UDFs in Drill.
I have a few questions and would welcome suggestions. I have been looking at how the Pig devs currently do this, but our scenario is clearly a lot more complicated.

This is how Pig achieves scripting support:
- Different ScriptEngines for each scripting language
- A Java UDF (as a template) which overrides the necessary methods and internally invokes the Python functions
- Conversion of the Python datatypes to Pig datatypes, which are then returned to Pig execution

While creating templates for Drill UDFs would be simple (something similar to [1]), working with Holders is one of the challenges we would face.

Challenges in Drill:
- We do not return data from Drill UDFs; instead, all manipulation happens on the value holders.
- There are multiple methods in a UDF/UDAF that need to be overridden, so a simple Python function is not enough; we need something like a Python class [2]. The annotation info can be added to the class itself.
- The number of @Workspace/@Output/@Param variables is difficult to track unless we force the user to follow some rule for declaring all the variables they plan to use.

Please let me know your opinions on these points and how we can address the challenges.

Thanks

[1]: https://gist.github.com/yssharma/bded704f4e5c992a4e66
[2]: https://gist.github.com/yssharma/5ee8e4ab9437c807c601
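
To make the class-based idea from the challenges above a bit more concrete, here is a rough sketch of what a Python UDF written as a class might look like. Everything in it is an assumption for illustration only: IntHolder, AddConstant, the inputs/outputs/workspace tuples and the run() driver are hypothetical names, not Drill APIs and not the contents of [2]. The point is only that eval() returns nothing and writes into an output holder, and that the variables are declared up front so a Java template could discover them.

```python
from dataclasses import dataclass


@dataclass
class IntHolder:
    """Hypothetical stand-in for a Drill value holder (one mutable value)."""
    value: int = 0


class AddConstant:
    """Hypothetical Python UDF written as a class rather than a function."""

    # Declared up front so a wrapper could discover the variables
    # without having to parse the method bodies.
    inputs = ("left",)
    outputs = ("out",)
    workspace = ("offset",)

    def __init__(self):
        self.left = IntHolder()
        self.out = IntHolder()
        self.offset = IntHolder()

    def setup(self):
        # Called once before any rows are processed.
        self.offset.value = 10

    def eval(self):
        # Returns nothing: the result is written into the output holder.
        self.out.value = self.left.value + self.offset.value


def run(udf, rows):
    """Toy driver standing in for the Java template that would call the script."""
    udf.setup()
    results = []
    for row in rows:
        udf.left.value = row
        udf.eval()
        results.append(udf.out.value)
    return results


print(run(AddConstant(), [1, 2, 3]))
```

Forcing the declarations into class-level tuples is one possible form of the "rule" mentioned in the last challenge; a decorator-based scheme would probably work as well.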
