+1 on both executing Python code in an operator and a high-level API for constructing pipelines in Python.
There is a large user base of engineers and data scientists who use Python on a regular basis for crunching through big data. Providing them with a powerful new platform for big data processing, wrapped in a familiar language, will open Apex to a much broader user base and help grow the project.

Given the potentially new user base of Python developers, it may make sense to prioritize the high-level API for pipeline construction. This will allow users to build simple applications with existing library operators, and we can get feedback on which areas they would like to see improved next: custom Python operator support or more built-in library operators.

Thanks,
Sasha

On Thu, Sep 15, 2016 at 2:06 PM, Thomas Weise <t...@apache.org> wrote:

> Hi,
>
> Python (not Jython) seems to be a popular language and frequently used for
> data analysis, especially where flexibility matters. It has a comprehensive
> library and it is generally considered low barrier to entry. I have also
> seen Python used in critical back-end components, although that's probably
> not very common?
>
> I think Python support could potentially expand the user base for Apex.
> There are 2 main areas that can be considered:
>
> 1) Support to execute Python code through an operator
> 2) A client API that lets users construct pipelines in Python
>
> The former can exist without the latter. And it would enable users to
> leverage existing code that otherwise would have to be rewritten in a JVM
> language. The engine could ship scripts/packages so they are automatically
> distributed on the cluster.
>
> A useful client API probably requires back-end support for lambda functions
> and more complex UDFs.
>
> Would be great to get some feedback, especially from those that have
> experience with Python, on how an integration could potentially open up new
> use cases for Apex.
>
> Thanks,
> Thomas
>