I created the following JIRAs:

https://issues.apache.org/jira/browse/APEXMALHAR-2260
https://issues.apache.org/jira/browse/APEXMALHAR-2261
On Wed, Sep 21, 2016 at 11:10 PM, Chinmay Kolhatkar <chin...@datatorrent.com> wrote:

> I would like to help in contributing to this feature.
>
> On Wed, Sep 21, 2016 at 12:26 AM, Sasha Parfenov <sas...@apache.org> wrote:
>
> > +1 on both executing Python code in an operator and a high-level API
> > for constructing pipelines in Python.
> >
> > There is a large user base of engineers and data scientists who use
> > Python on a regular basis for crunching through big data. Providing
> > them with a powerful new platform for big data processing, wrapped in
> > a familiar language, will open Apex to a much broader user base and
> > help grow the project.
> >
> > Given the potentially new user base of Python developers, it may make
> > sense to prioritize the high-level API for pipeline construction. This
> > will allow users to build simple applications with existing library
> > operators, and we can get feedback on what areas they would like to
> > see improved next: custom Python operator support or more built-in
> > library operators.
> >
> > Thanks,
> > Sasha
> >
> > On Thu, Sep 15, 2016 at 2:06 PM, Thomas Weise <t...@apache.org> wrote:
> >
> > > Hi,
> > >
> > > Python (not Jython) seems to be a popular language and is frequently
> > > used for data analysis, especially where flexibility matters. It has
> > > a comprehensive library and is generally considered to have a low
> > > barrier to entry. I have also seen Python used in critical back-end
> > > components, although that's probably not very common?
> > >
> > > I think Python support could potentially expand the user base for
> > > Apex. There are 2 main areas that can be considered:
> > >
> > > 1) Support to execute Python code through an operator
> > > 2) A client API that lets users construct pipelines in Python
> > >
> > > The former can exist without the latter, and it would enable users
> > > to leverage existing code that otherwise would have to be rewritten
> > > in a JVM language. The engine could ship scripts/packages so they
> > > are automatically distributed on the cluster.
> > >
> > > A useful client API probably requires back-end support for lambda
> > > functions and more complex UDFs.
> > >
> > > It would be great to get some feedback, especially from those who
> > > have experience with Python, on how an integration could potentially
> > > open up new use cases for Apex.
> > >
> > > Thanks,
> > > Thomas