Hi,

Is anyone interested in supporting MADLib in Drill?  MADLib is an open
source in-database analytics library written in C/C++ and Python.
Impala and HAWQ both have a port of MADLib.
There is a JIRA for this:
https://issues.apache.org/jira/browse/DRILL-325

I have a personal interest in this project and would like to make it a
Google Summer of Code project. Would anyone be interested in mentoring it
for GSoC?

I've written a proposal here:

http://bitly.com/MADDrill
Your comments are highly appreciated.

In summary, the tasks include:

- implement the MADLib C++ Abstraction Layer for Drill
- generate Java wrapper code for C++ UDFs/UDAs
- write Python driver code to drive the module

I'd like to take the initial step of developing a framework to support
MADLib in Drill.

There are still several problems that need to be worked out.  For example,
as Jacques pointed out in the comments on DRILL-325, we need to find a way
to work with workspace variables, which only support internal types for
aggregation.

In addition, for your convenience, you can find the GSoC mentor information
for Apache here.
http://community.apache.org/gsoc.html

Disclaimer: I am a graduate student currently doing an internship at Simba
(developing the ODBC Driver for Drill :) ). This is a personal interest of
mine. I will mostly work on it on weekends and outside work hours.

Best,

Xiao
