Indrajith,

Thank you for your interest in joining the MADlib project.

Based on your follow-up questions, I'm guessing the question is more about
"how do I understand the code of the project" and less about "what are the
mechanics of contributing to an Apache project".


You might want to start by reading over the design documents on our
community site: http://madlib.incubator.apache.org/community.html

Some of that material may still need to be migrated to the new incubator
wiki, and some of it may be a bit out of date.  Helping with that migration
would be a great way to get started with the project: you would work
through the changes and come to understand the content at the same time.

Overall you can think of MADlib as having three major components:
1. The Python driver functions
2. The C++ implementation functions
3. The C++ database abstraction layer


1. Python driver functions

The driver functions are mostly located in the subdirectories under
https://github.com/apache/incubator-madlib/tree/master/src/ports/postgres/modules

These functions are the main entry point for user input and are largely
responsible for the flow control of the algorithms.  Generally the
implementations consist of validating input parameters, executing SQL
statements, evaluating the results, and potentially looping to execute more
SQL statements until some convergence criterion has been met.
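
To make that flow concrete, here is a minimal sketch of the pattern.  It is
not actual MADlib code: fit_slope, the points table, and the step and
tolerance parameters are all made up for illustration, and sqlite3 stands in
for the database connection so the example runs on its own.  It fits
y = w * x by gradient descent, with one SQL aggregate per pass.

```python
import sqlite3


def fit_slope(conn, table, max_iter=100, step=0.1, tolerance=1e-9):
    """Hypothetical driver: fit y = w * x, one SQL aggregate per iteration."""
    # 1. Validate the input parameters before touching the data.
    if not table.isidentifier():
        raise ValueError("invalid table name: %r" % table)
    if max_iter < 1 or step <= 0 or tolerance <= 0:
        raise ValueError("max_iter, step and tolerance must be positive")

    # The database does the heavy lifting: a single aggregate over all rows
    # returns the mean gradient of the squared error for the current w.
    gradient_sql = "SELECT AVG((? * x - y) * x) FROM {0}".format(table)

    w = 0.0
    for iteration in range(1, max_iter + 1):
        # 2. Execute a SQL statement.
        (gradient,) = conn.execute(gradient_sql, (w,)).fetchone()
        if gradient is None:
            raise ValueError("table %s has no usable rows" % table)

        # 3. Evaluate the result and decide whether to loop again.
        new_w = w - step * gradient
        converged = abs(new_w - w) < tolerance
        w = new_w
        if converged:
            break
    return w, iteration


# Usage: three points that lie exactly on y = 2x.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?)",
    [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],
)
slope, iterations = fit_slope(conn, "points")
```

The point of the split is that the Python side only holds a handful of
scalars and the loop logic, while every pass over the data happens inside
the database.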

2. The C++ implementation functions

Mostly located under
https://github.com/apache/incubator-madlib/tree/master/src/modules

These functions are the C++ definitions of the core functions and
aggregates needed for particular algorithms.  These are implemented in C++
rather than Python for performance reasons.

3. The C++ database abstraction layer

Mostly located under:
https://github.com/apache/incubator-madlib/tree/master/src/dbal
and
https://github.com/apache/incubator-madlib/tree/master/src/ports/postgres/dbconnector

These functions attempt to provide a programming interface that abstracts
all the Postgres internal details away and provides a mechanism whereby
MADlib can support different backend platforms and focus on the internal
functionality rather than the platform integration logic.


Finally, ask questions.  We're happy to help you get familiar with the
project.

Cheers,
  Caleb


On Fri, Oct 23, 2015 at 9:00 AM, Indrajith Udayakumara <[email protected]>
wrote:

> We are a team of undergraduates. We are interested to study madlib,
> so we need your help. How can we understand this from the begining?
> can we have some references? We would like to contribute this project.
>
