Thank you for initiating this thread, Trevor. The possibility of two Apache projects collaborating is wonderful, and I was just trying to wrap my head around how we could do that with Mahout and MADlib. Thanks to my ignorance, I think I have more questions than answers now. :-/
The first question is how Mahout will use MADlib. Is the plan for Mahout to simply expose a wrapper that calls a MADlib function internally? As you suggested (if I understand correctly), we must convert a Mahout vector to MADlib's convention at either Mahout's or MADlib's end. But if MADlib does not have the kind of parallelization that Mahout currently has for linear algebra, then you will be limited by MADlib's capabilities, right? I am assuming that Mahout's linear algebra is far more powerful than MADlib's, especially since Mahout essentially specializes in it! But I presume what you are talking about is not such a simple wrapper; my lack of experience with Mahout, engine bindings, and MapBlock just makes it harder for me to understand.

The second question is how MADlib would use Mahout's superpowers. MADlib works on the principle that people don't have to move their data out of their database for analytics, but rather do it in-database. Since Mahout does not currently run on a SQL database engine, I am not sure how MADlib can leverage what Mahout is already good at (including its use of GPUs). I am clearly missing something here; can you please shed some light on this too?

Nandish

On Mon, May 22, 2017 at 12:33 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:

> Nice call out.
>
> So there is precedent for NOT utilizing the Mahout in-core matrix/vector
> structure in Mahout bindings (see the H2O bindings).
>
> In this case we let the underlying engine (here, MADlib) use its own
> concept of a matrix.
>
> That makes quicker work of writing bindings, and since most of the deep
> stuff in MADlib is C++, I assume there's fairly good performance there
> anyway. (Mahout is JVM under the hood, so without the accelerators,
> performance was not spectacular.)
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things." -Virgil*
>
>
> On Sun, May 21, 2017 at 9:05 PM, Jim Nasby <jim.na...@openscg.com> wrote:
>
> > On 5/21/17 7:38 PM, Trevor Grant wrote:
> >
> >> I don't think a PhD in math/ML is required at all for this little
> >> venture. Mainly just a knowledge of basic BLAS operations (Matrix A %*%
> >> Matrix B, Matrix A %*% Vector, etc.)
> >
> > Related to that, there's also been discussion[1] on the Postgres hackers
> > list about adding a true matrix data type. Having that would allow plCUDA
> > to do direct GPU matrix math with the bare minimum of fuss.
> >
> > MADlib would presumably need some other solution for non-Postgres stuff
> > (though the matrix type could potentially be pulled into GPDB with
> > minimal fuss).
> >
> > 1: https://www.postgresql.org/message-id/flat/9A28C8860F777E439AA12E8AEA7694F8011F52EF%40BPXM15GP.gisp.nec.co.jp
> > --
> > Jim Nasby, Chief Data Architect, Austin TX
> > OpenSCG http://OpenSCG.com
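[Editor's note: the "convert a Mahout vector to MADlib's convention" step discussed above can be sketched concretely. This is a minimal, hypothetical illustration only: it assumes MADlib's common convention of taking dense vectors as PostgreSQL `double precision[]` arrays, uses a plain `double[]` in place of a real Mahout `Vector`, and the helper name `toPgArrayLiteral` is made up for the example.]

```java
// Hypothetical sketch: serialize a dense vector's values into the text
// form of a PostgreSQL double precision[] literal, the array convention
// that many MADlib functions consume. A plain double[] stands in for a
// Mahout Vector; no Mahout or MADlib API is actually called here.
public class VectorToMadlib {
    static String toPgArrayLiteral(double[] values) {
        StringBuilder sb = new StringBuilder("'{");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(values[i]);
        }
        return sb.append("}'::double precision[]").toString();
    }

    public static void main(String[] args) {
        double[] v = {1.0, 2.5, -3.0};
        // Produces a literal usable inside a SQL call to a MADlib function.
        System.out.println(toPgArrayLiteral(v));
    }
}
```

In a real binding this string (or, better, a bound array parameter via JDBC) would be passed into an in-database MADlib call, which is where the conversion cost Nandish asks about would be paid.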
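[Editor's note: the "basic BLAS operations" Trevor refers to (Matrix A %*% Matrix B, Matrix A %*% Vector) are just dense products. A minimal plain-Java sketch, using bare arrays rather than either project's actual matrix types:]

```java
// Minimal dense implementations of the two operations named in the thread:
// matrix-matrix product (A %*% B) and matrix-vector product (A %*% x).
// Plain double[][] arrays are used purely for illustration.
public class Blas {
    static double[][] matmul(double[][] a, double[][] b) {
        int n = a.length, k = b.length, m = b[0].length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int p = 0; p < k; p++)      // i-p-j order keeps row access contiguous
                for (int j = 0; j < m; j++)
                    c[i][j] += a[i][p] * b[p][j];
        return c;
    }

    static double[] matvec(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < x.length; j++)
                y[i] += a[i][j] * x[j];
        return y;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[] x = {1, 1};
        System.out.println(java.util.Arrays.toString(matvec(a, x))); // [3.0, 7.0]
    }
}
```

Production code would of course delegate to a tuned BLAS (which is exactly the accelerator point made above about JVM performance), but this is the full extent of the math a binding author needs to understand.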