Thank you for initiating this thread, Trevor. The possibility of two Apache projects collaborating is wonderful, and I was just trying to wrap my head around how we could do that with Mahout and MADlib. Thanks to my ignorance, I think I have more questions than answers now. :-/
The first question is how Mahout will use MADlib. Is the plan for Mahout to simply expose a wrapper that calls a MADlib function internally? As you suggested (if I understand correctly), we must convert a Mahout vector to MADlib's convention at either Mahout's or MADlib's end. But if MADlib does not have the kind of parallelization that Mahout currently has for linear algebra, then you will be limited by MADlib's capabilities, right? I am assuming that Mahout's linear algebra is far more powerful than MADlib's, especially since Mahout essentially specializes in it! But I presume what you are talking about is not such a simple wrapper; my lack of experience with Mahout, engine bindings, and MapBlock just makes it harder for me to understand.

The second question is how MADlib would use Mahout's superpowers. MADlib works on the principle that people don't have to move their data out of their database for analytics, but rather do it in-database. Since Mahout does not currently run on a SQL database engine, I am not sure how MADlib can leverage what Mahout is already good at (including its use of GPUs). I am clearly missing something here; can you please shed some light on this too?

Nandish

On Mon, May 22, 2017 at 12:33 PM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:

> Nice call out.
>
> So there is precedent for NOT utilizing the Mahout in-core matrix/vector
> structure in Mahout bindings (see the H2O bindings).
>
> In this case we let the underlying engine (here, MADlib) use its own
> concept of a matrix.
>
> That makes quicker work of writing bindings, and since most of the deep
> stuff in MADlib is C++, I assume there's fairly good performance there
> anyway. (Mahout is JVM under the hood, so without the accelerators,
> performance was not spectacular.)
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things." -Virgil*
>
>
> On Sun, May 21, 2017 at 9:05 PM, Jim Nasby <jim.na...@openscg.com> wrote:
>
> > On 5/21/17 7:38 PM, Trevor Grant wrote:
> >
> >> I don't think a PhD in math/ML is required at all for this little
> >> venture. Mainly just a knowledge of basic BLAS operations (Matrix A %*%
> >> Matrix B, Matrix A %*% Vector, etc.)
> >
> > Related to that, there's also been discussion[1] on the Postgres hackers
> > list about adding a true matrix data type. Having that would allow plCUDA
> > to do direct GPU matrix math with the bare minimum of fuss.
> >
> > MADlib would presumably need some other solution for non-Postgres stuff
> > (though the matrix type could potentially be pulled into GPDB with
> > minimal fuss).
> >
> > 1: https://www.postgresql.org/message-id/flat/9A28C8860F777E439AA12E8AEA7694F8011F52EF%40BPXM15GP.gisp.nec.co.jp
> > --
> > Jim Nasby, Chief Data Architect, Austin TX
> > OpenSCG http://OpenSCG.com
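[Editor's note: the "convert a Mahout vector to MADlib's convention" step discussed above can be sketched concretely. This is a minimal, hypothetical illustration only: it assumes MADlib's common convention of taking dense vectors as PostgreSQL `double precision[]` arrays, uses a plain `double[]` in place of a real Mahout `Vector`, and the helper name `toPgArrayLiteral` is made up for the example.]

```java
// Hypothetical sketch: serialize a dense vector's values into the text
// form of a PostgreSQL double precision[] literal, the array convention
// that many MADlib functions consume. A plain double[] stands in for a
// Mahout Vector; no Mahout or MADlib API is actually called here.
public class VectorToMadlib {
    static String toPgArrayLiteral(double[] values) {
        StringBuilder sb = new StringBuilder("'{");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(values[i]);
        }
        return sb.append("}'::double precision[]").toString();
    }

    public static void main(String[] args) {
        double[] v = {1.0, 2.5, -3.0};
        // Produces a literal usable inside a SQL call to a MADlib function.
        System.out.println(toPgArrayLiteral(v));
    }
}
```

In a real binding this string (or, better, a bound array parameter via JDBC) would be passed into an in-database MADlib call, which is where the conversion cost Nandish asks about would be paid.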
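[Editor's note: the "basic BLAS operations" Trevor refers to (Matrix A %*% Matrix B, Matrix A %*% Vector) are just dense products. A minimal plain-Java sketch, using bare arrays rather than either project's actual matrix types:]

```java
// Minimal dense implementations of the two operations named in the thread:
// matrix-matrix product (A %*% B) and matrix-vector product (A %*% x).
// Plain double[][] arrays are used purely for illustration.
public class Blas {
    static double[][] matmul(double[][] a, double[][] b) {
        int n = a.length, k = b.length, m = b[0].length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int p = 0; p < k; p++)      // i-p-j order keeps row access contiguous
                for (int j = 0; j < m; j++)
                    c[i][j] += a[i][p] * b[p][j];
        return c;
    }

    static double[] matvec(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < x.length; j++)
                y[i] += a[i][j] * x[j];
        return y;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[] x = {1, 1};
        System.out.println(java.util.Arrays.toString(matvec(a, x))); // [3.0, 7.0]
    }
}
```

Production code would of course delegate to a tuned BLAS (which is exactly the accelerator point made above about JVM performance), but this is the full extent of the math a binding author needs to understand.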