For committers who still have time and are working on Mahout: since we know that overall performance is a function of in-core performance, how about adding LAPACK-based and GPU-based backends under the in-core Matrix API?
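To make the idea a bit more concrete, here is a minimal sketch of what a pluggable backend might look like. To be clear, every name in it (MMulBackend, JvmBackend, Backends) is hypothetical; nothing like this exists in Mahout today. The point is only that the public Matrix API stays as-is while the hot kernels route through whichever backend is available at runtime:

// Hypothetical sketch only -- these names do not exist in Mahout.
// Hot kernels (matrix multiply here) go through a pluggable backend.
trait MMulBackend {
  def available: Boolean
  // C = A * B for row-major A (m x n) and B (n x k); returns C (m x k).
  def mmul(a: Array[Double], m: Int, n: Int,
           b: Array[Double], k: Int): Array[Double]
}

// Pure-JVM fallback, always available.
object JvmBackend extends MMulBackend {
  def available = true
  def mmul(a: Array[Double], m: Int, n: Int,
           b: Array[Double], k: Int): Array[Double] = {
    val c = new Array[Double](m * k)
    var i = 0
    while (i < m) {
      var l = 0
      while (l < n) {
        val ail = a(i * n + l)
        var j = 0
        while (j < k) { c(i * k + j) += ail * b(l * k + j); j += 1 }
        l += 1
      }
      i += 1
    }
    c
  }
}

// A LAPACK/BLAS-backed or GPU-backed implementation (e.g. wrapping a
// native dgemm, or a BIDMat kernel) would slot into this list; the first
// one that reports itself available wins, with the JVM as the fallback.
object Backends {
  private val candidates: Seq[MMulBackend] =
    Seq(/* GpuBackend, LapackBackend, */ JvmBackend)
  lazy val best: MMulBackend = candidates.find(_.available).getOrElse(JvmBackend)
}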
Breeze has already moved to do both: it supports jBlas/LAPACK for dense matrices and has a separate experimental GPU module as an add-on. On the LAPACK side, it could be integrated directly. On the GPU side, there is BIDMat (https://github.com/BIDData/BIDMat), which could be integrated directly or merge-ported (similar to the Colt effort). It even has a somewhat quirky matrix DSL for Scala. Some additional ideas on how it could all work together might also be had from looking at its sister project, BIDMach; I haven't studied it extensively, but it looks like an attempt to standardize the learning process.

Before long, both techniques will be the norm for distributed computation systems. This is our last chance not to fall behind. Any takers?
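For reference, here is what the Breeze side looks like from user code (just a sketch using Breeze's standard DenseMatrix API). As I understand it, a dense product like this is dispatched to a native BLAS/LAPACK when one is found on the machine and falls back to a pure-JVM implementation otherwise, which is exactly the kind of transparent acceleration proposed above:

import breeze.linalg.DenseMatrix

object BreezeDemo extends App {
  // Two random 512 x 512 dense matrices.
  val a = DenseMatrix.rand[Double](512, 512)
  val b = DenseMatrix.rand[Double](512, 512)
  // Dense product: routed to native BLAS when present, JVM otherwise;
  // the call site is identical either way.
  val c = a * b
  println(c(0, 0))
}

Note the backend choice never surfaces in user code, which is what we would want for the Mahout in-core API as well.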
