On Mon, Jan 6, 2014 at 12:40 PM, Oleksandr Olgashko <[email protected]> wrote:

> Returning back to question about theme to work, asked 2 months ago.
> What algorithm should I implement? Smth like FastICA or Infomax? Should it
> look like scikit version or there are some specific mahout restrictions?
>

Most importantly, the scikit versions are non-distributed (aren't they?). One of the primary challenges for Mahout is to develop and validate an efficient parallelization strategy.

> Btw, are there any plans to make python bindings for mahout, or this is a
> bad idea? Seems strange to have many different libs that do, roughly
> speaking, same things.
>

I'd say it is a bad idea for now. We are trying to stay within a JVM process and pass everything by reference. If you are interested in distributed Python, I know there are some ideas floating around on the Spark forum about integrating scikit and PySpark. (My primary gripe about Python is that it is not a strongly typed language with compile-time validation of parameters, which makes things a bit uneasy when tons of parameters are being passed to scikit; but then, I have never really dived deep into Python.)

But we are trying to move things closer to a Scala DSL (to make things more readable, like this example [1]), and perhaps to run things on Spark instead of pure MapReduce, to make things a bit better suited to iteration and optimization. That approach may appeal to your longing for Python-style clarity.

[1] https://github.com/apache/mahout/blob/trunk/math-scala/src/main/scala/org/apache/mahout/math/scalabindings/SSVD.scala
