On Mon, Jan 6, 2014 at 12:40 PM, Oleksandr Olgashko <
[email protected]> wrote:

> Returning to the question I asked 2 months ago about a topic to work on.
> Which algorithm should I implement? Something like FastICA or Infomax?
> Should it look like the scikit version, or are there specific Mahout
> restrictions?
>

Most importantly, the scikit versions are non-distributed (aren't they?).
One of the primary challenges for Mahout is to develop and validate an
efficient parallelization strategy.
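
To make "parallelization strategy" a bit more concrete, here is a minimal
sketch, written in Python only because that is the language the question
comes from. It is an illustration under stated assumptions, not Mahout or
scikit code, and the helper names (partial_gram, add_matrices,
distributed_gram) are hypothetical. It shows one building block that
FastICA-style whitening needs, the Gram matrix X'X, computed as a sum of
per-partition pieces so that each partition could in principle be handled
by a separate worker.

```python
# Sketch only: partial_gram, add_matrices and distributed_gram are
# hypothetical helper names, not Mahout or scikit-learn APIs.
from functools import reduce


def partial_gram(rows):
    """Map step: compute X_i' X_i for one partition of rows."""
    n = len(rows[0])
    g = [[0.0] * n for _ in range(n)]
    for r in rows:
        for i in range(n):
            for j in range(n):
                g[i][j] += r[i] * r[j]
    return g


def add_matrices(a, b):
    """Reduce step: element-wise sum of two partial results."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def distributed_gram(partitions):
    """Combine the per-partition Gram matrices into the full X'X."""
    return reduce(add_matrices, map(partial_gram, partitions))


# Four rows split into two partitions, as two workers might see them.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
partitions = [X[:2], X[2:]]

gram_split = distributed_gram(partitions)
gram_whole = partial_gram(X)  # same computation done on a single node
```

The split and single-node results agree because X'X is a plain sum over
rows. The harder part of a distributed ICA is the iterative step that
follows, which this sketch does not attempt to cover.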


>
>
> Btw, are there any plans to make Python bindings for Mahout, or is this a
> bad idea? It seems strange to have many different libs that do, roughly
>

I'd say it is a bad idea for now. We are trying to stay within a JVM
process and pass everything by reference. If you are interested in
distributed Python, I know there are some ideas floating around on the
Spark forum about integrating scikit and PySpark. (My primary gripe about
Python is that it is not a statically typed, compile-time-validated
language as far as parameters go, which makes things a bit uneasy with
tons of parameters being passed to scikit; but then I never really
deep-dived into Python.)

But we are trying to move things closer to a Scala DSL (to make things more
readable, like this example [1]), and perhaps to run things on Spark
instead of pure MapReduce, to make things a bit more iterative and easier
to optimize. That approach may appeal to your longing for Python-style
clarity.

[1]
https://github.com/apache/mahout/blob/trunk/math-scala/src/main/scala/org/apache/mahout/math/scalabindings/SSVD.scala

> speaking, same things.
>
