[
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956824#comment-13956824
]
Dmitriy Lyubimov commented on MAHOUT-1500:
------------------------------------------
bq. Now it seems to me (with my limited exploring of Mahout) that it might
actually be viable to provide a "hadoop alternative" in the form of an
alternate implementation of DistributedRowMatrix (instead of AbstractMatrix)
Yes, that's what I meant. On the Spark side, this is done by introducing mix-ins
such as DrmLike, RLikeOps, RLikeDrmOps, RLikeVectorOps, etc. On the Java side,
working with mix-ins (functionality-filled traits) is of course not easy, but
the important point is that it should be an alternative hierarchy that exposes
the identical set of optimized linalg operators (operator-oriented semantics in
linear algebra).
I.e., the assumption is that to the end user (developer) it is more important
that the notation
{code}
a dot b
{code}
means exactly the same thing regardless of whether a and b are in-core or
distributed; but it matters significantly less whether a and b descend from
Matrix or DRM, as long as the operator dot(A,B) is defined for all possible type
combinations (sparse, dense, distributed).
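To make that concrete, here is a rough Java sketch. All type names (InCoreVec, DenseVec, SparseVec, PartitionedVec) are hypothetical stand-ins invented for illustration, not Mahout or H2O classes, and the "distributed" type is just a list of in-core partitions. The only point is that dot is defined for every operand combination even though the two hierarchies share no common ancestor.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DotSketch {

    /** In-core hierarchy (hypothetical; not Mahout's Vector). */
    public interface InCoreVec {
        int size();
        double get(int i);
    }

    public static final class DenseVec implements InCoreVec {
        private final double[] values;
        public DenseVec(double... values) { this.values = values.clone(); }
        public int size() { return values.length; }
        public double get(int i) { return values[i]; }
    }

    public static final class SparseVec implements InCoreVec {
        private final int size;
        private final Map<Integer, Double> nonZeros = new TreeMap<>();
        public SparseVec(int size) { this.size = size; }
        public SparseVec set(int i, double v) { nonZeros.put(i, v); return this; }
        public int size() { return size; }
        public double get(int i) { return nonZeros.getOrDefault(i, 0.0); }
    }

    /** Unrelated "distributed" hierarchy: a list of in-core partitions (no real cluster). */
    public static final class PartitionedVec {
        private final List<InCoreVec> parts;
        public PartitionedVec(InCoreVec... parts) { this.parts = Arrays.asList(parts); }
        public List<InCoreVec> parts() { return parts; }
        public int size() { return parts.stream().mapToInt(InCoreVec::size).sum(); }
    }

    private static void requireSameSize(int a, int b) {
        if (a != b) throw new IllegalArgumentException("size mismatch: " + a + " vs " + b);
    }

    // One overload per operand combination; the caller always writes dot(a, b).
    public static double dot(InCoreVec a, InCoreVec b) {
        requireSameSize(a.size(), b.size());
        double sum = 0.0;
        for (int i = 0; i < a.size(); i++) sum += a.get(i) * b.get(i);
        return sum;
    }

    public static double dot(PartitionedVec a, InCoreVec b) {
        requireSameSize(a.size(), b.size());
        double sum = 0.0;
        int offset = 0;
        for (InCoreVec part : a.parts()) {
            // Each partition contributes a partial sum against its slice of b.
            for (int i = 0; i < part.size(); i++) sum += part.get(i) * b.get(offset + i);
            offset += part.size();
        }
        return sum;
    }

    public static double dot(InCoreVec a, PartitionedVec b) { return dot(b, a); }

    public static double dot(PartitionedVec a, PartitionedVec b) {
        // Simplification for the sketch: gather b in core, then reuse the mixed case.
        return dot(a, collect(b));
    }

    public static InCoreVec collect(PartitionedVec v) {
        double[] out = new double[v.size()];
        int offset = 0;
        for (InCoreVec part : v.parts()) {
            for (int i = 0; i < part.size(); i++) out[offset + i] = part.get(i);
            offset += part.size();
        }
        return new DenseVec(out);
    }

    public static void main(String[] args) {
        DenseVec a = new DenseVec(1.0, 2.0, 3.0);
        SparseVec b = new SparseVec(3).set(1, 4.0);
        PartitionedVec c = new PartitionedVec(new DenseVec(1.0, 2.0), new DenseVec(3.0));
        System.out.println(dot(a, b)); // 8.0
        System.out.println(dot(c, b)); // 8.0
        System.out.println(dot(c, c)); // 14.0
    }
}
```

In Java this takes static overloads (or double dispatch); the Scala mix-ins get the same effect with infix operators, which is why the notation there stays identical across operand types.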
bq. and AbstractJob (by internally using h2o's Frame/Vec and MRTask2 APIs), and
thereby allow for a runtime choice of Hadoop vs H2O.
I care significantly less about the Job API, and Hadoop MR in particular. It is
my belief that they are non-essential to the math user and should therefore be
avoided altogether (that notion is eliminated in the Spark Bindings).
bq. This seems like a reasonable first step?
Yes -- with the caveat that logical mix-ins for distributed and in-core
operations already exist in the Scala and Spark Bindings. As I said, mapping
this logical layer onto a particular physical layer seems to me an infinitely
better architecture than creating yet another logical layer specific to a
particular backend. However, I see that it would be hard to converge on that,
or at least I don't see how. I will extract an architecture slide from my talk
and post a link to illustrate the idea a bit later.
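Until then, a toy Java sketch of what I mean by "one logical layer, several physical layers". Everything here is hypothetical and made up for illustration (Expr, Leaf, Transpose, Times, Backend, NaiveBackend, RowWiseBackend); it is not the Spark Bindings code. The user builds one backend-agnostic expression, and each backend decides how to execute it.

```java
import java.util.Arrays;

public class LogicalPhysicalSketch {

    /** Logical layer: backend-agnostic expression nodes (hypothetical names). */
    public interface Expr {}

    public static final class Leaf implements Expr {
        final double[][] rows;
        public Leaf(double[][] rows) { this.rows = rows; }
    }

    public static final class Transpose implements Expr {
        final Expr arg;
        public Transpose(Expr arg) { this.arg = arg; }
    }

    public static final class Times implements Expr {
        final Expr left, right;
        public Times(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    /** Physical layer: each backend chooses how to run the same logical plan. */
    public interface Backend {
        double[][] eval(Expr e);
    }

    /** Evaluates every node literally, materializing transposes. */
    public static class NaiveBackend implements Backend {
        public double[][] eval(Expr e) {
            if (e instanceof Leaf) return ((Leaf) e).rows;
            if (e instanceof Transpose) return transpose(eval(((Transpose) e).arg));
            if (e instanceof Times) {
                Times t = (Times) e;
                return multiply(eval(t.left), eval(t.right));
            }
            throw new IllegalArgumentException("unknown node: " + e);
        }
    }

    /** Recognizes A' * A and computes it as a sum of row outer products, without forming A'. */
    public static class RowWiseBackend extends NaiveBackend {
        @Override
        public double[][] eval(Expr e) {
            if (e instanceof Times) {
                Times t = (Times) e;
                if (t.left instanceof Transpose && ((Transpose) t.left).arg == t.right) {
                    return gram(eval(t.right));
                }
            }
            return super.eval(e);
        }
    }

    static double[][] transpose(double[][] m) {
        double[][] t = new double[m[0].length][m.length];
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[0].length; j++) t[j][i] = m[i][j];
        return t;
    }

    static double[][] multiply(double[][] a, double[][] b) {
        double[][] c = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < b[0].length; j++)
                for (int k = 0; k < b.length; k++) c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    static double[][] gram(double[][] a) {
        int n = a[0].length;
        double[][] c = new double[n][n];
        for (double[] row : a)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) c[i][j] += row[i] * row[j];
        return c;
    }

    public static void main(String[] args) {
        Leaf a = new Leaf(new double[][] {{1, 2}, {3, 4}, {5, 6}});
        Expr ata = new Times(new Transpose(a), a); // the logical plan for A' * A
        System.out.println(Arrays.deepToString(new NaiveBackend().eval(ata)));   // [[35.0, 44.0], [44.0, 56.0]]
        System.out.println(Arrays.deepToString(new RowWiseBackend().eval(ata))); // [[35.0, 44.0], [44.0, 56.0]]
    }
}
```

Adding another engine under this scheme would mean writing another Backend, not another set of user-facing operators.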
> H2O integration
> ---------------
>
> Key: MAHOUT-1500
> URL: https://issues.apache.org/jira/browse/MAHOUT-1500
> Project: Mahout
> Issue Type: Improvement
> Reporter: Anand Avati
> Fix For: 1.0
>
>
> Integration with h2o (github.com/0xdata/h2o) in order to exploit its high
> performance computational abilities.
> Start with providing implementations of AbstractMatrix and AbstractVector,
> and more as we make progress.