[ 
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982497#comment-13982497
 ] 

Ted Dunning commented on MAHOUT-1500:
-------------------------------------

[[email protected]]'s comments have several incorrect statements which lead to 
incorrect conclusions.

These statements are both explicit and implicit and include, in paraphrased form:

* A comment about a "performance bug" means that h2o can't implement the Matrix 
API

The comment actually means that the use of some operations may have performance 
impacts that could be surprisingly large to some programmers.  The comment is intended to 
warn implementors that these impacts could be large enough to essentially 
prevent benefit from parallel computation.  As such, their use would thwart 
some of the purpose of using a parallel system.  The reference to a 
"performance bug" does not imply that the operations do not work and, indeed, 
their availability might be handy during initial implementation of algorithms.

Section (A) makes points about the validity of abstractions based on a 
requirement to modify existing code, but those points don't apply since 
modifying existing code isn't the purpose of the current work.

* It is the intent of the h2o support of the Matrix API that all codes that use 
the Matrix API should run and get parallel speedup

This is explicitly not a goal of the current effort.  The goal of the current 
effort is to use a well understood and stable Mahout API to experiment with 
implementation techniques for parallel algorithms that are based on h2o.  It is 
a premise of this effort that the operations used in these hand-built 
implementations will have execution patterns roughly similar to those of 
equivalent programs that use the Scala bindings or the distributed DSL bindings.  That 
premise is unlikely to be massively incorrect and thus the current effort is 
useful in terms of determining good h2o idioms for implementing matrix code.

The pattern of usage of the matrix API by other Mahout codes is completely 
irrelevant to this effort.

* The h2o system is not rich enough in capabilities to support things like 
zipping identically distributed data sets.

This is simply incorrect and is based on lack of knowledge of the h2o system.  
The h2o primitives are different from Spark primitives.  That means that 
different idioms have to be used to generate similar results, but it doesn't 
mean that h2o lacks these capabilities.  In particular, the discord between 
what [[email protected]] thinks that h2o can do and what it can do is large 
enough that the entire section (C) in his comments is essentially vacuous since 
it is based entirely on false premises.
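
To make the notion of zipping identically distributed data sets concrete, here 
is a minimal sketch in plain Python.  It uses neither h2o nor Spark APIs; the 
names Chunk and zip_chunks are hypothetical and exist only for illustration.  
The assumption is that both data sets are split into chunks with the same 
boundaries, so each pair of corresponding chunks can be combined locally 
without moving any data between chunks.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for one locally held partition of a distributed vector.
Chunk = List[float]


def zip_chunks(
    left: Sequence[Chunk],
    right: Sequence[Chunk],
    op: Callable[[float, float], float],
) -> List[Chunk]:
    """Combine two identically chunked data sets one chunk pair at a time."""
    if len(left) != len(right):
        raise ValueError("data sets must have the same number of chunks")
    out: List[Chunk] = []
    for a, b in zip(left, right):
        # Identical distribution means corresponding chunks cover the same rows.
        if len(a) != len(b):
            raise ValueError("corresponding chunks must have the same length")
        # Each chunk pair is independent, so this step could run in parallel.
        out.append([op(x, y) for x, y in zip(a, b)])
    return out


x = [[1.0, 2.0], [3.0]]
y = [[10.0, 20.0], [30.0]]
total = zip_chunks(x, y, lambda p, q: p + q)
```

The sketch only illustrates the general idea that a zip over aligned chunks 
needs no data movement.  How that idea is expressed with actual h2o primitives 
is not shown here.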

The current results indicate that there is considerable promise for h2o in terms 
of these capabilities.  More work is indicated.

* The current work would require massive revamping of the current Mahout Matrix 
API.

The current work is a technical exploration of convenient and efficient 
implementation techniques.  It has no implications whatsoever regarding the 
refactoring of the Mahout Matrix API.  The current work does have implications 
relative to any h2o shim layers that might ultimately be necessary, but that 
has nothing to do with the current Mahout in-core APIs.  Section (B) is thus 
also moot.

The emotional tenor of [[email protected]]'s comments is exactly what is 
encouraging the h2o work to be done a bit apart.  It simply isn't efficient to 
have to answer so many off-topic points whenever any reports on work in 
progress are given.






> H2O integration
> ---------------
>
>                 Key: MAHOUT-1500
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1500
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Anand Avati
>             Fix For: 1.0
>
>
> Integration with h2o (github.com/0xdata/h2o) in order to exploit its high 
> performance computational abilities.
> Start with providing implementations of AbstractMatrix and AbstractVector, 
> and more as we make progress.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
