[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962017#comment-13962017 ]

Dmitriy Lyubimov commented on MAHOUT-1500:
------------------------------------------

After reviewing the newly announced https://github.com/tdunning/h2o-matrix and 
making a willful conjecture that it is what this issue is about (since it is 
still not explicitly confirmed on this Jira), I am changing my vote to -0. 

Here are the components of my vote.

(1) +1 Do-ocracy -- those who are willing to do things and (what is especially
important in our case) provide continued support for them deserve a
componential +1 to begin with.
(2) Big +1 on using H2O as an external dependency. I don't think we want to be
in the business of creating, maintaining, or merging with distributed execution
engines; we should just be translating high-level ML semantics to them.
(3) +0 in-core API stability: this work must not change or deprecate in-core
API contracts, thus forcing existing mahout-math users to make unreasonable
migration and refactoring steps and/or to experience a performance decline.
Mahout-math is one of the few still very valuable components, so this is
important. (The current state of things does not introduce such changes.)
(4) +0 in-core API augmentation: this work must not create API duplication
(alternatives to existing contracts) or augmented API contracts that are either
not adequately backed by the existing multitude of in-core matrix types or do
not make sense for in-core structures. (The current state of things does not
introduce such changes.)
(5) -1: I still maintain that the major in-core Matrix and Vector contracts
neither provide an adequate basis for, nor are a good fit for, building a
shared-nothing generic environment. Thus, further partitioning of the Matrix and
Vector contract sets is required if distributed structures must share the same
hierarchy base with in-core ones. However, doing so would contradict positions
(3) and (4) above. This is why I maintain that the least painful way to address
this is to create a separate hierarchy base for H2OMatrix, which would intersect
some high-level algebraic contracts with the in-core contracts while bearing
identical semantics.
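Position (5) can be illustrated with a minimal Java sketch. All type names below (AlgebraicMatrix, InCoreMatrix, DistributedMatrix, DenseMatrix, BlockedMatrix, Algebra) are hypothetical and do not correspond to existing Mahout or H2O classes, and the "distributed" matrix is only simulated locally with row blocks. The point is the shape of the hierarchy: both bases share a small set of whole-matrix algebraic contracts with identical semantics, while only the in-core base exposes getQuick/setQuick.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout: none of these types exist in Mahout or H2O.

/** Shared high-level algebraic contract: whole-matrix operations only. */
interface AlgebraicMatrix {
    int nrow();
    int ncol();
    /** Element-wise multiplication by a scalar, returning a new matrix. */
    AlgebraicMatrix times(double scalar);
    /** Sum of each column. */
    double[] colSums();
}

/** In-core base: adds cheap single-element access on top of the shared contract. */
interface InCoreMatrix extends AlgebraicMatrix {
    double getQuick(int row, int col);
    void setQuick(int row, int col, double value);
}

/** Distributed base: shares the algebra but deliberately has no getQuick/setQuick. */
interface DistributedMatrix extends AlgebraicMatrix {
    int numPartitions();
}

final class DenseMatrix implements InCoreMatrix {
    private final double[][] a;
    DenseMatrix(double[][] a) { this.a = a; }
    public int nrow() { return a.length; }
    public int ncol() { return a.length == 0 ? 0 : a[0].length; }
    public double getQuick(int r, int c) { return a[r][c]; }
    public void setQuick(int r, int c, double v) { a[r][c] = v; }
    public DenseMatrix times(double s) {
        double[][] b = new double[nrow()][ncol()];
        for (int r = 0; r < nrow(); r++)
            for (int c = 0; c < ncol(); c++) b[r][c] = a[r][c] * s;
        return new DenseMatrix(b);
    }
    public double[] colSums() {
        double[] sums = new double[ncol()];
        for (double[] row : a)
            for (int c = 0; c < sums.length; c++) sums[c] += row[c];
        return sums;
    }
}

/** Toy stand-in for a distributed matrix: row blocks that are only processed whole. */
final class BlockedMatrix implements DistributedMatrix {
    private final List<DenseMatrix> blocks; // each block stands in for one partition
    private final int ncol;
    BlockedMatrix(List<DenseMatrix> blocks, int ncol) { this.blocks = blocks; this.ncol = ncol; }
    public int nrow() { int n = 0; for (DenseMatrix b : blocks) n += b.nrow(); return n; }
    public int ncol() { return ncol; }
    public int numPartitions() { return blocks.size(); }
    public BlockedMatrix times(double s) {
        List<DenseMatrix> out = new ArrayList<>();
        for (DenseMatrix b : blocks) out.add(b.times(s)); // "map" over partitions
        return new BlockedMatrix(out, ncol);
    }
    public double[] colSums() {
        double[] sums = new double[ncol];
        for (DenseMatrix b : blocks) {
            double[] part = b.colSums();                          // partial sums per partition
            for (int c = 0; c < ncol; c++) sums[c] += part[c];    // then reduce
        }
        return sums;
    }
}

final class Algebra {
    /** Written once against the shared contract; works for both hierarchy bases. */
    static double total(AlgebraicMatrix m) {
        double t = 0;
        for (double s : m.colSums()) t += s;
        return t;
    }
}
```

Algebra.total is written once against the shared contract and accepts either base, while code that needs element access has to ask for an InCoreMatrix explicitly, so per-element access on a distributed matrix simply does not type-check. This is one possible way to keep the in-core contracts of (3) and (4) untouched; it is a sketch, not a proposal for concrete class names.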


This concern seems to be shared even by the authors of the code, if I am not 
misinterpreting the meaning of the comments here.
{code:title="H2OMatrix.java"}
  // Single-element accessors.  Calling these likely indicates a huge performance bug.
  @Override public double getQuick(int row, int column) { return _fr.vecs()[column].at(row); }
  @Override public void setQuick(int row, int column, double value) { _fr.vecs()[column].set(row,value); _fr.vecs()[column].
{code}
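The cost that the quoted comment warns about can be sketched with a toy chunked column. ChunkedColumn and its methods are hypothetical and only loosely modeled on columnar, chunked storage; they are not H2O's Vec API. Each at(row) call has to locate the chunk holding the row (a binary search here, and in a distributed store possibly a remote fetch as well), whereas a bulk operation walks each chunk once, sequentially.

```java
import java.util.Arrays;

/**
 * Hypothetical chunked column, loosely modeled on columnar distributed storage.
 * Not H2O's Vec API: names and layout are illustrative only.
 */
final class ChunkedColumn {
    private final int[] starts;       // first row index of each chunk
    private final double[][] chunks;  // each chunk stands in for data that may live on another node
    int chunkLookups = 0;             // counts how often random access had to locate a chunk

    ChunkedColumn(double[][] chunks) {
        this.chunks = chunks;
        this.starts = new int[chunks.length];
        int row = 0;
        for (int i = 0; i < chunks.length; i++) { starts[i] = row; row += chunks[i].length; }
    }

    /** Random access: every call pays for a chunk search. */
    double at(int row) {
        chunkLookups++;
        int i = Arrays.binarySearch(starts, row);
        if (i < 0) i = -i - 2;        // insertion point minus one = chunk containing row
        return chunks[i][row - starts[i]];
    }

    /** Bulk access: walks each chunk once, with no per-element lookup. */
    double sum() {
        double s = 0;
        for (double[] chunk : chunks) for (double v : chunk) s += v;
        return s;
    }

    int length() {
        return chunks.length == 0 ? 0 : starts[chunks.length - 1] + chunks[chunks.length - 1].length;
    }
}
```

Summing through at() performs one chunk lookup per element, while sum() performs none; an algorithm written against getQuick/setQuick inherits the first pattern, which is why such calls are a likely performance bug on distributed data.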

I reserve the right to change my vote if components of my vote are affected by 
future changes. 
I will not raise objections or add points based on performance.


> H2O integration
> ---------------
>
>                 Key: MAHOUT-1500
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1500
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Anand Avati
>             Fix For: 1.0
>
>
> Integration with h2o (github.com/0xdata/h2o) in order to exploit its high 
> performance computational abilities.
> Start with providing implementations of AbstractMatrix and AbstractVector, 
> and more as we make progress.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
