[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071882#comment-14071882 ]

ASF GitHub Bot commented on MAHOUT-1500:
----------------------------------------

Github user cliffclick commented on the pull request:

    https://github.com/apache/mahout/pull/21#issuecomment-49894450
  
    This is a very basic port, focused on correctness & completeness, with no
    effort spent on performance.
    Expectation setting: there are easy 2x to 10x speedups available in most of
    the operator inner loops.  The HDFS sequence-file readers/writers are
    single-threaded and single-node; H2O's internal CSV reader will easily be
    100x faster.
    Performance work should come in later commits.
    
    Minor comments:
    In lots of places, especially the reduce() calls, the code could/should call
    ArrayUtils.add(this, that) instead of looping over the arrays being added.
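    
    Roughly what I mean, as a sketch only - the class and field names here are
    illustrative (not the PR's actual code), and it assumes the
    water.util.ArrayUtils.add(double[], double[]) overload that sums the second
    array into the first:
    
        import water.util.ArrayUtils;
    
        class PartialSums {
          double[] sums;
    
          // current style: hand-rolled element-wise add in reduce()
          void reduceByLoop(PartialSums that) {
            for (int i = 0; i < sums.length; i++)
              sums[i] += that.sums[i];
          }
    
          // suggested style: one call, element-wise add into this.sums
          void reduceWithArrayUtils(PartialSums that) {
            ArrayUtils.add(sums, that.sums);
          }
        }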
    
    H2OHelper.empty_frame looks a ton like it should call "Vec.makeZero()" in a
    loop instead of hand-rolling Vecs of zeros; there's a version that will take
    a hand-rolled layout.  This call should probably move into the Frame class
    directly.
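    
    A minimal sketch of that shape - assuming the Vec.makeZero(long len) overload
    and the Frame(Vec...) constructor in the H2O version Mahout builds against;
    the helper name is illustrative:
    
        import water.fvec.Frame;
        import water.fvec.Vec;
    
        class EmptyFrameSketch {
          // All-zero nrows x ncols Frame, one zero-filled Vec per column.
          static Frame zeros(long nrows, int ncols) {
            Vec[] vecs = new Vec[ncols];
            for (int c = 0; c < ncols; c++)
              vecs[c] = Vec.makeZero(nrows);   // assumed overload: makeZero(long len)
            return new Frame(vecs);            // columns left unnamed here
          }
        }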
    
    The technique for row-labeling seems... awkward at best.  Or at least that's
    what I'm reading as the purpose of the Tuple2.  I think this design needs
    more exploration - e.g. insert a row-label column in front of the "normal"
    Frame columns and teach the follow-on code to skip the 1st column.  Note
    that many datasets have non-numeric columns (e.g. name, address) that cannot
    participate in math ops, so most H2O algos already carry forward a notion of
    the set of columns being worked on.
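    
    A sketch of the "labels as column 0" layout, just to make the suggestion
    concrete - it assumes a Frame whose Vec[0] holds the row labels, and the
    helper below is illustrative, not existing API:
    
        import water.fvec.Frame;
        import water.fvec.Vec;
    
        class RowLabelSketch {
          // Given a Frame whose column 0 holds the row labels, hand the math
          // code a Frame over just the data columns (columns 1..n-1).
          static Frame dataColumns(Frame labeled) {
            Vec[] all = labeled.vecs();
            Vec[] data = new Vec[all.length - 1];
            System.arraycopy(all, 1, data, 0, data.length);
            return new Frame(data);
          }
        }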
    
    Cliff



> H2O integration
> ---------------
>
>                 Key: MAHOUT-1500
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1500
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Anand Avati
>             Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL



