[ 
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079333#comment-14079333
 ] 

ASF GitHub Bot commented on MAHOUT-1500:
----------------------------------------

Github user gcapan commented on the pull request:

    https://github.com/apache/mahout/pull/21#issuecomment-50625655
  
    Tests pass for me for various profiles, and the code looks good. I am a 
supporter of an engine-agnostic architecture and of separating the actual 
algorithms from the backends. Multiple backends (in addition to both Spark and 
H2O being very promising platforms) would force us to implement generic 
solutions for data preprocessing, vectorization, machine learning and big data 
mining. In summary, my vote is +1 for this contribution. 
    
    PS: Not H2O specific, but I wanted to add here: I believe the next step 
should be standardizing a minimal Matrix I/O capability (i.e. a couple of file 
formats other than [row_id, VectorWritable] SequenceFiles) required for a 
distributed computation engine, and adding data-frame-like structures that 
allow text columns.  


> H2O integration
> ---------------
>
>                 Key: MAHOUT-1500
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1500
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Anand Avati
>             Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL



--
This message was sent by Atlassian JIRA
(v6.2#6252)
