[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079333#comment-14079333 ]
ASF GitHub Bot commented on MAHOUT-1500:
----------------------------------------
Github user gcapan commented on the pull request:
https://github.com/apache/mahout/pull/21#issuecomment-50625655
Tests pass for me for various profiles, and the code looks good. I am a
supporter of engine-agnostic architecture and of separating the actual algorithms
from the backends. Having multiple backends (in addition to both Spark and H2O being
very promising platforms) would force us to implement generic solutions for data
preprocessing, vectorization, machine learning, and big data mining. In summary,
my vote is +1 for this contribution.
PS: Not H2O-specific, but I wanted to add this here: I believe the next step
should be standardizing a minimal Matrix I/O capability (i.e. a couple of file
formats other than [row_id, VectorWritable] SequenceFiles) required for a
distributed computation engine, and adding data-frame-like structures that
allow text columns.
> H2O integration
> ---------------
>
> Key: MAHOUT-1500
> URL: https://issues.apache.org/jira/browse/MAHOUT-1500
> Project: Mahout
> Issue Type: Improvement
> Reporter: Anand Avati
> Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL
--
This message was sent by Atlassian JIRA
(v6.2#6252)