[
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059083#comment-14059083
]
ASF GitHub Bot commented on MAHOUT-1500:
----------------------------------------
Github user pferrel commented on the pull request:
https://github.com/apache/mahout/pull/21#issuecomment-48761123
Exactly, thanks. I see you've done the same for CF too, great.
But this illustrates the problem. I need to change 50% of the tests in CF
cooccurrence because they were not catching a bug. Now the tests live in two
places, h2o and spark, and unless I change the tests in both places the build
will break. The files look virtually identical except for the imports, which is
good. If that's true, I wonder if we could use a Scala macro to keep the
code all in one file. We might be able to take the same code and produce two
artifacts that are both run at build time. That would reduce the load on devs
for this kind of thing.
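To make the "one copy of the tests" idea concrete, here is a minimal sketch that swaps in a plain abstract trait instead of the Scala macro suggested above. It assumes each engine can be wrapped behind a small interface; `Engine`, `LoopEngine`, `TabulateEngine` and `SharedCooccurrenceSuite` are hypothetical names, not Mahout APIs, and the two toy in-core engines only stand in for the spark and h2o bindings. The fixture and expected cooccurrence matrix (A^T A) exist exactly once.

```scala
// Hypothetical sketch: the cooccurrence test is written once and run against
// every engine. Engine, LoopEngine, TabulateEngine and SharedCooccurrenceSuite
// are illustrative names, not Mahout APIs; the two toy in-core engines stand
// in for the spark and h2o bindings.
trait Engine {
  def name: String
  // Computes A^T A for a dense user x item interaction matrix.
  def cooccurrence(a: Array[Array[Double]]): Array[Array[Double]]
}

object LoopEngine extends Engine {
  val name = "loop"
  def cooccurrence(a: Array[Array[Double]]): Array[Array[Double]] = {
    val n = a.head.length
    val c = Array.ofDim[Double](n, n)
    // Accumulate the outer product of every user row.
    for (row <- a; i <- 0 until n; j <- 0 until n) c(i)(j) += row(i) * row(j)
    c
  }
}

object TabulateEngine extends Engine {
  val name = "tabulate"
  def cooccurrence(a: Array[Array[Double]]): Array[Array[Double]] = {
    val n = a.head.length
    // Entry (i, j) is the dot product of item columns i and j.
    Array.tabulate(n, n)((i, j) => a.map(row => row(i) * row(j)).sum)
  }
}

// The only copy of the fixture and the expected result.
object SharedCooccurrenceSuite {
  private val interactions = Array(
    Array(1.0, 0.0, 1.0),
    Array(1.0, 1.0, 0.0),
    Array(0.0, 1.0, 1.0))

  private val expected = List(
    List(2.0, 1.0, 1.0),
    List(1.0, 2.0, 1.0),
    List(1.0, 1.0, 2.0))

  // Arrays compare by reference, so convert to Lists before comparing.
  def check(engine: Engine): Boolean =
    engine.cooccurrence(interactions).map(_.toList).toList == expected
}

object SuiteDemo {
  def main(args: Array[String]): Unit =
    for (e <- Seq(LoopEngine, TabulateEngine))
      println(s"${e.name}: ${if (SharedCooccurrenceSuite.check(e)) "pass" else "FAIL"}")
}
```

With this shape, fixing a test that misses a bug means editing `SharedCooccurrenceSuite` once; each engine module only supplies its `Engine` adapter. It is simpler than a macro, at the cost of every engine's test module depending on the shared suite.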
However, almost all IO code is currently spark-specific. You must have
re-implemented drm.writeDrm for h2o. Until this is **not** a re-implementation
but is engine-neutral, we are going to have a growing problem. I am the only
person currently working in spark-specific land, and only Dmitriy and Sebastian
are writing for V2. When other committers get past the Scala barrier and start
committing similar stuff, they will immediately face this.
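One way to read "engine-neutral IO" is that the output format is written once and each engine only exposes its rows. The sketch below assumes that split; `RowSource`, `TextDrmFormat` and `InMemorySource` are hypothetical names, the tab-separated `col:value` text layout is invented for illustration, and none of this describes how Mahout's `drm.writeDrm` actually works.

```scala
// Hypothetical sketch of engine-neutral output: the format is written once,
// and each engine only exposes its rows. RowSource and TextDrmFormat are
// illustrative names, not Mahout APIs, and the text layout is made up.
trait RowSource {
  // Each row: a key plus sparse (column index -> value) entries.
  def rows: Iterator[(String, Map[Int, Double])]
}

object TextDrmFormat {
  // One line per row: key, tab, then "col:value" pairs sorted by column.
  def formatRow(key: String, row: Map[Int, Double]): String =
    key + "\t" + row.toSeq.sortBy(_._1).map { case (i, v) => s"$i:$v" }.mkString(" ")

  // Shared writer: identical for every engine that can supply a RowSource.
  def write(src: RowSource): List[String] =
    src.rows.map { case (k, r) => formatRow(k, r) }.toList
}

// Stand-in for an engine binding (a spark or h2o adapter would go here).
object InMemorySource extends RowSource {
  def rows: Iterator[(String, Map[Int, Double])] = Iterator(
    "u1" -> Map(0 -> 1.0, 2 -> 1.0),
    "u2" -> Map(1 -> 1.0, 0 -> 1.0))
}

object IoDemo {
  def main(args: Array[String]): Unit =
    TextDrmFormat.write(InMemorySource).foreach(println)
}
```

Under this split, a new engine needs only a `RowSource` adapter, and a fix to the format lands in one place instead of being re-implemented per engine.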
BTW, I am very interested in seeing how the spark ItemSimilarityDriver compares
to an h2o version. IMO this is the kind of motivation we need to see. If you
implemented the driver or the reader/writers, we could compare speed on h2o and
spark. We have a large enough dataset to make it interesting.
> H2O integration
> ---------------
>
> Key: MAHOUT-1500
> URL: https://issues.apache.org/jira/browse/MAHOUT-1500
> Project: Mahout
> Issue Type: Improvement
> Reporter: Anand Avati
> Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL