[jira] [Commented] (MAHOUT-1500) H2O integration

ASF GitHub Bot (JIRA) Fri, 11 Jul 2014 10:48:20 -0700

    [ 
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059083#comment-14059083
 ]


ASF GitHub Bot commented on MAHOUT-1500:
----------------------------------------

Github user pferrel commented on the pull request:

    https://github.com/apache/mahout/pull/21#issuecomment-48761123
  
    Exactly, thanks. I see you've done the same for CF too, which is great.
    
    But this illustrates the problem. I need to change 50% of the tests in CF 
cooccurrence because they were not catching a bug. Now the tests live in two 
places, h2o and spark, and unless I change them in both places the build 
will break. The files look virtually identical except for the imports, which is 
good. If that's true, I wonder if we could use a Scala macro to keep the 
code all in one file? We might be able to take the same code and produce two 
artifacts that are both run at build time. That would reduce the load on devs 
for this kind of thing.
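
    To make the idea concrete, here is a minimal sketch of keeping the 
checks in one file. It uses a plain trait mix-in rather than a macro, which is 
a swapped-in technique and not necessarily what would work for the real 
suites. Every name in it (Engine, CooccurrenceChecks, LocalCooccurrence, the 
stub engines) is hypothetical and not part of Mahout; the stub engines only 
stand in for the real spark and h2o bindings.

```scala
// Hypothetical sketch: none of these names exist in Mahout.

// Engine hook that each backend module would implement.
trait Engine {
  def name: String
  // Stand-in for an engine-specific job: counts how many rows contain each item pair.
  def cooccurrence(rows: Seq[Set[Int]]): Map[(Int, Int), Int]
}

// Plain in-memory reference computation, used here by both stub engines.
object LocalCooccurrence {
  def count(rows: Seq[Set[Int]]): Map[(Int, Int), Int] =
    rows
      .flatMap { r => for (a <- r.toSeq; b <- r.toSeq if a < b) yield (a, b) }
      .groupBy(identity)
      .map { case (pair, hits) => pair -> hits.size }
}

// The checks are written once, against the abstract engine.
trait CooccurrenceChecks {
  def engine: Engine

  def checkPairCounts(): Unit = {
    val rows = Seq(Set(1, 2), Set(1, 2, 3), Set(2, 3))
    val counts = engine.cooccurrence(rows)
    assert(counts((1, 2)) == 2, s"${engine.name}: wrong count for (1,2)")
    assert(counts((2, 3)) == 2, s"${engine.name}: wrong count for (2,3)")
    assert(counts((1, 3)) == 1, s"${engine.name}: wrong count for (1,3)")
  }
}

// Stubs standing in for the two backends (assumption: real ones would differ).
object StubSparkEngine extends Engine {
  val name = "spark-stub"
  def cooccurrence(rows: Seq[Set[Int]]): Map[(Int, Int), Int] = LocalCooccurrence.count(rows)
}

object StubH2OEngine extends Engine {
  val name = "h2o-stub"
  def cooccurrence(rows: Seq[Set[Int]]): Map[(Int, Int), Int] = LocalCooccurrence.count(rows)
}

// Each engine module only binds the engine; the check bodies are not copied.
object SparkCooccurrenceSuite extends CooccurrenceChecks { val engine = StubSparkEngine }
object H2OCooccurrenceSuite extends CooccurrenceChecks { val engine = StubH2OEngine }

object RunSharedChecks extends App {
  Seq[CooccurrenceChecks](SparkCooccurrenceSuite, H2OCooccurrenceSuite)
    .foreach(_.checkPairCounts())
  println("shared checks ran against both engines")
}
```

    With something like this, changing an assertion in the shared trait 
changes it for both engines at once, and each engine still produces its own 
test run at build time.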
    
    However, currently almost all IO code is spark-specific. You must have 
re-implemented drm.writeDrm for h2o. Until this is **not** a re-implementation 
but is engine-neutral, we are going to have a growing problem. I am the only 
person currently working in spark-specific land, and only Dmitriy and Sebastian 
are writing for V2. When other committers get past the Scala barrier and start 
committing similar stuff, they will immediately face this.
    
    BTW, I am very interested in seeing how the spark ItemSimilarityDriver 
compares to an h2o version. IMO this is the kind of motivation we have to see. 
If you implemented the driver or the reader/writers, we could compare speed on 
h2o and spark. We have a large enough dataset to make it interesting.


> H2O integration
> ---------------
>
>                 Key: MAHOUT-1500
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1500
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Anand Avati
>             Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL



--
This message was sent by Atlassian JIRA
(v6.2#6252)
