[
https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059226#comment-14059226
]
ASF GitHub Bot commented on MAHOUT-1500:
----------------------------------------
Github user avati commented on the pull request:
https://github.com/apache/mahout/pull/21#issuecomment-48771908
On Fri, Jul 11, 2014 at 11:46 AM, Pat Ferrel <[email protected]>
wrote:
> So you don't see how changing the drm API or storage format will now break
> code in two places written for two different engines?
>
Changing the DRM API? Yes, of course - that is the nature of the beast when
supporting multiple implementations behind a single abstraction. A change in
the abstraction API will need a corresponding change in all backends. That's
the reason why APIs must be designed carefully, so that future changes to
them are kept to a minimum. I don't see how this by itself qualifies
as an objection.
Storage format? Neither spark nor h2o is defining any storage formats. The
current APIs read and write to sequence files whose formats are very well
defined and standardized. As long as they both read and write that common
format from engine-neutral locations, I don't see any problems at all.
> If I make the change to drm I can fix spark breakage but not h2o. This bit
> of code is extremely stable and super simple for spark so may be a bad
> example but new code will not be so stable just the opposite. For each new
> IO operation (SparkContext dependent) or engine tuning (SparkConf
> dependent) we will grow the problem. The core will become untouchable or
> breakage will happen in places one engineer will not be able to fix.
>
Can you please provide a more concrete example of both "make the change to
drm" and "new IO operation (SparkContext dependent)"? It is hard for me to
visualize the problems you are foreseeing without more specifics.
> This is a real issue, I need to change code in math-scala today, already
> have but it isn't pushed. Who knows what that will break in h2o
> implementations? I will be changing cooccurrence tests, so have to make
> them in two places. Maybe I can do that but when they diverge further than
> this example I won't be able to.
>
Well, as long as you are fixing a bug in cf logic, that should be engine
independent. However if you are adding a new DRM API or modifying an
existing DRM API - that will need corresponding changes in all the engines.
There's no getting around that. That's something we all have to live with,
no matter what project it is.
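To illustrate the distinction, here is a minimal sketch in Python. All names
(`DrmBackend`, `SparkLikeBackend`, `H2OLikeBackend`, `mean_row_sum`,
`DrmBackendV2`, `StaleBackend`) are hypothetical and are not Mahout's actual
DRM API. Logic written only against the abstraction runs unchanged on either
engine, while adding a method to the abstraction forces every backend to
catch up.

```python
from abc import ABC, abstractmethod


class DrmBackend(ABC):
    """Hypothetical engine-neutral abstraction (not Mahout's real API)."""

    @abstractmethod
    def row_sums(self, rows):
        """Return the sum of each row."""


class SparkLikeBackend(DrmBackend):
    def row_sums(self, rows):
        return [sum(row) for row in rows]


class H2OLikeBackend(DrmBackend):
    def row_sums(self, rows):
        # Different internals, same contract.
        totals = []
        for row in rows:
            total = 0
            for value in row:
                total += value
            totals.append(total)
        return totals


def mean_row_sum(backend, rows):
    """Engine-independent logic: it only calls the abstraction."""
    sums = backend.row_sums(rows)
    return sum(sums) / len(sums)


class DrmBackendV2(DrmBackend):
    """The abstraction grows a new operation."""

    @abstractmethod
    def col_sums(self, rows):
        """Return the sum of each column."""


class StaleBackend(DrmBackendV2):
    """A backend that has not been updated for the new operation."""

    def row_sums(self, rows):
        return [sum(row) for row in rows]
```

In this sketch a fix inside `mean_row_sum` touches one place only, whereas
`StaleBackend()` raises `TypeError` until `col_sums` is implemented, which is
the "corresponding changes in all the engines" case.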
> You guys need to address these issues as if you were supporting two
> engines for all Mahout code or you will never see what Mahout committers
> problems will be.
>
As I said before, please provide a concrete example of what the issues are.
I don't know *what* to fix yet.
Thanks
> H2O integration
> ---------------
>
> Key: MAHOUT-1500
> URL: https://issues.apache.org/jira/browse/MAHOUT-1500
> Project: Mahout
> Issue Type: Improvement
> Reporter: Anand Avati
> Fix For: 1.0
>
>
> Provide H2O backend for the Mahout DSL
--
This message was sent by Atlassian JIRA
(v6.2#6252)