Hi Ted, Dmitry,

Background: I am exploring the feasibility of providing an H2O distributed "backend" to the DSL. At a high level, implementing physical operators for DrmLike over H2O does not seem extremely challenging. All the operators in the DSL appear to have at least an approximate equivalent in H2O's own (R-like) DSL, and wiring one operator to the other's implementation seems like a tractable problem.
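To make the idea concrete, here is a minimal sketch of the kind of logical-to-physical mapping I have in mind. To be clear, every name below (H2ODrm, H2OBlas, even this DrmLike stand-in) is hypothetical and self-contained; none of it is the actual Mahout or H2O API:

    // Minimal stand-in for the logical DRM abstraction.
    trait DrmLike[K] {
      def nrow: Long
      def ncol: Int
    }

    // Stand-in for an H2O-materialized matrix. A real backend would
    // wrap an H2O Frame here rather than carrying shape alone.
    case class H2ODrm[K](nrow: Long, ncol: Int) extends DrmLike[K]

    // Logical transpose node, analogous in spirit to the DSL's A.t.
    case class OpAt[K](a: DrmLike[K]) extends DrmLike[Int] {
      def nrow: Long = a.ncol.toLong
      def ncol: Int  = a.nrow.toInt // assumes the row count fits in an Int
    }

    // Physical translation layer: one H2O implementation per logical op,
    // much as sparkbindings/blas provides one RDD implementation per op.
    object H2OBlas {
      def at[K](a: H2ODrm[K]): H2ODrm[Int] =
        // A real implementation would launch a distributed H2O task here;
        // this stub only propagates the resulting shape.
        H2ODrm[Int](a.ncol.toLong, a.nrow.toInt)
    }

The point is only that the translation from a logical operator tree to H2O primitives looks structurally the same as the existing translation to RDDs.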
The reason I write is to better understand the split between the Mahout DSL and Spark (both current and future). As of today, the DSL seems to be pretty tightly coupled with Spark, e.g.:

- DSSVD.scala imports o.a.spark.storage.StorageLevel
- drm.plan.CheckpointAction: the result of exec() and checkpoint() is DrmRddInput (instead of, say, DrmLike)

Firstly, I don't think I am presenting some new revelation you guys don't already know; I'm sure you know that the logical vs physical "split" in the DSL is not absolute (yet). That being said, I would like to understand whether there are plans, or efforts already underway, to make the DSL (i.e. how DSSVD would be written) and the logical layer (i.e. the drm.plan.* optimizer etc.) more "pure", and to move the Spark-specific code entirely into the physical domain. I recall Dmitry mentioning that a new engine other than Spark was also being planned, so I deduce some thought has already been applied to such "purification".

It would be nice to see changes approximately like:

- Rename ./spark => ./dsl
- Rename ./spark/src/main/scala/org/apache/mahout/sparkbindings => ./dsl/src/main/scala/org/apache/mahout/dsl
- Rename ./spark/src/main/scala/org/apache/mahout/sparkbindings/blas => ./dsl/src/main/scala/org/apache/mahout/dsl/spark-backend

along with appropriately renaming packages and imports, and confining references to RDD and SparkContext entirely within spark-backend. I think such a clean split would be necessary to introduce more backend engines (a rough sketch of what the seam could look like is in the PS below).

If no efforts are already underway, I would be glad to take on the DSL "purification" task.

Thanks,
Avati
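PS: For concreteness, a rough sketch of the engine-neutral seam I am imagining. Again, all names here are hypothetical, not actual Mahout classes; the real interface would of course need to be hashed out on the list:

    // The optimizer and algorithms (DSSVD etc.) would program against
    // these traits only; DrmLike is the sole currency between layers.
    trait DrmLike[K] {
      def nrow: Long
      def ncol: Int
    }

    // One implementation per backend. Note checkpoint() returns DrmLike,
    // not a Spark-specific DrmRddInput.
    trait DistributedEngine {
      def checkpoint[K](plan: DrmLike[K]): DrmLike[K]
      def drmFromHDFS(path: String): DrmLike[Int]
    }

    // All RDD / SparkContext references would live behind this wall:
    //   class SparkEngine(sc: SparkContext) extends DistributedEngine { ... }
    // and an H2O backend would similarly confine its Frame references:
    //   class H2OEngine(/* h2o cluster handle */) extends DistributedEngine { ... }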
