Hi Ted, Dmitry,

Background: I am exploring the feasibility of providing an H2O distributed "backend" to the DSL. At a high level, implementing physical operators for DrmLike over H2O does not seem extremely challenging. All the operators in the DSL appear to have at least an approximate equivalent in H2O's own (R-like) DSL, and wiring one operator to the other's implementation seems like a tractable problem.
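To make the idea concrete, here is a minimal sketch of the kind of logical-to-physical mapping I have in mind. To be clear, every name below (H2ODrm, H2OBlas, even this DrmLike stand-in) is hypothetical and self-contained; none of it is the actual Mahout or H2O API:

    // Minimal stand-in for the logical DRM abstraction.
    trait DrmLike[K] {
      def nrow: Long
      def ncol: Int
    }

    // Stand-in for an H2O-materialized matrix. A real backend would
    // wrap an H2O Frame here rather than carrying shape alone.
    case class H2ODrm[K](nrow: Long, ncol: Int) extends DrmLike[K]

    // Logical transpose node, analogous in spirit to the DSL's A.t.
    case class OpAt[K](a: DrmLike[K]) extends DrmLike[Int] {
      def nrow: Long = a.ncol.toLong
      def ncol: Int  = a.nrow.toInt // assumes the row count fits in an Int
    }

    // Physical translation layer: one H2O implementation per logical op,
    // much as sparkbindings/blas provides one RDD implementation per op.
    object H2OBlas {
      def at[K](a: H2ODrm[K]): H2ODrm[Int] =
        // A real implementation would launch a distributed H2O task here;
        // this stub only propagates the resulting shape.
        H2ODrm[Int](a.ncol.toLong, a.nrow.toInt)
    }

The point is only that the translation from a logical operator tree to H2O primitives looks structurally the same as the existing translation to RDDs.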
The reason I write is to better understand the split between the Mahout DSL and Spark (both current and future). As of today, the DSL seems to be pretty tightly coupled with Spark, e.g.:

- DSSVD.scala imports o.a.spark.storage.StorageLevel
- drm.plan.CheckpointAction: the result of exec() and checkpoint() is DrmRddInput (instead of, say, DrmLike)

Firstly, I don't think I am presenting some new revelation you guys don't already know; I'm sure you know that the logical vs physical "split" in the DSL is not absolute (yet). That being said, I would like to understand whether there are plans, or efforts already underway, to make the DSL (i.e. how DSSVD would be written) and the logical layer (i.e. the drm.plan.* optimizer etc.) more "pure", and to move the Spark-specific code entirely into the physical domain. I recall Dmitry mentioning that a new engine other than Spark was also being planned, so I deduce some thought has already been applied to such "purification".

It would be nice to see changes approximately like:

- Rename ./spark => ./dsl
- Rename ./spark/src/main/scala/org/apache/mahout/sparkbindings => ./dsl/src/main/scala/org/apache/mahout/dsl
- Rename ./spark/src/main/scala/org/apache/mahout/sparkbindings/blas => ./dsl/src/main/scala/org/apache/mahout/dsl/spark-backend

along with appropriately renaming packages and imports, and confining references to RDD and SparkContext entirely within spark-backend. I think such a clean split would be necessary to introduce more backend engines (a rough sketch of what the seam could look like is in the PS below).

If no efforts are already underway, I would be glad to take on the DSL "purification" task.

Thanks,
Avati
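PS: For concreteness, a rough sketch of the engine-neutral seam I am imagining. Again, all names here are hypothetical, not actual Mahout classes; the real interface would of course need to be hashed out on the list:

    // The optimizer and algorithms (DSSVD etc.) would program against
    // these traits only; DrmLike is the sole currency between layers.
    trait DrmLike[K] {
      def nrow: Long
      def ncol: Int
    }

    // One implementation per backend. Note checkpoint() returns DrmLike,
    // not a Spark-specific DrmRddInput.
    trait DistributedEngine {
      def checkpoint[K](plan: DrmLike[K]): DrmLike[K]
      def drmFromHDFS(path: String): DrmLike[Int]
    }

    // All RDD / SparkContext references would live behind this wall:
    //   class SparkEngine(sc: SparkContext) extends DistributedEngine { ... }
    // and an H2O backend would similarly confine its Frame references:
    //   class H2OEngine(/* h2o cluster handle */) extends DistributedEngine { ... }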
