There are two things we (Yandex) miss in Spark: good MLlib abstractions and
a good workflow job scheduler. From the threads "Adding abstraction in MLlib"
and "[mllib] State of Multi-Model training" I got the impression that
Databricks is working on the former, and that we should wait for the first
published doc, which would guide us. But what about a workflow scheduler? Is
anyone already working on one? Does anyone have a plan to build one?

P.S. We thought that the MLlib abstractions for running multiple algorithms
over the same data would need such a scheduler, one that reruns an algorithm
in case of failure. I understand that Spark provides fault tolerance out of
the box, but we have found an "Oozie-like" scheduler more reliable for such
long-living workflows.

-- 
Sincerely yours,
Egor Pakhomov
Scala Developer, Yandex