Hi list!

I am writing to hear about your experience putting Spark ML models
into production at scale.

I know it is a very broad topic with many facets depending on the
use case, requirements, user base and everything else involved in the task.
Still, I'd like to open a thread about it, because deployment is as
important as properly training a model and I feel it is often neglected.

The task is *serving web users with predictions*, and the main challenge I
see is keeping it agile and low-latency.

I think there are mainly three general categories of such deployment, which
can be described as:

   - Offline/Batch: load a model, run inference as a batch job, and store
   the results in some datastore (DB, indexes, ...); see the sketch after
   this list.
   - Spark in the loop: keep a long-running Spark context exposed in some
   way; this includes streaming as well as custom applications that wrap
   the context.
   - Use a different technology to load the Spark MLlib model and run the
   inference pipeline. I have read about MLeap and other PMML-based solutions.
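
For the first category, what I have in mind is roughly the sketch below
(Scala; the paths, the userId column and the Parquet output are just
placeholders, not a definitive setup): a scheduled job loads a saved
PipelineModel, scores the new records and writes the predictions somewhere
the web application can look them up cheaply.

import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.SparkSession

object BatchScoring {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("batch-scoring").getOrCreate()

    // Load a pipeline that was fitted and saved elsewhere (placeholder path)
    val model = PipelineModel.load("hdfs:///models/my-pipeline")

    // Read the records to score, run the whole inference pipeline,
    // and keep only the columns the serving layer needs
    val input  = spark.read.parquet("hdfs:///data/new-users")
    val scored = model.transform(input).select("userId", "prediction")

    // Persist the predictions so the web tier only does a key lookup
    // at request time (could equally be a DB or an index instead of Parquet)
    scored.write.mode("overwrite").parquet("hdfs:///output/predictions")

    spark.stop()
  }
}

With this approach the latency a web user sees is just the datastore
lookup, but the predictions are only as fresh as the last batch run.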

I would love to hear about open-source solutions, preferably ones that do
not require cloud-provider-specific frameworks or components.

Again, I am aware each of the previous categories has its benefits and
drawbacks, so which would you pick? Why? And how?

Thanks!
