Please comment on the JIRA/SPIP if you are interested! That way we can gauge 
the community support for a proposal like this.


________________________________
From: Pola Yao <pola....@gmail.com>
Sent: Wednesday, January 23, 2019 8:01 AM
To: Riccardo Ferrari
Cc: Felix Cheung; User
Subject: Re: I have trained a ML model, now what?

Hi Riccardo,

Right now, Spark does not support low-latency predictions in production. MLeap 
is an alternative, and it has been used in many scenarios. But it is good to 
see that the Spark community has decided to provide such support.
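
For anyone curious, below is a minimal sketch of exporting a fitted Spark
pipeline to an MLeap bundle, based on MLeap's documented Spark integration.
The paths, names, and sample data are all hypothetical, and it assumes the
mleap-spark artifact is on the classpath:

    import ml.combust.bundle.BundleFile
    import ml.combust.mleap.spark.SparkSupport._
    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.ml.bundle.SparkBundleContext
    import org.apache.spark.sql.SparkSession
    import resource._ // scala-arm, pulled in by MLeap

    val spark = SparkSession.builder().getOrCreate()

    // A fitted pipeline plus a transformed sample of its input data
    // (both hypothetical) so MLeap can record the output schema.
    val pipeline = PipelineModel.load("hdfs:///models/my_pipeline")
    val sample = spark.read.parquet("hdfs:///data/features_sample")

    implicit val context: SparkBundleContext =
      SparkBundleContext().withDataset(pipeline.transform(sample))

    // Serialize to a bundle that the lightweight mleap-runtime can
    // load for scoring, with no SparkContext at serving time.
    for (bundle <- managed(BundleFile("jar:file:/tmp/my_pipeline.zip"))) {
      pipeline.writeBundle.save(bundle).get
    }

On the serving side, a plain JVM service can then load the bundle with
mleap-runtime (MleapSupport) and score single rows without any Spark
dependency, which is what makes the low-latency case workable.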

On Wed, Jan 23, 2019 at 7:53 AM Riccardo Ferrari 
<ferra...@gmail.com> wrote:
Felix, thank you very much for the link. Much appreciated.

The attached PDF is very interesting; I found myself evaluating many of the 
scenarios described in Q3. It is unfortunate the proposal is not being worked 
on; it would be great to see that become part of the code base.

It is cool to see big players like Uber trying to make open source better, 
thanks!


On Tue, Jan 22, 2019 at 5:24 PM Felix Cheung 
<felixcheun...@hotmail.com> wrote:
About deployment/serving

SPIP
https://issues.apache.org/jira/browse/SPARK-26247


________________________________
From: Riccardo Ferrari <ferra...@gmail.com>
Sent: Tuesday, January 22, 2019 8:07 AM
To: User
Subject: I have trained a ML model, now what?

Hi list!

I am writing to hear about your experience putting Spark ML models into 
production at scale.

I know it is a very broad topic with many different facets depending on the 
use case, requirements, user base, and whatever else is involved in the task. 
Still, I'd like to open a thread about this topic, which is as important as 
properly training a model and which I feel is often neglected.

The task is serving predictions to web users, and the main challenge I see is 
making the system agile and fast.

I think there are mainly three general categories of such deployments, which 
can be described as follows (sketches of the first two follow the list):

  *   Offline/batch: load a model, perform the inference, and store the 
results in some datastore (DB, indexes, ...).
  *   Spark in the loop: keep a long-running Spark context exposed in some 
way; this includes streaming as well as custom applications that wrap the 
context.
  *   Use a different technology to load the Spark MLlib model and run the 
inference pipeline. I have read about MLeap and other PMML-based solutions.
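
To make the first category concrete, here is a minimal batch-scoring sketch
using only the Spark MLlib API (all paths and column names are hypothetical):

    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.sql.SparkSession

    object BatchScoring {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("batch-scoring").getOrCreate()

        // Load the pipeline fitted at training time (hypothetical path).
        val model = PipelineModel.load("hdfs:///models/my_pipeline")

        // Score the latest feature snapshot and persist the predictions
        // where the web tier can look them up (DB, index, key-value store).
        val features = spark.read.parquet("hdfs:///data/features_latest")
        model.transform(features)
          .select("user_id", "prediction")
          .write.mode("overwrite").parquet("hdfs:///serving/predictions")

        spark.stop()
      }
    }

And a sketch of the second category, "Spark in the loop", via Structured
Streaming over Kafka. This assumes the spark-sql-kafka artifact is available;
the topics and payload schema are hypothetical, and the fitted pipeline is
assumed to assemble the raw columns into the feature vector it was trained on
(e.g. via a VectorAssembler stage):

    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder().appName("streaming-scoring").getOrCreate()
    val model = PipelineModel.load("hdfs:///models/my_pipeline")

    // Shape of the incoming JSON request payload (hypothetical).
    val schema = new StructType()
      .add("user_id", StringType)
      .add("age", DoubleType)
      .add("clicks", DoubleType)

    val requests = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "prediction-requests")
      .load()
      .select(from_json(col("value").cast("string"), schema).as("req"))
      .select("req.*")

    // Score each micro-batch and publish predictions to a reply topic.
    model.transform(requests)
      .select(col("user_id").as("key"),
              col("prediction").cast("string").as("value"))
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("topic", "prediction-responses")
      .option("checkpointLocation", "hdfs:///checkpoints/scoring")
      .start()
      .awaitTermination()

The streaming route still pays micro-batch latency (seconds at best), which is
why the third category exists for truly low-latency serving.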

I would love to hear about open-source solutions, ideally without requiring 
cloud-provider-specific frameworks or components.

Again, I am aware that each of these categories has benefits and drawbacks, 
so: which would you pick? Why? And how?

Thanks!
