Re: Revisiting Online serving of Spark models?

Maximiliano Felice Wed, 06 Jun 2018 14:43:30 -0700

Hi!

Do we meet at the entrance?


See you

On Tue, Jun 5, 2018 at 3:07 PM, Nick Pentreath <[email protected]>
wrote:

> I will aim to join up at 4pm tomorrow (Wed) too. Look forward to it.
>
> On Sun, 3 Jun 2018 at 00:24 Holden Karau <[email protected]> wrote:
>
>> On Sat, Jun 2, 2018 at 8:39 PM, Maximiliano Felice <
>> [email protected]> wrote:
>>
>>> Hi!
>>>
>>> We're already in San Francisco waiting for the summit. We even think
>>> that we spotted @holdenk this afternoon.
>>>
>> Unless you happened to be walking by my garage probably not super likely,
>> spent the day working on scooters/motorcycles (my style is a little less
>> unique in SF :)). Also if you see me feel free to say hi unless I look like
>> I haven't had my first coffee of the day, love chatting with folks IRL :)
>>
>>>
>>> @chris, we're really interested in the Meetup you're hosting. My team
>>> will probably join it from the beginning if you have room for us, and I'll
>>> join it later after discussing the topics on this thread. I'll send you an
>>> email regarding this request.
>>>
>>> Thanks
>>>
>>> On Fri, Jun 1, 2018 at 7:26 AM, Saikat Kanjilal <[email protected]>
>>> wrote:
>>>
>>>> @Chris This sounds fantastic, please send summary notes for Seattle
>>>> folks
>>>>
>>>> @Felix I work in downtown Seattle, am wondering if we should hold a tech
>>>> meetup around model serving in Spark at my work or elsewhere close,
>>>> thoughts?  I’m actually in the midst of building microservices to manage
>>>> models and when I say models I mean much more than machine learning models
>>>> (think OR, process models as well)
>>>>
>>>> Regards
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On May 31, 2018, at 10:32 PM, Chris Fregly <[email protected]> wrote:
>>>>
>>>> Hey everyone!
>>>>
>>>> @Felix:  thanks for putting this together.  i sent some of you a quick
>>>> calendar event - mostly for me, so i don’t forget!  :)
>>>>
>>>> Coincidentally, this is the focus of June 6th's *Advanced Spark and
>>>> TensorFlow Meetup*
>>>> <https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/250924195/>
>>>>  @5:30pm
>>>> on June 6th (same night) here in SF!
>>>>
>>>> Everybody is welcome to come.  Here’s the link to the meetup that
>>>> includes the signup link:
>>>> <https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/250924195/>
>>>>
>>>> We have an awesome lineup of speakers covering a lot of deep, technical
>>>> ground.
>>>>
>>>> For those who can’t attend in person, we’ll be broadcasting live - and
>>>> posting the recording afterward.
>>>>
>>>> All details are in the meetup link above…
>>>>
>>>> @holden/felix/nick/joseph/maximiliano/saikat/leif:  you’re more than
>>>> welcome to give a talk. I can move things around to make room.
>>>>
>>>> @joseph:  I’d personally like an update on the direction of the
>>>> Databricks proprietary ML Serving export format which is similar to PMML
>>>> but not a standard in any way.
>>>>
>>>> Also, the Databricks ML Serving Runtime is only available to Databricks
>>>> customers.  This seems in conflict with the community efforts described
>>>> here.  Can you comment on behalf of Databricks?
>>>>
>>>> Look forward to your response, joseph.
>>>>
>>>> See you all soon!
>>>>
>>>> —
>>>>
>>>>
>>>> *Chris Fregly *Founder @ *PipelineAI* <https://pipeline.ai/> (100,000
>>>> Users)
>>>> Organizer @ *Advanced Spark and TensorFlow Meetup*
>>>> <https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/> (85,000
>>>> Global Members)
>>>>
>>>>
>>>>
>>>> *San Francisco - Chicago - Austin -
>>>> Washington DC - London - Dusseldorf *
>>>> *Try our PipelineAI Community Edition with GPUs and TPUs!!
>>>> <http://community.pipeline.ai/>*
>>>>
>>>>
>>>> On May 30, 2018, at 9:32 AM, Felix Cheung <[email protected]>
>>>> wrote:
>>>>
>>>> Hi!
>>>>
>>>> Thank you! Let’s meet then
>>>>
>>>> June 6 4pm
>>>>
>>>> Moscone West Convention Center
>>>> 800 Howard Street, San Francisco, CA 94103
>>>> <https://maps.google.com/?q=800+Howard+Street,+San+Francisco,+CA+94103&entry=gmail&source=g>
>>>>
>>>> Ground floor (outside of conference area - should be available for all)
>>>> - we will meet and decide where to go
>>>>
>>>> (Would not send invite because that would be too much noise for dev@)
>>>>
>>>> To paraphrase Joseph, we will use this to kick off the discussion and
>>>> post notes after and follow up online. As for Seattle, I would be very
>>>> interested to meet in person later and discuss ;)
>>>>
>>>>
>>>> _____________________________
>>>> From: Saikat Kanjilal <[email protected]>
>>>> Sent: Tuesday, May 29, 2018 11:46 AM
>>>> Subject: Re: Revisiting Online serving of Spark models?
>>>> To: Maximiliano Felice <[email protected]>
>>>> Cc: Felix Cheung <[email protected]>, Holden Karau <
>>>> [email protected]>, Joseph Bradley <[email protected]>, Leif
>>>> Walsh <[email protected]>, dev <[email protected]>
>>>>
>>>>
>>>> Would love to join but am in Seattle, thoughts on how to make this
>>>> work?
>>>>
>>>> Regards
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On May 29, 2018, at 10:35 AM, Maximiliano Felice <
>>>> [email protected]> wrote:
>>>>
>>>> Big +1 to a meeting with fresh air.
>>>>
>>>> Could anyone send the invites? I don't really know which is the place
>>>> Holden is talking about.
>>>>
>>>> 2018-05-29 14:27 GMT-03:00 Felix Cheung <[email protected]>:
>>>>
>>>>> You had me at blue bottle!
>>>>>
>>>>> _____________________________
>>>>> From: Holden Karau <[email protected]>
>>>>> Sent: Tuesday, May 29, 2018 9:47 AM
>>>>> Subject: Re: Revisiting Online serving of Spark models?
>>>>> To: Felix Cheung <[email protected]>
>>>>> Cc: Saikat Kanjilal <[email protected]>, Maximiliano Felice <
>>>>> [email protected]>, Joseph Bradley <[email protected]>,
>>>>> Leif Walsh <[email protected]>, dev <[email protected]>
>>>>>
>>>>>
>>>>>
>>>>> I'm down for that, we could all go for a walk maybe to the Mint Plaza
>>>>> Blue Bottle and grab coffee (if the weather holds have our design meeting
>>>>> outside :p)?
>>>>>
>>>>> On Tue, May 29, 2018 at 9:37 AM, Felix Cheung <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Bump.
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Felix Cheung <[email protected]>
>>>>>> *Sent:* Saturday, May 26, 2018 1:05:29 PM
>>>>>> *To:* Saikat Kanjilal; Maximiliano Felice; Joseph Bradley
>>>>>> *Cc:* Leif Walsh; Holden Karau; dev
>>>>>>
>>>>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>>>>
>>>>>> Hi! How about we meet the community and discuss on June 6 4pm at
>>>>>> (near) the Summit?
>>>>>>
>>>>>> (I propose we meet at the venue entrance so we could accommodate
>>>>>> people who might not be at the conference)
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Saikat Kanjilal <[email protected]>
>>>>>> *Sent:* Tuesday, May 22, 2018 7:47:07 AM
>>>>>> *To:* Maximiliano Felice
>>>>>> *Cc:* Leif Walsh; Felix Cheung; Holden Karau; Joseph Bradley; dev
>>>>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>>>>
>>>>>> I’m in the same exact boat as Maximiliano and have use cases as well
>>>>>> for model serving and would love to join this discussion.
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> On May 22, 2018, at 6:39 AM, Maximiliano Felice <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> I don't usually write a lot on this list but I keep up to date with
>>>>>> the discussions and I'm a heavy user of Spark. This topic caught my
>>>>>> attention, as we're currently facing this issue at work. I'm attending
>>>>>> the summit and was wondering if it would be possible for me to join
>>>>>> that meeting. I might be able to share some helpful use cases and ideas.
>>>>>>
>>>>>> Thanks,
>>>>>> Maximiliano Felice
>>>>>>
>>>>>> On Tue, May 22, 2018 at 9:14 AM, Leif Walsh <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I’m with you on JSON being more readable than Parquet, but we’ve had
>>>>>>> success using pyarrow’s Parquet reader and have been quite happy with
>>>>>>> it so far. If your target is Python (and probably if not now, then soon,
>>>>>>> R), you should look into it.
>>>>>>>
>>>>>>> On Mon, May 21, 2018 at 16:52 Joseph Bradley <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Regarding model reading and writing, I'll give quick thoughts here:
>>>>>>>> * Our approach was to use the same format but write JSON instead of
>>>>>>>> Parquet.  It's easier to parse JSON without Spark, and using the same
>>>>>>>> format simplifies architecture.  Plus, some people want to check files
>>>>>>>> into version control, and JSON is nice for that.
>>>>>>>> * The reader/writer APIs could be extended to take format
>>>>>>>> parameters (just like DataFrame reader/writers) to handle JSON (and
>>>>>>>> maybe, eventually, handle Parquet in the online serving setting).
>>>>>>>>
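
A minimal Python sketch of the JSON option described above, assuming a linear model whose parameters fit in a small dictionary. The names `save_model`, `load_model`, and `predict`, the `fmt` parameter, and the file layout are hypothetical illustrations; they are not Spark's persistence API or its on-disk format.

```python
import json
import tempfile
from pathlib import Path


def save_model(params: dict, path: Path, fmt: str = "json") -> None:
    """Write model parameters in the requested format (only JSON in this sketch)."""
    if fmt != "json":
        raise ValueError(f"unsupported format: {fmt}")
    # Sorted keys and indentation keep the file stable and diff-friendly,
    # which matters when models are checked into version control.
    path.write_text(json.dumps(params, indent=2, sort_keys=True))


def load_model(path: Path, fmt: str = "json") -> dict:
    """Read model parameters back using only the standard library (no Spark)."""
    if fmt != "json":
        raise ValueError(f"unsupported format: {fmt}")
    return json.loads(path.read_text())


def predict(params: dict, features: list) -> float:
    """Linear prediction: dot(coefficients, features) + intercept."""
    dot = sum(c * x for c, x in zip(params["coefficients"], features))
    return dot + params["intercept"]


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "linear_model.json"
    save_model({"coefficients": [0.5, -1.0], "intercept": 2.0}, model_path)
    restored = load_model(model_path)
    score = predict(restored, [4.0, 1.0])
```

The `fmt` argument mirrors the idea of format parameters on the reader and writer: a Parquet branch could be added later without changing callers.
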
>>>>>>>> This would be a big project, so proposing a SPIP might be best.  If
>>>>>>>> people are around at the Spark Summit, that could be a good time to
>>>>>>>> meet up & then post notes back to the dev list.
>>>>>>>>
>>>>>>>> On Sun, May 20, 2018 at 8:11 PM, Felix Cheung <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Specifically I’d like to bring part of the discussion to Model and
>>>>>>>>> PipelineModel, and the various ModelReader and SharedReadWrite
>>>>>>>>> implementations that rely on SparkContext. This is a big blocker on
>>>>>>>>> reusing trained models outside of Spark for online serving.
>>>>>>>>>
>>>>>>>>> What’s the next step? Would folks be interested in getting
>>>>>>>>> together to discuss/get some feedback?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _____________________________
>>>>>>>>> From: Felix Cheung <[email protected]>
>>>>>>>>> Sent: Thursday, May 10, 2018 10:10 AM
>>>>>>>>> Subject: Re: Revisiting Online serving of Spark models?
>>>>>>>>> To: Holden Karau <[email protected]>, Joseph Bradley <
>>>>>>>>> [email protected]>
>>>>>>>>> Cc: dev <[email protected]>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Huge +1 on this!
>>>>>>>>>
>>>>>>>>> ------------------------------
>>>>>>>>> *From:*[email protected] <[email protected]> on behalf
>>>>>>>>> of Holden Karau <[email protected]>
>>>>>>>>> *Sent:* Thursday, May 10, 2018 9:39:26 AM
>>>>>>>>> *To:* Joseph Bradley
>>>>>>>>> *Cc:* dev
>>>>>>>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, May 10, 2018 at 9:25 AM, Joseph Bradley <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for bringing this up Holden!  I'm a strong supporter of
>>>>>>>>>> this.
>>>>>>>>>>
>>>>>>>>> Awesome! I'm glad other folks think something like this belongs
>>>>>>>>> in Spark.
>>>>>>>>>
>>>>>>>>>> This was one of the original goals for mllib-local: to have local
>>>>>>>>>> versions of MLlib models which could be deployed without the big
>>>>>>>>>> Spark JARs and without a SparkContext or SparkSession.  There are
>>>>>>>>>> related commercial offerings like this : ) but the overhead of
>>>>>>>>>> maintaining those offerings is pretty high.  Building good APIs
>>>>>>>>>> within MLlib to avoid copying logic across libraries will be well
>>>>>>>>>> worth it.
>>>>>>>>>>
>>>>>>>>>> We've talked about this need at Databricks and have also been
>>>>>>>>>> syncing with the creators of MLeap.  It'd be great to get this
>>>>>>>>>> functionality into Spark itself.  Some thoughts:
>>>>>>>>>> * It'd be valuable to have this go beyond adding transform()
>>>>>>>>>> methods taking a Row to the current Models.  Instead, it would be
>>>>>>>>>> ideal to have local, lightweight versions of models in mllib-local,
>>>>>>>>>> outside of the main mllib package (for easier deployment with smaller
>>>>>>>>>> & fewer dependencies).
>>>>>>>>>> * Supporting Pipelines is important.  For this, it would be ideal
>>>>>>>>>> to utilize elements of Spark SQL, particularly Rows and Types, which
>>>>>>>>>> could be moved into a local sql package.
>>>>>>>>>> * This architecture may require some awkward APIs currently to
>>>>>>>>>> have model prediction logic in mllib-local, local model classes in
>>>>>>>>>> mllib-local, and regular (DataFrame-friendly) model classes in
>>>>>>>>>> mllib.  We might find it helpful to break some DeveloperApis in Spark
>>>>>>>>>> 3.0 to facilitate this architecture while making it feasible for 3rd
>>>>>>>>>> party developers to extend MLlib APIs (especially in Java).
>>>>>>>>>>
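
One way to picture the layering in the list above, sketched in Python under stated assumptions: a lightweight local model owns the prediction logic, and a DataFrame-facing class delegates to it so the logic is not copied. The class names `LocalLinearModel` and `DistributedLinearModel` are hypothetical, a plain dict stands in for a Spark SQL Row, and a list of dicts stands in for a DataFrame; none of this is existing MLlib API.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

Row = Dict[str, Any]  # stand-in for a Spark SQL Row in this sketch


@dataclass(frozen=True)
class LocalLinearModel:
    """Hypothetical lightweight model with no Spark dependency."""
    coefficients: List[float]
    intercept: float
    features_col: str = "features"
    prediction_col: str = "prediction"

    def predict(self, features: List[float]) -> float:
        # The only copy of the prediction logic lives here.
        return sum(c * x for c, x in zip(self.coefficients, features)) + self.intercept

    def transform_row(self, row: Row) -> Row:
        # Return a new row with the prediction column appended.
        out = dict(row)
        out[self.prediction_col] = self.predict(row[self.features_col])
        return out


class DistributedLinearModel:
    """Hypothetical DataFrame-friendly wrapper that reuses the local logic."""

    def __init__(self, local: LocalLinearModel) -> None:
        self.local = local

    def transform(self, rows: List[Row]) -> List[Row]:
        # A real implementation would map over a DataFrame; a list is used here.
        return [self.local.transform_row(r) for r in rows]


local = LocalLinearModel(coefficients=[1.0, 2.0], intercept=0.5)
single = local.transform_row({"features": [3.0, 4.0]})
batch = DistributedLinearModel(local).transform(
    [{"features": [0.0, 0.0]}, {"features": [1.0, 1.0]}]
)
```

A serving process would depend only on the local class, while training code keeps using the wrapper.
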
>>>>>>>>> I agree this could be interesting, and feed into the other
>>>>>>>>> discussion around when (or if) we should be considering Spark 3.0.
>>>>>>>>> I _think_ we could probably do it with optional traits people
>>>>>>>>> could mix in to avoid breaking the current APIs but I could be wrong
>>>>>>>>> on that point.
>>>>>>>>>
>>>>>>>>>> * It could also be worth discussing local DataFrames.  They might
>>>>>>>>>> not be as important as per-Row transformations, but they would be
>>>>>>>>>> helpful for batching for higher throughput.
>>>>>>>>>>
>>>>>>>>> That could be interesting as well.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I'll be interested to hear others' thoughts too!
>>>>>>>>>>
>>>>>>>>>> Joseph
>>>>>>>>>>
>>>>>>>>>> On Wed, May 9, 2018 at 7:18 AM, Holden Karau <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi y'all,
>>>>>>>>>>>
>>>>>>>>>>> With the renewed interest in ML in Apache Spark, now seems like as
>>>>>>>>>>> good a time as any to revisit the online serving situation in Spark
>>>>>>>>>>> ML. DB & others have done some excellent work moving a lot of the
>>>>>>>>>>> necessary tools into a local linear algebra package that doesn't
>>>>>>>>>>> depend on having a SparkContext.
>>>>>>>>>>>
>>>>>>>>>>> There are a few different commercial and non-commercial
>>>>>>>>>>> solutions around this, but currently our individual
>>>>>>>>>>> transform/predict methods are private, so they either need to copy
>>>>>>>>>>> or re-implement them (or put themselves in org.apache.spark) to
>>>>>>>>>>> access them. How would folks feel about adding a new trait for ML
>>>>>>>>>>> pipeline stages that exposes transformation of single-element
>>>>>>>>>>> inputs (or local collections) and could be optionally implemented
>>>>>>>>>>> by stages which support this? That way we can have less
>>>>>>>>>>> copy-and-paste code possibly getting out of sync with our model
>>>>>>>>>>> training.
>>>>>>>>>>>
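
The optional trait proposed above could look roughly like the following, sketched here in Python with an abstract base class playing the role of a Scala trait. Everything named here (`LocalTransformer`, `transform_one`, `LowercaseTokenizer`, `TokenCounter`, `ClusterOnlyStage`, `transform_locally`) is hypothetical and chosen for illustration; it is not an existing Spark interface.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Row = Dict[str, Any]  # stand-in for a single input element


class LocalTransformer(ABC):
    """Hypothetical opt-in interface for stages that can transform one element."""

    @abstractmethod
    def transform_one(self, row: Row) -> Row:
        ...


class LowercaseTokenizer(LocalTransformer):
    """Toy stage: lowercases a text column and splits it on whitespace."""

    def __init__(self, input_col: str, output_col: str) -> None:
        self.input_col, self.output_col = input_col, output_col

    def transform_one(self, row: Row) -> Row:
        out = dict(row)
        out[self.output_col] = str(row[self.input_col]).lower().split()
        return out


class TokenCounter(LocalTransformer):
    """Toy stage: counts the tokens produced by an earlier stage."""

    def __init__(self, input_col: str, output_col: str) -> None:
        self.input_col, self.output_col = input_col, output_col

    def transform_one(self, row: Row) -> Row:
        out = dict(row)
        out[self.output_col] = len(row[self.input_col])
        return out


class ClusterOnlyStage:
    """A stage that does not opt in to local transformation."""


def transform_locally(stages: List[Any], row: Row) -> Row:
    """Run one element through every stage, failing fast on unsupported stages."""
    for stage in stages:
        if not isinstance(stage, LocalTransformer):
            raise TypeError(f"{type(stage).__name__} does not support local transformation")
        row = stage.transform_one(row)
    return row


result = transform_locally(
    [LowercaseTokenizer("text", "tokens"), TokenCounter("tokens", "n_tokens")],
    {"text": "Serve Spark models online"},
)
```

Because the interface is opt-in, stages that cannot run without a cluster stay unchanged, and a serving tool can detect up front whether a whole pipeline supports single-element use.
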
>>>>>>>>>>> I think continuing to have on-line serving grow in different
>>>>>>>>>>> projects is probably the right path forward (folks have different
>>>>>>>>>>> needs), but I'd love to see us make it simpler for other projects to
>>>>>>>>>>> build reliable serving tools.
>>>>>>>>>>>
>>>>>>>>>>> I realize this maybe puts some of the folks in an awkward
>>>>>>>>>>> position with their own commercial offerings, but hopefully if we
>>>>>>>>>>> make it easier for everyone the commercial vendors can benefit as
>>>>>>>>>>> well.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> Holden :)
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Joseph Bradley
>>>>>>>>>> Software Engineer - Machine Learning
>>>>>>>>>> Databricks, Inc.
>>>>>>>>>> [image: http://databricks.com] <http://databricks.com/>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Joseph Bradley
>>>>>>>> Software Engineer - Machine Learning
>>>>>>>> Databricks, Inc.
>>>>>>>> [image: http://databricks.com] <http://databricks.com/>
>>>>>>>>
>>>>>>> --
>>>>>>> --
>>>>>>> Cheers,
>>>>>>> Leif
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>
