I will aim to join up at 4pm tomorrow (Wed) too. Look forward to it.

On Sun, 3 Jun 2018 at 00:24 Holden Karau <[email protected]> wrote:
> On Sat, Jun 2, 2018 at 8:39 PM, Maximiliano Felice <[email protected]> wrote:
>
>> Hi!
>>
>> We're already in San Francisco waiting for the summit. We even think that
>> we spotted @holdenk this afternoon.
>>
> Unless you happened to be walking by my garage, probably not super likely;
> I spent the day working on scooters/motorcycles (my style is a little less
> unique in SF :)). Also, if you see me feel free to say hi unless I look like
> I haven't had my first coffee of the day, love chatting with folks IRL :)
>
>>
>> @chris, we're really interested in the Meetup you're hosting. My team
>> will probably join it from the beginning if you have room for us, and I'll
>> join it later after discussing the topics on this thread. I'll send you an
>> email regarding this request.
>>
>> Thanks
>>
>> On Fri, Jun 1, 2018 at 7:26 AM, Saikat Kanjilal <[email protected]> wrote:
>>
>>> @Chris This sounds fantastic, please send summary notes for Seattle
>>> folks.
>>>
>>> @Felix I work in downtown Seattle, and am wondering if we should hold a
>>> tech meetup around model serving in Spark at my work or elsewhere close,
>>> thoughts? I’m actually in the midst of building microservices to manage
>>> models, and when I say models I mean much more than machine learning
>>> models (think OR, process models as well).
>>>
>>> Regards
>>>
>>> Sent from my iPhone
>>>
>>> On May 31, 2018, at 10:32 PM, Chris Fregly <[email protected]> wrote:
>>>
>>> Hey everyone!
>>>
>>> @Felix: thanks for putting this together. I sent some of you a quick
>>> calendar event - mostly for me, so I don’t forget! :)
>>>
>>> Coincidentally, this is the focus of June 6th's *Advanced Spark and
>>> TensorFlow Meetup*
>>> <https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/250924195/>
>>> @5:30pm on June 6th (same night) here in SF!
>>>
>>> Everybody is welcome to come.
>>> Here’s the link to the meetup that includes the signup link:
>>> <https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/250924195/>
>>>
>>> We have an awesome lineup of speakers covering a lot of deep, technical
>>> ground.
>>>
>>> For those who can’t attend in person, we’ll be broadcasting live - and
>>> posting the recording afterward.
>>>
>>> All details are in the meetup link above…
>>>
>>> @holden/felix/nick/joseph/maximiliano/saikat/leif: you’re more than
>>> welcome to give a talk. I can move things around to make room.
>>>
>>> @joseph: I’d personally like an update on the direction of the
>>> Databricks proprietary ML Serving export format, which is similar to PMML
>>> but not a standard in any way.
>>>
>>> Also, the Databricks ML Serving Runtime is only available to Databricks
>>> customers. This seems in conflict with the community efforts described
>>> here. Can you comment on behalf of Databricks?
>>>
>>> Look forward to your response, Joseph.
>>>
>>> See you all soon!
>>>
>>> --
>>> *Chris Fregly*
>>> Founder @ *PipelineAI* <https://pipeline.ai/> (100,000 Users)
>>> Organizer @ *Advanced Spark and TensorFlow Meetup*
>>> <https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/> (85,000
>>> Global Members)
>>>
>>> *San Francisco - Chicago - Austin - Washington DC - London - Dusseldorf*
>>> *Try our PipelineAI Community Edition with GPUs and TPUs!!
>>> <http://community.pipeline.ai/>*
>>>
>>> On May 30, 2018, at 9:32 AM, Felix Cheung <[email protected]> wrote:
>>>
>>> Hi!
>>>
>>> Thank you!
>>> Let’s meet then
>>>
>>> June 6 4pm
>>>
>>> Moscone West Convention Center
>>> 800 Howard Street, San Francisco, CA 94103
>>> <https://maps.google.com/?q=800+Howard+Street,+San+Francisco,+CA+94103&entry=gmail&source=g>
>>>
>>> Ground floor (outside of conference area - should be available for all)
>>> - we will meet and decide where to go
>>>
>>> (Would not send invite because that would be too much noise for dev@)
>>>
>>> To paraphrase Joseph, we will use this to kick off the discussion,
>>> post notes afterward, and follow up online. As for Seattle, I would be
>>> very interested to meet in person later and discuss ;)
>>>
>>> _____________________________
>>> From: Saikat Kanjilal <[email protected]>
>>> Sent: Tuesday, May 29, 2018 11:46 AM
>>> Subject: Re: Revisiting Online serving of Spark models?
>>> To: Maximiliano Felice <[email protected]>
>>> Cc: Felix Cheung <[email protected]>, Holden Karau <[email protected]>,
>>> Joseph Bradley <[email protected]>, Leif Walsh <[email protected]>,
>>> dev <[email protected]>
>>>
>>> Would love to join but am in Seattle, thoughts on how to make this work?
>>>
>>> Regards
>>>
>>> Sent from my iPhone
>>>
>>> On May 29, 2018, at 10:35 AM, Maximiliano Felice <[email protected]> wrote:
>>>
>>> Big +1 to a meeting with fresh air.
>>>
>>> Could anyone send the invites? I don't really know which place
>>> Holden is talking about.
>>>
>>> 2018-05-29 14:27 GMT-03:00 Felix Cheung <[email protected]>:
>>>
>>>> You had me at Blue Bottle!
>>>>
>>>> _____________________________
>>>> From: Holden Karau <[email protected]>
>>>> Sent: Tuesday, May 29, 2018 9:47 AM
>>>> Subject: Re: Revisiting Online serving of Spark models?
>>>> To: Felix Cheung <[email protected]>
>>>> Cc: Saikat Kanjilal <[email protected]>, Maximiliano Felice <[email protected]>,
>>>> Joseph Bradley <[email protected]>, Leif Walsh <[email protected]>,
>>>> dev <[email protected]>
>>>>
>>>> I'm down for that, we could all go for a walk, maybe to the Mint Plaza
>>>> Blue Bottle, and grab coffee (if the weather holds, have our design
>>>> meeting outside :p)?
>>>>
>>>> On Tue, May 29, 2018 at 9:37 AM, Felix Cheung <[email protected]> wrote:
>>>>
>>>>> Bump.
>>>>>
>>>>> ------------------------------
>>>>> *From:* Felix Cheung <[email protected]>
>>>>> *Sent:* Saturday, May 26, 2018 1:05:29 PM
>>>>> *To:* Saikat Kanjilal; Maximiliano Felice; Joseph Bradley
>>>>> *Cc:* Leif Walsh; Holden Karau; dev
>>>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>>>
>>>>> Hi! How about we meet the community and discuss on June 6 4pm at
>>>>> (near) the Summit?
>>>>>
>>>>> (I propose we meet at the venue entrance so we could accommodate
>>>>> people who might not be in the conference)
>>>>>
>>>>> ------------------------------
>>>>> *From:* Saikat Kanjilal <[email protected]>
>>>>> *Sent:* Tuesday, May 22, 2018 7:47:07 AM
>>>>> *To:* Maximiliano Felice
>>>>> *Cc:* Leif Walsh; Felix Cheung; Holden Karau; Joseph Bradley; dev
>>>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>>>
>>>>> I’m in the same exact boat as Maximiliano and have use cases as well
>>>>> for model serving and would love to join this discussion.
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On May 22, 2018, at 6:39 AM, Maximiliano Felice <[email protected]> wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> I don't usually write a lot on this list, but I keep up to date with
>>>>> the discussions and I'm a heavy user of Spark. This topic caught my
>>>>> attention, as we're currently facing this issue at work. I'm attending
>>>>> the summit and was wondering if it would be possible for me to join
>>>>> that meeting.
>>>>> I might be able to share some helpful use cases and ideas.
>>>>>
>>>>> Thanks,
>>>>> Maximiliano Felice
>>>>>
>>>>> On Tue, May 22, 2018 at 9:14 AM, Leif Walsh <[email protected]> wrote:
>>>>>
>>>>>> I’m with you on JSON being more readable than Parquet, but we’ve had
>>>>>> success using pyarrow’s Parquet reader and have been quite happy with
>>>>>> it so far. If your target is Python (and probably if not now, then
>>>>>> soon, R), you should look into it.
>>>>>>
>>>>>> On Mon, May 21, 2018 at 16:52 Joseph Bradley <[email protected]> wrote:
>>>>>>
>>>>>>> Regarding model reading and writing, I'll give quick thoughts here:
>>>>>>> * Our approach was to use the same format but write JSON instead of
>>>>>>>   Parquet. It's easier to parse JSON without Spark, and using the same
>>>>>>>   format simplifies architecture. Plus, some people want to check
>>>>>>>   files into version control, and JSON is nice for that.
>>>>>>> * The reader/writer APIs could be extended to take format parameters
>>>>>>>   (just like DataFrame reader/writers) to handle JSON (and maybe,
>>>>>>>   eventually, handle Parquet in the online serving setting).
>>>>>>>
>>>>>>> This would be a big project, so proposing a SPIP might be best. If
>>>>>>> people are around at the Spark Summit, that could be a good time to
>>>>>>> meet up & then post notes back to the dev list.
>>>>>>>
>>>>>>> On Sun, May 20, 2018 at 8:11 PM, Felix Cheung <[email protected]> wrote:
>>>>>>>
>>>>>>>> Specifically I’d like to bring part of the discussion to Model and
>>>>>>>> PipelineModel, and various ModelReader and SharedReadWrite
>>>>>>>> implementations that rely on SparkContext. This is a big blocker on
>>>>>>>> reusing trained models outside of Spark for online serving.
>>>>>>>>
>>>>>>>> What’s the next step? Would folks be interested in getting together
>>>>>>>> to discuss/get some feedback?
>>>>>>>>
>>>>>>>> _____________________________
>>>>>>>> From: Felix Cheung <[email protected]>
>>>>>>>> Sent: Thursday, May 10, 2018 10:10 AM
>>>>>>>> Subject: Re: Revisiting Online serving of Spark models?
>>>>>>>> To: Holden Karau <[email protected]>, Joseph Bradley <[email protected]>
>>>>>>>> Cc: dev <[email protected]>
>>>>>>>>
>>>>>>>> Huge +1 on this!
>>>>>>>>
>>>>>>>> ------------------------------
>>>>>>>> *From:* [email protected] <[email protected]> on behalf
>>>>>>>> of Holden Karau <[email protected]>
>>>>>>>> *Sent:* Thursday, May 10, 2018 9:39:26 AM
>>>>>>>> *To:* Joseph Bradley
>>>>>>>> *Cc:* dev
>>>>>>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>>>>>>
>>>>>>>> On Thu, May 10, 2018 at 9:25 AM, Joseph Bradley <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks for bringing this up Holden! I'm a strong supporter of
>>>>>>>>> this.
>>>>>>>>>
>>>>>>>> Awesome! I'm glad other folks think something like this belongs in
>>>>>>>> Spark.
>>>>>>>>
>>>>>>>>> This was one of the original goals for mllib-local: to have local
>>>>>>>>> versions of MLlib models which could be deployed without the big
>>>>>>>>> Spark JARs and without a SparkContext or SparkSession. There are
>>>>>>>>> related commercial offerings like this : ) but the overhead of
>>>>>>>>> maintaining those offerings is pretty high. Building good APIs
>>>>>>>>> within MLlib to avoid copying logic across libraries will be well
>>>>>>>>> worth it.
>>>>>>>>>
>>>>>>>>> We've talked about this need at Databricks and have also been
>>>>>>>>> syncing with the creators of MLeap. It'd be great to get this
>>>>>>>>> functionality into Spark itself. Some thoughts:
>>>>>>>>> * It'd be valuable to have this go beyond adding transform()
>>>>>>>>>   methods taking a Row to the current Models.
>>>>>>>>>   Instead, it would be ideal to have local, lightweight versions of
>>>>>>>>>   models in mllib-local, outside of the main mllib package (for
>>>>>>>>>   easier deployment with smaller & fewer dependencies).
>>>>>>>>> * Supporting Pipelines is important. For this, it would be ideal
>>>>>>>>>   to utilize elements of Spark SQL, particularly Rows and Types,
>>>>>>>>>   which could be moved into a local sql package.
>>>>>>>>> * This architecture may require some awkward APIs currently to
>>>>>>>>>   have model prediction logic in mllib-local, local model classes in
>>>>>>>>>   mllib-local, and regular (DataFrame-friendly) model classes in
>>>>>>>>>   mllib. We might find it helpful to break some DeveloperApis in
>>>>>>>>>   Spark 3.0 to facilitate this architecture while making it feasible
>>>>>>>>>   for 3rd party developers to extend MLlib APIs (especially in Java).
>>>>>>>>>
>>>>>>>> I agree this could be interesting, and feed into the other
>>>>>>>> discussion around when (or if) we should be considering Spark 3.0.
>>>>>>>> I _think_ we could probably do it with optional traits people could
>>>>>>>> mix in to avoid breaking the current APIs, but I could be wrong on
>>>>>>>> that point.
>>>>>>>>
>>>>>>>>> * It could also be worth discussing local DataFrames. They might
>>>>>>>>>   not be as important as per-Row transformations, but they would be
>>>>>>>>>   helpful for batching for higher throughput.
>>>>>>>>>
>>>>>>>> That could be interesting as well.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'll be interested to hear others' thoughts too!
>>>>>>>>>
>>>>>>>>> Joseph
>>>>>>>>>
>>>>>>>>> On Wed, May 9, 2018 at 7:18 AM, Holden Karau <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi y'all,
>>>>>>>>>>
>>>>>>>>>> With the renewed interest in ML in Apache Spark, now seems like as
>>>>>>>>>> good a time as any to revisit the online serving situation in Spark
>>>>>>>>>> ML.
>>>>>>>>>> DB & others have done some excellent work moving a lot of the
>>>>>>>>>> necessary tools into a local linear algebra package that doesn't
>>>>>>>>>> depend on having a SparkContext.
>>>>>>>>>>
>>>>>>>>>> There are a few different commercial and non-commercial solutions
>>>>>>>>>> around this, but currently our individual transform/predict methods
>>>>>>>>>> are private, so they either need to copy or re-implement them (or
>>>>>>>>>> put themselves in org.apache.spark) to access them. How would folks
>>>>>>>>>> feel about adding a new trait for ML pipeline stages that exposes
>>>>>>>>>> transformation of single-element inputs (or local collections) and
>>>>>>>>>> could be optionally implemented by stages which support this? That
>>>>>>>>>> way we can have less copy-and-paste code possibly getting out of
>>>>>>>>>> sync with our model training.
>>>>>>>>>>
>>>>>>>>>> I think continuing to have online serving grow in different
>>>>>>>>>> projects is probably the right path forward (folks have different
>>>>>>>>>> needs), but I'd love to see us make it simpler for other projects to
>>>>>>>>>> build reliable serving tools.
>>>>>>>>>>
>>>>>>>>>> I realize this maybe puts some of the folks in an awkward
>>>>>>>>>> position with their own commercial offerings, but hopefully if we
>>>>>>>>>> make it easier for everyone the commercial vendors can benefit as
>>>>>>>>>> well.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> Holden :)
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Joseph Bradley
>>>>>>>>> Software Engineer - Machine Learning
>>>>>>>>> Databricks, Inc.
>>>>>>>>> [image: http://databricks.com] <http://databricks.com/>
>>>>>>
>>>>>> --
>>>>>> Cheers,
>>>>>> Leif
