Hi! Do we meet at the entrance?
See you

On Tue, Jun 5, 2018 at 3:07 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:

> I will aim to join up at 4pm tomorrow (Wed) too. Look forward to it.
>
> On Sun, 3 Jun 2018 at 00:24 Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> On Sat, Jun 2, 2018 at 8:39 PM, Maximiliano Felice <
>> maximilianofel...@gmail.com> wrote:
>>
>>> Hi!
>>>
>>> We're already in San Francisco waiting for the summit. We even think
>>> that we spotted @holdenk this afternoon.
>>>
>> Unless you happened to be walking by my garage, probably not super likely;
>> I spent the day working on scooters/motorcycles (my style is a little less
>> unique in SF :)). Also, if you see me, feel free to say hi unless I look like
>> I haven't had my first coffee of the day. I love chatting with folks IRL :)
>>
>>>
>>> @chris, we're really interested in the Meetup you're hosting. My team
>>> will probably join it from the beginning if you have room for us, and I'll
>>> join it later after discussing the topics on this thread. I'll send you an
>>> email regarding this request.
>>>
>>> Thanks
>>>
>>> On Fri, Jun 1, 2018 at 7:26 AM, Saikat Kanjilal <sxk1...@hotmail.com>
>>> wrote:
>>>
>>>> @Chris This sounds fantastic, please send summary notes for Seattle
>>>> folks.
>>>>
>>>> @Felix I work in downtown Seattle and am wondering if we should hold a
>>>> tech meetup around model serving in Spark at my work or somewhere close
>>>> by. Thoughts? I'm actually in the midst of building microservices to
>>>> manage models, and when I say models I mean much more than machine
>>>> learning models (think OR and process models as well).
>>>>
>>>> Regards
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On May 31, 2018, at 10:32 PM, Chris Fregly <ch...@fregly.com> wrote:
>>>>
>>>> Hey everyone!
>>>>
>>>> @Felix: thanks for putting this together. I sent some of you a quick
>>>> calendar event - mostly for me, so I don't forget!
>>>> :)
>>>>
>>>> Coincidentally, this is the focus of June 6th's *Advanced Spark and
>>>> TensorFlow Meetup*
>>>> <https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/250924195/>
>>>> @5:30pm on June 6th (same night) here in SF!
>>>>
>>>> Everybody is welcome to come. Here's the link to the meetup, which
>>>> includes the signup link:
>>>> <https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/250924195/>
>>>>
>>>> We have an awesome lineup of speakers covering a lot of deep, technical
>>>> ground.
>>>>
>>>> For those who can't attend in person, we'll be broadcasting live and
>>>> posting the recording afterward.
>>>>
>>>> All details are in the meetup link above.
>>>>
>>>> @holden/felix/nick/joseph/maximiliano/saikat/leif: you're more than
>>>> welcome to give a talk. I can move things around to make room.
>>>>
>>>> @joseph: I'd personally like an update on the direction of the
>>>> Databricks proprietary ML Serving export format, which is similar to PMML
>>>> but not a standard in any way.
>>>>
>>>> Also, the Databricks ML Serving Runtime is only available to Databricks
>>>> customers. This seems in conflict with the community efforts described
>>>> here. Can you comment on behalf of Databricks?
>>>>
>>>> Look forward to your response, joseph.
>>>>
>>>> See you all soon!
>>>>
>>>> --
>>>>
>>>> *Chris Fregly*
>>>> Founder @ *PipelineAI* <https://pipeline.ai/> (100,000 Users)
>>>> Organizer @ *Advanced Spark and TensorFlow Meetup*
>>>> <https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/> (85,000
>>>> Global Members)
>>>>
>>>> *San Francisco - Chicago - Austin - Washington DC - London - Dusseldorf*
>>>> *Try our PipelineAI Community Edition with GPUs and TPUs!!
>>>> <http://community.pipeline.ai/>*
>>>>
>>>> On May 30, 2018, at 9:32 AM, Felix Cheung <felixcheun...@hotmail.com>
>>>> wrote:
>>>>
>>>> Hi!
>>>>
>>>> Thank you! Let's meet then:
>>>>
>>>> June 6, 4pm
>>>>
>>>> Moscone West Convention Center
>>>> 800 Howard Street, San Francisco, CA 94103
>>>> <https://maps.google.com/?q=800+Howard+Street,+San+Francisco,+CA+94103&entry=gmail&source=g>
>>>>
>>>> Ground floor (outside of the conference area - should be available to
>>>> all) - we will meet and decide where to go.
>>>>
>>>> (I won't send an invite because that would be too much noise for dev@.)
>>>>
>>>> To paraphrase Joseph, we will use this to kick off the discussion, post
>>>> notes afterward, and follow up online. As for Seattle, I would be very
>>>> interested to meet in person later and discuss ;)
>>>>
>>>> _____________________________
>>>> From: Saikat Kanjilal <sxk1...@hotmail.com>
>>>> Sent: Tuesday, May 29, 2018 11:46 AM
>>>> Subject: Re: Revisiting Online serving of Spark models?
>>>> To: Maximiliano Felice <maximilianofel...@gmail.com>
>>>> Cc: Felix Cheung <felixcheun...@hotmail.com>, Holden Karau <
>>>> hol...@pigscanfly.ca>, Joseph Bradley <jos...@databricks.com>, Leif
>>>> Walsh <leif.wa...@gmail.com>, dev <dev@spark.apache.org>
>>>>
>>>> Would love to join but am in Seattle. Thoughts on how to make this
>>>> work?
>>>>
>>>> Regards
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On May 29, 2018, at 10:35 AM, Maximiliano Felice <
>>>> maximilianofel...@gmail.com> wrote:
>>>>
>>>> Big +1 to a meeting with fresh air.
>>>>
>>>> Could anyone send the invites? I don't really know which place
>>>> Holden is talking about.
>>>>
>>>> 2018-05-29 14:27 GMT-03:00 Felix Cheung <felixcheun...@hotmail.com>:
>>>>
>>>>> You had me at Blue Bottle!
>>>>>
>>>>> _____________________________
>>>>> From: Holden Karau <hol...@pigscanfly.ca>
>>>>> Sent: Tuesday, May 29, 2018 9:47 AM
>>>>> Subject: Re: Revisiting Online serving of Spark models?
>>>>> To: Felix Cheung <felixcheun...@hotmail.com>
>>>>> Cc: Saikat Kanjilal <sxk1...@hotmail.com>, Maximiliano Felice <
>>>>> maximilianofel...@gmail.com>, Joseph Bradley <jos...@databricks.com>,
>>>>> Leif Walsh <leif.wa...@gmail.com>, dev <dev@spark.apache.org>
>>>>>
>>>>> I'm down for that. We could all go for a walk, maybe to the Mint Plaza
>>>>> Blue Bottle, and grab coffee (and if the weather holds, have our design
>>>>> meeting outside :p)?
>>>>>
>>>>> On Tue, May 29, 2018 at 9:37 AM, Felix Cheung <
>>>>> felixcheun...@hotmail.com> wrote:
>>>>>
>>>>>> Bump.
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Felix Cheung <felixcheun...@hotmail.com>
>>>>>> *Sent:* Saturday, May 26, 2018 1:05:29 PM
>>>>>> *To:* Saikat Kanjilal; Maximiliano Felice; Joseph Bradley
>>>>>> *Cc:* Leif Walsh; Holden Karau; dev
>>>>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>>>>
>>>>>> Hi! How about we meet with the community and discuss on June 6 at 4pm
>>>>>> at (or near) the Summit?
>>>>>>
>>>>>> (I propose we meet at the venue entrance so we can accommodate
>>>>>> people who might not be attending the conference.)
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Saikat Kanjilal <sxk1...@hotmail.com>
>>>>>> *Sent:* Tuesday, May 22, 2018 7:47:07 AM
>>>>>> *To:* Maximiliano Felice
>>>>>> *Cc:* Leif Walsh; Felix Cheung; Holden Karau; Joseph Bradley; dev
>>>>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>>>>
>>>>>> I'm in the exact same boat as Maximiliano: I have use cases for model
>>>>>> serving as well and would love to join this discussion.
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> On May 22, 2018, at 6:39 AM, Maximiliano Felice <
>>>>>> maximilianofel...@gmail.com> wrote:
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> I don't usually write a lot on this list, but I keep up to date with
>>>>>> the discussions and I'm a heavy user of Spark. This topic caught my
>>>>>> attention, as we're currently facing this issue at work.
>>>>>> I'm attending the summit and was wondering whether it would be possible
>>>>>> for me to join that meeting. I might be able to share some helpful use
>>>>>> cases and ideas.
>>>>>>
>>>>>> Thanks,
>>>>>> Maximiliano Felice
>>>>>>
>>>>>> On Tue, May 22, 2018 at 9:14 AM, Leif Walsh <leif.wa...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I'm with you on JSON being more readable than Parquet, but we've had
>>>>>>> success using pyarrow's Parquet reader and have been quite happy with
>>>>>>> it so far. If your target is Python (and probably, if not now, then
>>>>>>> soon, R), you should look into it.
>>>>>>>
>>>>>>> On Mon, May 21, 2018 at 16:52 Joseph Bradley <jos...@databricks.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Regarding model reading and writing, I'll give quick thoughts here:
>>>>>>>> * Our approach was to use the same format but write JSON instead of
>>>>>>>> Parquet. It's easier to parse JSON without Spark, and using the same
>>>>>>>> format simplifies architecture. Plus, some people want to check files
>>>>>>>> into version control, and JSON is nice for that.
>>>>>>>> * The reader/writer APIs could be extended to take format
>>>>>>>> parameters (just like DataFrame reader/writers) to handle JSON (and
>>>>>>>> maybe, eventually, handle Parquet in the online serving setting).
>>>>>>>>
>>>>>>>> This would be a big project, so proposing a SPIP might be best. If
>>>>>>>> people are around at the Spark Summit, that could be a good time to
>>>>>>>> meet up & then post notes back to the dev list.
>>>>>>>>
>>>>>>>> On Sun, May 20, 2018 at 8:11 PM, Felix Cheung <
>>>>>>>> felixcheun...@hotmail.com> wrote:
>>>>>>>>
>>>>>>>>> Specifically, I'd like to bring part of the discussion to Model and
>>>>>>>>> PipelineModel, and the various ModelReader and SharedReadWrite
>>>>>>>>> implementations that rely on SparkContext. This is a big blocker to
>>>>>>>>> reusing trained models outside of Spark for online serving.
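To make the JSON point in this part of the thread concrete, here is a minimal sketch in plain Python of reading a model without Spark. It is only an illustration: the JSON layout (the "class", "coefficients" and "intercept" fields) and the helper names save_local_model, load_local_model and predict_probability are hypothetical, and this is not Spark's actual persistence format or API. It only shows that a linear model stored as JSON can be loaded and scored with no SparkContext.

```python
import json
import math
import os
import tempfile

# Hypothetical JSON layout for a binary logistic regression model.
# This is NOT Spark's real persistence format; it only illustrates the idea.
EXAMPLE_MODEL = {
    "class": "LogisticRegressionModel",
    "coefficients": [0.5, -0.25],
    "intercept": 0.1,
}


def save_local_model(model: dict, path: str) -> None:
    """Write the parameters as indented, key-sorted JSON (friendly to diffs)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f, indent=2, sort_keys=True)


def load_local_model(path: str) -> dict:
    """Read the model back with the standard library only (no SparkContext)."""
    with open(path, encoding="utf-8") as f:
        model = json.load(f)
    if model.get("class") != "LogisticRegressionModel":
        raise ValueError(f"unsupported model class: {model.get('class')!r}")
    return model


def predict_probability(model: dict, features: list) -> float:
    """Score one feature vector as sigmoid(w . x + b)."""
    weights = model["coefficients"]
    if len(weights) != len(features):
        raise ValueError("feature vector length does not match coefficients")
    margin = sum(w * x for w, x in zip(weights, features)) + model["intercept"]
    return 1.0 / (1.0 + math.exp(-margin))


def roundtrip_demo() -> float:
    """Save, reload, and score a single row inside a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.json")
        save_local_model(EXAMPLE_MODEL, path)
        return predict_probability(load_local_model(path), [1.0, 2.0])
```

Because the file is plain, key-sorted JSON, it can also be checked into version control and diffed, which is the other benefit mentioned in the thread.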
>>>>>>>>>
>>>>>>>>> What's the next step? Would folks be interested in getting
>>>>>>>>> together to discuss/get some feedback?
>>>>>>>>>
>>>>>>>>> _____________________________
>>>>>>>>> From: Felix Cheung <felixcheun...@hotmail.com>
>>>>>>>>> Sent: Thursday, May 10, 2018 10:10 AM
>>>>>>>>> Subject: Re: Revisiting Online serving of Spark models?
>>>>>>>>> To: Holden Karau <hol...@pigscanfly.ca>, Joseph Bradley <
>>>>>>>>> jos...@databricks.com>
>>>>>>>>> Cc: dev <dev@spark.apache.org>
>>>>>>>>>
>>>>>>>>> Huge +1 on this!
>>>>>>>>>
>>>>>>>>> ------------------------------
>>>>>>>>> *From:* holden.ka...@gmail.com <holden.ka...@gmail.com> on behalf
>>>>>>>>> of Holden Karau <hol...@pigscanfly.ca>
>>>>>>>>> *Sent:* Thursday, May 10, 2018 9:39:26 AM
>>>>>>>>> *To:* Joseph Bradley
>>>>>>>>> *Cc:* dev
>>>>>>>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>>>>>>>
>>>>>>>>> On Thu, May 10, 2018 at 9:25 AM, Joseph Bradley <
>>>>>>>>> jos...@databricks.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for bringing this up Holden! I'm a strong supporter of
>>>>>>>>>> this.
>>>>>>>>>>
>>>>>>>>> Awesome! I'm glad other folks think something like this belongs
>>>>>>>>> in Spark.
>>>>>>>>>
>>>>>>>>>> This was one of the original goals for mllib-local: to have local
>>>>>>>>>> versions of MLlib models which could be deployed without the big
>>>>>>>>>> Spark JARs and without a SparkContext or SparkSession. There are
>>>>>>>>>> related commercial offerings like this : ) but the overhead of
>>>>>>>>>> maintaining those offerings is pretty high. Building good APIs
>>>>>>>>>> within MLlib to avoid copying logic across libraries will be well
>>>>>>>>>> worth it.
>>>>>>>>>>
>>>>>>>>>> We've talked about this need at Databricks and have also been
>>>>>>>>>> syncing with the creators of MLeap. It'd be great to get this
>>>>>>>>>> functionality into Spark itself.
>>>>>>>>>> Some thoughts:
>>>>>>>>>> * It'd be valuable to have this go beyond adding transform()
>>>>>>>>>> methods taking a Row to the current Models. Instead, it would be
>>>>>>>>>> ideal to have local, lightweight versions of models in mllib-local,
>>>>>>>>>> outside of the main mllib package (for easier deployment with
>>>>>>>>>> smaller & fewer dependencies).
>>>>>>>>>> * Supporting Pipelines is important. For this, it would be ideal
>>>>>>>>>> to utilize elements of Spark SQL, particularly Rows and Types, which
>>>>>>>>>> could be moved into a local sql package.
>>>>>>>>>> * This architecture may require some awkward APIs currently to
>>>>>>>>>> have model prediction logic in mllib-local, local model classes in
>>>>>>>>>> mllib-local, and regular (DataFrame-friendly) model classes in
>>>>>>>>>> mllib. We might find it helpful to break some DeveloperApis in
>>>>>>>>>> Spark 3.0 to facilitate this architecture while making it feasible
>>>>>>>>>> for 3rd party developers to extend MLlib APIs (especially in Java).
>>>>>>>>>>
>>>>>>>>> I agree this could be interesting, and feed into the other
>>>>>>>>> discussion around when (or if) we should be considering Spark 3.0.
>>>>>>>>> I _think_ we could probably do it with optional traits people
>>>>>>>>> could mix in to avoid breaking the current APIs, but I could be
>>>>>>>>> wrong on that point.
>>>>>>>>>
>>>>>>>>>> * It could also be worth discussing local DataFrames. They might
>>>>>>>>>> not be as important as per-Row transformations, but they would be
>>>>>>>>>> helpful for batching for higher throughput.
>>>>>>>>>>
>>>>>>>>> That could be interesting as well.
>>>>>>>>>
>>>>>>>>>> I'll be interested to hear others' thoughts too!
>>>>>>>>>>
>>>>>>>>>> Joseph
>>>>>>>>>>
>>>>>>>>>> On Wed, May 9, 2018 at 7:18 AM, Holden Karau <
>>>>>>>>>> hol...@pigscanfly.ca> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi y'all,
>>>>>>>>>>>
>>>>>>>>>>> With the renewed interest in ML in Apache Spark, now seems like as
>>>>>>>>>>> good a time as any to revisit the online serving situation in
>>>>>>>>>>> Spark ML. DB & others have done some excellent work moving a lot
>>>>>>>>>>> of the necessary tools into a local linear algebra package that
>>>>>>>>>>> doesn't depend on having a SparkContext.
>>>>>>>>>>>
>>>>>>>>>>> There are a few different commercial and non-commercial
>>>>>>>>>>> solutions around this, but currently our individual
>>>>>>>>>>> transform/predict methods are private, so those projects need to
>>>>>>>>>>> either copy or re-implement them (or put themselves in
>>>>>>>>>>> org.apache.spark) to access them. How would folks feel about adding
>>>>>>>>>>> a new trait for ML pipeline stages that exposes transformation of
>>>>>>>>>>> single-element inputs (or local collections) and that could be
>>>>>>>>>>> optionally implemented by stages which support this? That way we
>>>>>>>>>>> can have less copy-and-paste code possibly getting out of sync
>>>>>>>>>>> with our model training.
>>>>>>>>>>>
>>>>>>>>>>> I think continuing to have online serving grow in different
>>>>>>>>>>> projects is probably the right path forward (folks have different
>>>>>>>>>>> needs), but I'd love to see us make it simpler for other projects
>>>>>>>>>>> to build reliable serving tools.
>>>>>>>>>>>
>>>>>>>>>>> I realize this maybe puts some of the folks in an awkward
>>>>>>>>>>> position with their own commercial offerings, but hopefully if we
>>>>>>>>>>> make it easier for everyone the commercial vendors can benefit as
>>>>>>>>>>> well.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> Holden :)
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Joseph Bradley
>>>>>>>>>> Software Engineer - Machine Learning
>>>>>>>>>> Databricks, Inc.
>>>>>>>>>> <http://databricks.com/>
>>>>>>>>>>
>>>>>>> --
>>>>>>> Cheers,
>>>>>>> Leif
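
The proposal at the root of this thread, an optional trait that pipeline stages can implement to transform single elements or local collections, could look roughly like the sketch below. It is written in Python for brevity rather than Scala, and every name in it (LocalTransformer, transform_one, transform_local, Scaler, Threshold, DistributedOnlyStage, LocalPipeline) is hypothetical; none of these exist in Spark ML. A plain dict stands in for a Spark SQL Row, which is also an assumption of the sketch.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]  # stand-in for a Spark SQL Row; an assumption of this sketch


class LocalTransformer(ABC):
    """Hypothetical opt-in interface: a stage that can transform one row locally."""

    @abstractmethod
    def transform_one(self, row: Row) -> Row:
        """Return a new row with this stage's output column added."""

    def transform_local(self, rows: Iterable[Row]) -> List[Row]:
        """Default path for local collections, built on transform_one."""
        return [self.transform_one(r) for r in rows]


class Scaler(LocalTransformer):
    """Toy stage with local support: multiplies a column by a constant factor."""

    def __init__(self, input_col: str, output_col: str, factor: float) -> None:
        self.input_col = input_col
        self.output_col = output_col
        self.factor = factor

    def transform_one(self, row: Row) -> Row:
        return {**row, self.output_col: row[self.input_col] * self.factor}


class Threshold(LocalTransformer):
    """Toy stage: emits 1.0 when the input column exceeds a cutoff, else 0.0."""

    def __init__(self, input_col: str, output_col: str, cutoff: float) -> None:
        self.input_col = input_col
        self.output_col = output_col
        self.cutoff = cutoff

    def transform_one(self, row: Row) -> Row:
        label = 1.0 if row[self.input_col] > self.cutoff else 0.0
        return {**row, self.output_col: label}


class DistributedOnlyStage:
    """Toy stage that has NOT opted in to local transformation."""


class LocalPipeline(LocalTransformer):
    """Chains stages in order, refusing any stage without the local interface."""

    def __init__(self, stages: List[Any]) -> None:
        unsupported = [type(s).__name__ for s in stages
                       if not isinstance(s, LocalTransformer)]
        if unsupported:
            raise TypeError(f"stages without local support: {unsupported}")
        self.stages = stages

    def transform_one(self, row: Row) -> Row:
        for stage in self.stages:
            row = stage.transform_one(row)
        return row
```

Keeping the interface optional means existing stages are untouched; a serving project would find out when it builds the local pipeline, rather than at request time, that some stage cannot run without Spark.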