Team,

Since everyone is here: we will be working on a machine learning infrastructure program this year. I will set up meetings with everyone on this thread, plus some others in SRE and Audiences, to gather a "bag of requests" of things that are missing. The first round of talks, which I hope to finish next week, is to hear everyone's requests and ideas; I will be sending meeting invites today and tomorrow. I think some themes will emerge from those. Thus far, it is pretty clear that we need a better way to deploy models to production (right now we deploy them to Elasticsearch in rather crafty ways, for example), we need an answer to the GPU question for training models, we need a "recommended way" to train and compute, we need some unified system for tracking models + data + tests, and finally, there are probably many lessons to take from the work done on ORES thus far.
Thanks,
Nuria

On Thu, Feb 7, 2019 at 8:40 AM Miriam Redi <[email protected]> wrote:
> Hey Andrew!
>
> Thank you so much for sharing this and starting this conversation. We had a
> meeting at All Hands with all the people interested in "Image Classification"
> (https://phabricator.wikimedia.org/T215413), and one of the open questions
> was exactly how to find a "common repository" for ML models that different
> groups and products within the organization can use. So, please, count me in!
>
> Thanks,
>
> M
>
> On Thu, Feb 7, 2019 at 4:38 PM Aaron Halfaker <[email protected]> wrote:
>
>> Just gave the article a quick read. I think this article pushes on some
>> key issues for sure. I definitely agree with the focus on Python/Jupyter
>> as essential for a productive workflow that leverages the best from
>> research scientists. We've been thinking about what ORES 2.0 would look
>> like, and event streams are the dominant proposal for improving on the
>> limitations of our queue-based worker pool.
>>
>> One of the nice things about ORES/revscoring is that it provides a nice
>> framework for operating using the *exact same code* no matter the
>> environment. E.g., it doesn't matter if we're calling out to an API to get
>> data for feature extraction or providing it via a stream. By investing in
>> a dependency injection strategy, we get that flexibility. So to me, the
>> hardest problem -- the one I don't quite know how to solve -- is how we'll
>> mix and merge streams to get all of the data we want available for feature
>> extraction. If I understand correctly, that's where Kafka shines. :)
>>
>> I'm definitely interested in fleshing out this proposal. We should
>> probably be exploring processes for training new types of models (e.g.,
>> image processing) using different strategies than ORES. In ORES, we're
>> almost entirely focused on using sklearn, but we have some basic
>> abstractions for other estimator libraries.
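The dependency-injection idea Aaron describes can be sketched roughly as follows. This is an illustrative toy, not the actual revscoring API: the function and variable names (`extract_features`, `get_text`, `api_get_text`, `stream_get_text_factory`) are made up for this example. The point is that the feature-extraction code itself never knows whether its input came from an API call or from a stream event; the caller injects the data-fetching behavior.

```python
# Sketch of dependency injection for feature extraction.
# All names here are hypothetical, not the real revscoring interface.

def extract_features(rev_id, get_text):
    """Extract features for a revision. `get_text` is injected by the
    caller, so this code is identical in every environment."""
    text = get_text(rev_id)
    return {
        "chars": len(text),
        "words": len(text.split()),
    }

# Environment 1: data fetched on demand (stand-in for an API client).
fake_api_store = {123: "some revision text"}

def api_get_text(rev_id):
    # In production this would call out to the MediaWiki API.
    return fake_api_store[rev_id]

# Environment 2: data already carried on a stream event
# (stand-in for a Kafka consumer handing us a message).
def stream_get_text_factory(event):
    return lambda rev_id: event["text"]

print(extract_features(123, api_get_text))
# -> {'chars': 18, 'words': 3}

event = {"rev_id": 456, "text": "text carried on the stream"}
print(extract_features(event["rev_id"], stream_get_text_factory(event)))
```

Swapping the injected `get_text` is the whole trick: the stream-merging problem Aaron raises is about producing events rich enough that the injected fetcher never has to fall back to an API call.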
>> We also make some strong assumptions about running on a single CPU that
>> could probably be broken for some performance gains using real concurrency.
>>
>> -Aaron
>>
>> On Thu, Feb 7, 2019 at 10:05 AM Goran Milovanovic
>> <[email protected]> wrote:
>>
>>> Hi Andrew,
>>>
>>> I have recently started a six-month AI/Machine Learning Engineering
>>> course which focuses exactly on the topics that you've shown interest in.
>>>
>>> So,
>>>
>>> I'd love it if we had a working group (or whatever) that focused
>>> on how to standardize how we train and deploy ML for production use.
>>>
>>> Count me in.
>>>
>>> Regards,
>>> Goran
>>>
>>> Goran S. Milovanović, PhD
>>> Data Scientist, Software Department
>>> Wikimedia Deutschland
>>>
>>> ------------------------------------------------
>>> "It's not the size of the dog in the fight,
>>> it's the size of the fight in the dog."
>>> - Mark Twain
>>> ------------------------------------------------
>>>
>>> On Thu, Feb 7, 2019 at 4:16 PM Andrew Otto <[email protected]> wrote:
>>>
>>>> Just came across
>>>> https://www.confluent.io/blog/machine-learning-with-python-jupyter-ksql-tensorflow
>>>>
>>>> In it, the author discusses some of what he calls the "impedance
>>>> mismatch" between data engineers and production engineers. The links to
>>>> Uber's Michelangelo <https://eng.uber.com/michelangelo/> (which, as far
>>>> as I can tell, has not been open sourced) and the "Hidden Technical Debt
>>>> in Machine Learning Systems" paper
>>>> <https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf>
>>>> are also very interesting!
>>>>
>>>> At All Hands I've been hearing more and more about using ML in
>>>> production, so these things seem very relevant to us. I'd love it if we
>>>> had a working group (or whatever) that focused on how to standardize how
>>>> we train and deploy ML for production use.
>>>> >>>> :) >>>> _______________________________________________ >>>> Analytics mailing list >>>> [email protected] >>>> https://lists.wikimedia.org/mailman/listinfo/analytics >>>> >>> >> >> -- >> >> Aaron Halfaker >> >> Principal Research Scientist >> >> Head of the Scoring Platform team >> Wikimedia Foundation >> _______________________________________________ >> Research-Internal mailing list >> [email protected] >> https://lists.wikimedia.org/mailman/listinfo/research-internal >> > _______________________________________________ > Research-Internal mailing list > [email protected] > https://lists.wikimedia.org/mailman/listinfo/research-internal >
_______________________________________________
Discovery mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/discovery
