Hi Kam & Soila,

Thanks a lot for writing this up. I ran the doc past some of the folks who've been doing ML work here at Google, and they were generally happy with the distillation of common methods in the doc. I'd be curious to hear what folks on the Flink and Spark runner sides think.
To me, this seems like a good direction for a high-level API. Presumably, once a high-level API is in place, we could begin looking at what it would take to add lower-level ML algorithm support (e.g. iterative) to the Beam Model. Is this essentially what you're thinking?

Some more specific questions/comments:

- Presumably you'd want to tackle this in Java first, since that's the only language we currently support? Given that half of your examples are in Python, I'm also assuming Python will be interesting once it's available.

- Along those lines, what languages are represented in the capability matrix? E.g. is Spark ML support as detailed there identical across Java/Scala and Python?

- Have you thought about how this would tie in at the runner level, particularly given the Runner API changes that are coming? I'm assuming they'd be provided as composite transforms that (for now) would have no default implementation, given the lack of low-level primitives for ML algorithms, but am curious what your thoughts are there.

- I still don't fully understand how incremental updates due to model drift would tie in at the API level. There's a comment thread in the doc still open tracking this, so no need to comment here additionally. Just pointing it out as one of the things that stands out to me as potentially having API-level impacts and that doesn't seem 100% fleshed out in the doc yet (though that admittedly may just be my limited understanding at this point :-).

-Tyler

On Fri, May 13, 2016 at 10:48 AM Kam Kasravi <[email protected]> wrote:

> Hi Tyler - my bad. Comments should be enabled now.
>
> On Fri, May 13, 2016 at 10:45 AM, Tyler Akidau <[email protected]> wrote:
>
> > Thanks a lot, Kam. Can you please enable comment access on the doc? I seem
> > to have view access only.
> >
> > -Tyler
> >
> > On Fri, May 13, 2016 at 9:54 AM Kam Kasravi <[email protected]> wrote:
> >
> > > Hi
> > >
> > > A number of readers have made comments on this topic recently. We have
> > > created a document that does some analysis of common ML models and related
> > > APIs. We hope this can drive an approach that will result in an API,
> > > compatibility matrix and involvement from the same groups that are
> > > implementing transformation runners (spark, flink, etc). We welcome
> > > comments here or in the document itself.
> > >
> > > https://docs.google.com/document/d/17cRZk_yqHm3C0fljivjN66MbLkeKS1yjo4PBECHb-xA/edit?usp=sharing
