On Fri, Apr 17, 2020 at 2:45 PM Robert Bradshaw <rober...@google.com> wrote:
> Hi Holden!
>
> I agree with Kyle that it makes sense to have some caveat about Flink and
> Spark, though at this point they're not /that/ new (at least not Flink).
>
True, maybe "early-stage" would be better wording? The TFX PyBeam Flink
support isn't yet mature enough (although I believe there is interest in
integrating it into Kubeflow, that hasn't happened yet).

>
> I am curious what extra support Kubeflow is "missing" (or, conversely,
> what extra support it has for Dataflow that goes beyond just specifying a
> different runner) to the point that these runners are declared
> "unsupported." Or is it literally a matter of not providing user support?
>
So the Kubeflow TFX components (in
https://github.com/kubeflow/pipelines/tree/master/components) are limited
to local mode.

>
> On Fri, Apr 17, 2020 at 12:27 PM Kyle Weaver <kcwea...@google.com> wrote:
>
>> Hi Holden,
>>
>> The note on Flink & Spark support sounds reasonable to me. I am
>> optimistic about getting Flink + TFX + Kubeflow working fairly soon, but I
>> agree that we don't want to over-promise.
>>
>> I'm not so sure about the status of Dataflow here; perhaps someone else
>> can comment on that.
>>
>> Looking forward to the book :)
>>
>> Kyle
>>
>> On Fri, Apr 17, 2020 at 1:14 PM Holden Karau <hol...@pigscanfly.ca>
>> wrote:
>>
>>> Hi Apache Beam Developers,
>>>
>>> I'm working on a book about Kubeflow, which naturally has a section on
>>> TFX. I want to set users' expectations correctly, so I wanted to know what
>>> y'all thought of this NOTE we were thinking of including in the early
>>> release:
>>>
>>> Apache Beam’s Python support outside of Google Cloud's Dataflow is
>>> relatively new. TFX is a Python tool, so scaling it depends on Apache
>>> Beam's Python support. You can scale your job by using the non-portable
>>> Dataflow component, but this requires changing your pipeline code and isn't
>>> supported by Kubeflow's current TFX components. As Apache Beam's support
>>> for Apache Flink & Spark improves, support may be added for scaling the TFX
>>> components in a portable manner.
>>>
>>> Does this sound reasonable to folks? I don't want to over-promise but I
>>> also don't want to scare people away given all of the progress that is
>>> being made in supporting the open-source runners with language portability.
>>>
>>> Cheers,
>>>
>>> Holden :)
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau