Why not add a flag to the SDK that would phone home only when specified?

From a support perspective it would be useful to know:
* SDK version
* Runner
* SDK-provided PTransforms that are used
* Features like user state/timers/side inputs/splittable DoFns/...
* Graph complexity (# nodes, # branches, ...)
* Whether the pipeline failed or succeeded
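A minimal sketch of what such an opt-in report could contain, assuming a hypothetical `--phone_home` flag. `UsageReport`, `build_usage_report` and `count_branches` are illustrative names, not Beam APIs. The pipeline graph is modeled as a plain dict mapping each transform name to its consumers, rather than Beam's real pipeline representation. "Branches" is taken to mean extra fan-out edges, which is an assumption. The sketch only builds the payload and never sends anything.

```python
import argparse
from dataclasses import dataclass, asdict
from typing import Dict, List, Optional


@dataclass
class UsageReport:
    """Hypothetical opt-in usage payload; contains no IP, user or path data."""
    sdk_version: str
    runner: str
    transforms: List[str]
    features: List[str]
    num_nodes: int
    num_branches: int
    succeeded: bool


def count_branches(graph: Dict[str, List[str]]) -> int:
    # Assumption: a node feeding k consumers contributes k - 1 extra branches.
    return sum(max(0, len(children) - 1) for children in graph.values())


def build_usage_report(argv: List[str], sdk_version: str,
                       graph: Dict[str, List[str]], transforms: List[str],
                       features: List[str], succeeded: bool) -> Optional[dict]:
    parser = argparse.ArgumentParser()
    parser.add_argument("--runner", default="DirectRunner")
    # Hypothetical flag: nothing is collected unless the user opts in.
    parser.add_argument("--phone_home", action="store_true")
    # Ignore all other pipeline flags (e.g. --input) so they never reach the report.
    args, _unused = parser.parse_known_args(argv)
    if not args.phone_home:
        return None
    report = UsageReport(
        sdk_version=sdk_version,
        runner=args.runner,
        transforms=sorted(set(transforms)),   # distinct SDK-provided PTransforms
        features=sorted(set(features)),       # e.g. "side_inputs", "timers"
        num_nodes=len(graph),                 # every node must appear as a key
        num_branches=count_branches(graph),
        succeeded=succeeded,
    )
    return asdict(report)
```

For example, a graph `{"Read": ["ParDo", "Count"], "ParDo": ["Write"], "Count": [], "Write": []}` run with `--runner FlinkRunner --phone_home` would report 4 nodes and 1 branch; without `--phone_home` the function returns `None`.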
On Mon, Sep 23, 2019 at 3:18 PM Robert Bradshaw <rober...@google.com> wrote:
> On Mon, Sep 23, 2019 at 3:08 PM Brian Hulette <bhule...@google.com> wrote:
> >
> > Would people actually click on that link, though? I think Kyle has a point that in practice users would only find and click on that link when they're having some kind of issue, especially if the link has "feedback" in it.
>
> I think the idea is that we would make the link very lightweight, kind of like a survey (but even easier, as it's pre-populated): basically an opt-in phone-home. If we don't collect any personal data (not even IP/geo, just, say, version + runner, all visible in the URL), there's no need to guard or anonymize it (and this may be sufficient; I don't think we have to worry about spammers and ballot stuffers given the target audience). If we can catch people while they wait for their pipeline to start up (and/or complete), that is a great time to get some feedback.
>
> > I agree usage data would be really valuable, but I'm not sure that this approach would get us good data. Is there a way to get download statistics for the different runner artifacts? Maybe that could be a better metric for comparing usage.
>
> This'd be useful too, but hard to get and very noisy.
>
> >
> > On Mon, Sep 23, 2019 at 2:57 PM Ankur Goenka <goe...@google.com> wrote:
> >>
> >> I agree, these are the questions that need to be answered.
> >> The data can be anonymized and stored as public data in BigQuery or some other place.
> >>
> >> The intent is to get usage statistics so that we know whether people are using Flink, Spark, etc.; it is not intended as a discussion or help channel.
> >> I also think we don't need to monitor this actively, as it's more like a survey than an active channel for getting issues resolved.
> >>
> >> If we think it's useful for the community, then we can come up with a solution for how to do this (similar to how we released the container images).
> >>
> >> On Fri, Sep 20, 2019 at 4:38 PM Kyle Weaver <kcwea...@google.com> wrote:
> >>>
> >>> There are some logistics that would need to be worked out. For example, where would the data go? Who would own it?
> >>>
> >>> Also, I'm not convinced we need yet another place to discuss Beam, when we have already discussed the challenge of simultaneously monitoring the mailing lists, Stack Overflow, Slack, etc. While "how do you use Beam?" is certainly an interesting question, and I'd be curious to know that X or more people use a certain runner, I'm not sure answers to these questions are as useful for guiding the future of Beam as discussions on the dev/user lists, since the latter likely result in more in-depth, specific feedback.
> >>>
> >>> However, I do think it could be useful in general to include links directly in the console output. For example, something along the lines of "Oh no, your Flink pipeline crashed! Check Jira/file a bug/ask the mailing list."
> >>>
> >>> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
> >>>
> >>>
> >>> On Fri, Sep 20, 2019 at 4:14 PM Ankur Goenka <goe...@google.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> At the moment we don't have a good way to collect any usage statistics for Apache Beam, such as which runner is used, because many users don't have a way to report their use case.
> >>>> How about we create a feedback page where users can add their pipeline details and use case?
> >>>> We can also start printing the link to this page when users launch a pipeline from the command line.
> >>>> Example:
> >>>> $ python my_pipeline.py --runner DirectRunner --input /tmp/abc
> >>>>
> >>>> Starting pipeline
> >>>> Please use http://feedback.beam.org?args=runner=DirectRunner,input=/tmp/abc
> >>>> Pipeline started
> >>>> ......
> >>>>
> >>>> Using a link instead of publishing the data automatically gives users control over what they publish and what they don't. We can refine the text and usage further, but the basic idea is to ask for user feedback on each run of the pipeline.
> >>>> Let me know what you think.
> >>>>
> >>>>
> >>>> Thanks,
> >>>> Ankur