One option for the flag could be to just display the URL and not phone home. I would like users to be able to integrate this into their deployment solution, so that we get regular stats rather than stats only when a user decides to run a pipeline manually.
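To make that concrete, here is a rough sketch of what I have in mind. It is only an illustration: the `--usage_stats` flag, its three modes, the default, and the helper names are all made up, and the base URL is just the placeholder from Ankur's example further down. The actual upload is left as a callable that the caller passes in, so the sketch itself never touches the network.

```python
import argparse
from urllib.parse import urlencode

# Placeholder URL taken from the example in this thread; hypothetical.
FEEDBACK_BASE_URL = "http://feedback.beam.org"


def build_feedback_url(stats):
    """Encode the stats as a query string so everything shared is visible."""
    return FEEDBACK_BASE_URL + "?" + urlencode(sorted(stats.items()))


def handle_usage_stats(mode, stats, send=None):
    """Return the line to print for the given mode, or None for 'off'.

    `send` is a hypothetical callable that would perform the upload.
    Without it, 'send' falls back to just displaying the URL.
    """
    if mode == "off":
        return None
    url = build_feedback_url(stats)
    if mode == "send" and send is not None:
        send(url)
        return "Sent usage stats: " + url
    return "Please share your usage stats: " + url


def parse_mode(argv):
    """Read the made-up --usage_stats flag, ignoring other pipeline args."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--usage_stats", choices=["off", "url", "send"], default="url")
    args, _ = parser.parse_known_args(argv)
    return args.usage_stats
```

A deployment script would pass `--usage_stats send` once and get stats on every run, while interactive users would only see the URL and decide for themselves whether to click it.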

On Tue, Sep 24, 2019 at 11:13 AM Robert Bradshaw <rober...@google.com> wrote:
> I think the goal is to lower the barrier of entry. Displaying a URL to
> click on while waiting for your pipeline to start up, that contains
> all the data explicitly visible, is about as easy as it gets.
> Remembering to run a new (probably not as authentic) pipeline with
> that flag is less so.
>
> On Tue, Sep 24, 2019 at 11:04 AM Mikhail Gryzykhin <mig...@google.com> wrote:
> >
> > I'm with Luke on this. We can add a set of flags to send home stats and crash dumps if the user agrees. If we keep the code isolated, it will be easy enough for the user to check what is being sent.
> >
> > One more heavy-weight option is to also allow the user to configure and persist what information he is ok with sharing.
> >
> > --Mikhail
> >
> >
> > On Tue, Sep 24, 2019 at 10:02 AM Lukasz Cwik <lc...@google.com> wrote:
> >>
> >> Why not add a flag to the SDK that would do the phone home when specified?
> >>
> >> From a support perspective it would be useful to know:
> >> * SDK version
> >> * Runner
> >> * SDK provided PTransforms that are used
> >> * Features like user state/timers/side inputs/splittable dofns/...
> >> * Graph complexity (# nodes, # branches, ...)
> >> * Pipeline failed or succeeded
> >>
> >> On Mon, Sep 23, 2019 at 3:18 PM Robert Bradshaw <rober...@google.com> wrote:
> >>>
> >>> On Mon, Sep 23, 2019 at 3:08 PM Brian Hulette <bhule...@google.com> wrote:
> >>> >
> >>> > Would people actually click on that link though? I think Kyle has a point that in practice users would only find and click on that link when they're having some kind of issue, especially if the link has "feedback" in it.
> >>>
> >>> I think the idea is that we would make the link very light-weight,
> >>> kind of like a survey (but even easier as it's pre-populated).
> >>> Basically an opt-in phone-home. If we don't collect any personal data
> >>> (not even IP/geo, just (say) version + runner, all visible in the
> >>> URL), no need to guard/anonymize (and this may be sufficient--I don't
> >>> think we have to worry about spammers and ballot stuffers given the
> >>> target audience). If we can catch people while they wait for their
> >>> pipeline to start up (and/or complete), this is a great time to get
> >>> some feedback.
> >>>
> >>> > I agree usage data would be really valuable, but I'm not sure that this approach would get us good data. Is there a way to get download statistics for the different runner artifacts? Maybe that could be a better metric to compare usage.
> >>>
> >>> This'd be useful too, but hard to get and very noisy.
> >>>
> >>> >
> >>> > On Mon, Sep 23, 2019 at 2:57 PM Ankur Goenka <goe...@google.com> wrote:
> >>> >>
> >>> >> I agree, these are the questions that need to be answered.
> >>> >> The data can be anonymized and stored as public data in BigQuery or some other place.
> >>> >>
> >>> >> The intent is to get the usage statistics so that we can get to know what people are using (Flink, Spark, etc.); it is not intended for discussion or as a help channel.
> >>> >> I also think that we don't need to monitor this actively, as it's more like a survey than an active channel to get issues resolved.
> >>> >>
> >>> >> If we think it's useful for the community, then we can come up with a solution for how to do this (similar to how we released the container images).
> >>> >>
> >>> >>
> >>> >>
> >>> >> On Fri, Sep 20, 2019 at 4:38 PM Kyle Weaver <kcwea...@google.com> wrote:
> >>> >>>
> >>> >>> There are some logistics that would need to be worked out. For example, where would the data go? Who would own it?
> >>> >>>
> >>> >>> Also, I'm not convinced we need yet another place to discuss Beam when we have already discussed the challenge of simultaneously monitoring mailing lists, Stack Overflow, Slack, etc. While "how do you use Beam" is certainly an interesting question, and I'd be curious to know that >= X many people use a certain runner, I'm not sure answers to these questions are as useful for guiding the future of Beam as discussions on the dev/users lists, etc., as the latter likely result in more in-depth/specific feedback.
> >>> >>>
> >>> >>> However, I do think it could be useful in general to include links directly in the console output. For example, maybe something along the lines of "Oh no, your Flink pipeline crashed! Check Jira/file a bug/ask the mailing list."
> >>> >>>
> >>> >>> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
> >>> >>>
> >>> >>>
> >>> >>> On Fri, Sep 20, 2019 at 4:14 PM Ankur Goenka <goe...@google.com> wrote:
> >>> >>>>
> >>> >>>> Hi,
> >>> >>>>
> >>> >>>> At the moment we don't really have a good way to collect any usage statistics for Apache Beam, such as the runner used, as many users don't really have a way to report their use case.
> >>> >>>> How about we create a feedback page where users can add their pipeline details and use case?
> >>> >>>> Also, we can start printing the link to this page when the user launches the pipeline from the command line.
> >>> >>>> Example:
> >>> >>>> $ python my_pipeline.py --runner DirectRunner --input /tmp/abc
> >>> >>>>
> >>> >>>> Starting pipeline
> >>> >>>> Please use http://feedback.beam.org?args=runner=DirectRunner,input=/tmp/abc
> >>> >>>> Pipeline started
> >>> >>>> ......
> >>> >>>>
> >>> >>>> Using a link and not publishing the data automatically will give the user control over what they publish and what they don't. We can enhance the text and usage further, but the basic idea is to ask for user feedback at each run of the pipeline.
> >>> >>>> Let me know what you think.
> >>> >>>>
> >>> >>>>
> >>> >>>> Thanks,
> >>> >>>> Ankur