Why not add a flag to the SDK that would phone home when specified?

From a support perspective it would be useful to know:
* SDK version
* Runner
* SDK provided PTransforms that are used
* Features like user state/timers/side inputs/splittable dofns/...
* Graph complexity (# nodes, # branches, ...)
* Pipeline failed or succeeded
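
A minimal sketch of how such an opt-in flag could look. The `--report_usage` flag and the helper names below are hypothetical (nothing like this exists in the Beam SDK today), and the sender is injected so nothing leaves the machine unless the user opts in:

```python
import argparse
import json


def parse_args(argv):
    """Parse pipeline args; --report_usage is a hypothetical opt-in flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--runner", default="DirectRunner")
    parser.add_argument(
        "--report_usage",
        action="store_true",
        help="Hypothetical flag: opt in to sending anonymous usage stats.",
    )
    # Ignore pipeline-specific args (e.g. --input) that we don't know about.
    known, _ = parser.parse_known_args(argv)
    return known


def build_usage_report(sdk_version, runner, transforms, features,
                       num_nodes, num_branches, succeeded):
    """Collect only the non-personal fields from the list above."""
    return {
        "sdk_version": sdk_version,
        "runner": runner,
        "sdk_transforms": sorted(set(transforms)),
        "features": sorted(set(features)),
        "graph": {"nodes": num_nodes, "branches": num_branches},
        "succeeded": succeeded,
    }


def maybe_report(args, report, send=print):
    """Send the report only if the user opted in; return True if sent."""
    if not args.report_usage:
        return False
    send(json.dumps(report, sort_keys=True))
    return True
```

By default `send` just prints the JSON, so a user can see exactly what would be reported before any real endpoint is wired in.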

On Mon, Sep 23, 2019 at 3:18 PM Robert Bradshaw <rober...@google.com> wrote:

> On Mon, Sep 23, 2019 at 3:08 PM Brian Hulette <bhule...@google.com> wrote:
> >
> > Would people actually click on that link though? I think Kyle has a
> > point that in practice users would only find and click on that link when
> > they're having some kind of issue, especially if the link has "feedback" in
> > it.
>
> I think the idea is that we would make the link very light-weight,
> kind of like a survey (but even easier as it's pre-populated).
> Basically an opt-in phone-home. If we don't collect any personal data
> (not even IP/geo, just (say) version + runner, all visible in the
> URL), no need to guard/anonymize (and this may be sufficient--I don't
> think we have to worry about spammers and ballot stuffers given the
> target audience). If we can catch people while they wait for their
> pipeline to start up (and/or complete), this is a great time to get
> some feedback.
>
> > I agree usage data would be really valuable, but I'm not sure that this
> > approach would get us good data. Is there a way to get download statistics
> > for the different runner artifacts? Maybe that could be a better metric to
> > compare usage.
>
> This'd be useful too, but hard to get and very noisy.
>
> >
> > On Mon, Sep 23, 2019 at 2:57 PM Ankur Goenka <goe...@google.com> wrote:
> >>
> >> I agree, these are the questions that need to be answered.
> >> The data can be anonymized and stored as public data in BigQuery or some
> >> other place.
> >>
> >> The intent is to get usage statistics so that we know whether people
> >> are using Flink, Spark, etc. It is not intended as a discussion or
> >> help channel.
> >> I also think that we don't need to monitor this actively, as it's more
> >> like a survey than an active channel to get issues resolved.
> >>
> >> If we think it's useful for the community, then we can come up with a
> >> solution for how to do this (similar to how we released the container
> >> images).
> >>
> >>
> >>
> >> On Fri, Sep 20, 2019 at 4:38 PM Kyle Weaver <kcwea...@google.com> wrote:
> >>>
> >>> There are some logistics that would need to be worked out. For example,
> >>> where would the data go? Who would own it?
> >>>
> >>> Also, I'm not convinced we need yet another place to discuss Beam when
> >>> we already have discussed the challenge of simultaneously monitoring
> >>> mailing lists, Stack Overflow, Slack, etc. While "how do you use Beam" is
> >>> certainly an interesting question, and I'd be curious to know that >= X
> >>> many people use a certain runner, I'm not sure answers to these questions
> >>> are as useful for guiding the future of Beam as discussions on the
> >>> dev/users lists, etc. as the latter likely result in more depth/specific
> >>> feedback.
> >>>
> >>> However, I do think it could be useful in general to include links
> >>> directly in the console output. For example, maybe something along the
> >>> lines of "Oh no, your Flink pipeline crashed! Check Jira/file a bug/ask the
> >>> mailing list."
> >>>
> >>> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
> >>>
> >>>
> >>>> On Fri, Sep 20, 2019 at 4:14 PM Ankur Goenka <goe...@google.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> At the moment we don't really have a good way to collect usage
> >>>> statistics for Apache Beam, such as which runner is used, because many
> >>>> users have no way to report their use case.
> >>>> How about we create a feedback page where users can add their
> >>>> pipeline details and use case?
> >>>> Also, we can start printing the link to this page when a user launches
> >>>> the pipeline from the command line.
> >>>> Example:
> >>>> $ python my_pipeline.py --runner DirectRunner --input /tmp/abc
> >>>>
> >>>> Starting pipeline
> >>>> Please use http://feedback.beam.org?args=runner=DirectRunner,input=/tmp/abc
> >>>> Pipeline started
> >>>> ......
> >>>>
> >>>> Using a link and not publishing the data automatically will give users
> >>>> control over what they publish and what they don't. We can enhance the
> >>>> text and usage further, but the basic idea is to ask for user feedback
> >>>> at each run of the pipeline.
> >>>> Let me know what you think.
> >>>>
> >>>>
> >>>> Thanks,
> >>>> Ankur
>
