One of the options could be to just display the URL and not to phone home.
I would like it so that users can integrate this into their deployment
solution so we get regular stats instead of only when a user decides to run
a pipeline manually.
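
A minimal sketch of the display-only option, assuming a hypothetical helper
build_feedback_url and reusing the feedback.beam.org address from Ankur's
example further down (not a real service as far as this thread shows). The
field names sdk_version and runner are illustrative and not part of any
existing Beam API. Nothing is sent over the network; the link is only printed,
and every shared value is visible in the query string.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, copied from the example later in this thread.
FEEDBACK_BASE_URL = "http://feedback.beam.org"

# Assumed allow-list: only non-personal fields are ever placed in the URL.
# Arguments such as --input paths are deliberately left out.
ALLOWED_FIELDS = ("sdk_version", "runner")


def build_feedback_url(stats, base_url=FEEDBACK_BASE_URL):
    """Return a URL whose query string shows every value that would be shared."""
    shared = {key: stats[key] for key in ALLOWED_FIELDS if key in stats}
    return base_url + "?" + urlencode(shared)


def print_feedback_hint(stats):
    """Display the link only; no data leaves the machine unless the user opens it."""
    print("Optional feedback: " + build_feedback_url(stats))


if __name__ == "__main__":
    # Example values only; a real SDK would fill these in from the pipeline options.
    print_feedback_hint(
        {"sdk_version": "2.16.0", "runner": "DirectRunner", "input": "/tmp/abc"}
    )
```

A deployment tool could call the same helper on every scheduled run and decide
for itself whether to open or log the link.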

On Tue, Sep 24, 2019 at 11:13 AM Robert Bradshaw <rober...@google.com>
wrote:

> I think the goal is to lower the barrier of entry. Displaying a URL to
> click on while waiting for your pipeline to start up, that contains
> all the data explicitly visible, is about as easy as it gets.
> Remembering to run a new (probably not as authentic) pipeline with
> that flag is less so.
>
> On Tue, Sep 24, 2019 at 11:04 AM Mikhail Gryzykhin <mig...@google.com>
> wrote:
> >
> > I'm with Luke on this. We can add a set of flags to send home stats and
> crash dumps if the user agrees. If we keep the code isolated, it will be easy
> enough for the user to check what is being sent.
> >
> > One more heavy-weight option is to also allow the user to configure and
> persist what information they are OK with sharing.
> >
> > --Mikhail
> >
> >
> > On Tue, Sep 24, 2019 at 10:02 AM Lukasz Cwik <lc...@google.com> wrote:
> >>
> >> Why not add a flag to the SDK that would do the phone home when
> specified?
> >>
> >> From a support perspective it would be useful to know:
> >> * SDK version
> >> * Runner
> >> * SDK provided PTransforms that are used
> >> * Features like user state/timers/side inputs/splittable dofns/...
> >> * Graph complexity (# nodes, # branches, ...)
> >> * Pipeline failed or succeeded
> >>
> >> On Mon, Sep 23, 2019 at 3:18 PM Robert Bradshaw <rober...@google.com>
> wrote:
> >>>
> >>> On Mon, Sep 23, 2019 at 3:08 PM Brian Hulette <bhule...@google.com>
> wrote:
> >>> >
> >>> > Would people actually click on that link though? I think Kyle has a
> point that in practice users would only find and click on that link when
> they're having some kind of issue, especially if the link has "feedback" in
> it.
> >>>
> >>> I think the idea is that we would make the link very light-weight,
> >>> kind of like a survey (but even easier as it's pre-populated).
> >>> Basically an opt-in phone-home. If we don't collect any personal data
> >>> (not even IP/geo, just (say) version + runner, all visible in the
> >>> URL), no need to guard/anonymize (and this may be sufficient--I don't
> >>> think we have to worry about spammers and ballot stuffers given the
> >>> target audience). If we can catch people while they wait for their
> >>> pipeline to start up (and/or complete), this is a great time to get
> >>> some feedback.
> >>>
> >>> > I agree usage data would be really valuable, but I'm not sure that
> this approach would get us good data. Is there a way to get download
> statistics for the different runner artifacts? Maybe that could be a better
> metric to compare usage.
> >>>
> >>> This'd be useful too, but hard to get and very noisy.
> >>>
> >>> >
> >>> > On Mon, Sep 23, 2019 at 2:57 PM Ankur Goenka <goe...@google.com>
> wrote:
> >>> >>
> >>> >> I agree, these are the questions that need to be answered.
> >>> >> The data can be anonymized and stored as public data in BigQuery or
> some other place.
> >>> >>
> >>> >> The intent is to get usage statistics so that we can get to
> know whether people are using Flink, Spark, etc.; it is not intended as a
> discussion or help channel.
> >>> >> I also think that we don't need to monitor this actively, as it's
> more like a survey than an active channel for getting issues resolved.
> >>> >>
> >>> >> If we think it's useful for the community, then we can come up with a
> solution for how to do this (similar to how we released the container
> images).
> >>> >>
> >>> >>
> >>> >>
> >>> >> On Fri, Sep 20, 2019 at 4:38 PM Kyle Weaver <kcwea...@google.com>
> wrote:
> >>> >>>
> >>> >>> There are some logistics that would need to be worked out. For example,
> where would the data go? Who would own it?
> >>> >>>
> >>> >>> Also, I'm not convinced we need yet another place to discuss Beam
> when we already have discussed the challenge of simultaneously monitoring
> mailing lists, Stack Overflow, Slack, etc. While "how do you use Beam" is
> certainly an interesting question, and I'd be curious to know that >= X
> many people use a certain runner, I'm not sure answers to these questions
> are as useful for guiding the future of Beam as discussions on the
> dev/users lists, etc. as the latter likely result in more depth/specific
> feedback.
> >>> >>>
> >>> >>> However, I do think it could be useful in general to include links
> directly in the console output. For example, maybe something along the
> lines of "Oh no, your Flink pipeline crashed! Check Jira/file a bug/ask the
> mailing list."
> >>> >>>
> >>> >>> Kyle Weaver | Software Engineer | github.com/ibzib |
> kcwea...@google.com
> >>> >>>
> >>> >>>
> >>> >>> On Fri, Sep 20, 2019 at 4:14 PM Ankur Goenka <goe...@google.com>
> wrote:
> >>> >>>>
> >>> >>>> Hi,
> >>> >>>>
> >>> >>>> At the moment we don't really have a good way to collect any
> usage statistics for Apache Beam, such as the runner used, and many
> users don't really have a way to report their use case.
> >>> >>>> How about we create a feedback page where users can add their
> pipeline details and use case?
> >>> >>>> Also, we can start printing the link to this page when a user
> launches the pipeline from the command line.
> >>> >>>> Example:
> >>> >>>> $ python my_pipeline.py --runner DirectRunner --input /tmp/abc
> >>> >>>>
> >>> >>>> Starting pipeline
> >>> >>>> Please use
> http://feedback.beam.org?args=runner=DirectRunner,input=/tmp/abc
> >>> >>>> Pipeline started
> >>> >>>> ......
> >>> >>>>
> >>> >>>> Using a link and not publishing the data automatically will give
> users control over what they publish and what they don't. We can enhance the
> text and usage further, but the basic idea is to ask for user feedback at
> each run of the pipeline.
> >>> >>>> Let me know what you think.
> >>> >>>>
> >>> >>>>
> >>> >>>> Thanks,
> >>> >>>> Ankur
>
