I think the goal is to lower the barrier to entry. Displaying a URL to
click on while waiting for your pipeline to start up, with all the data
explicitly visible in it, is about as easy as it gets.
Remembering to run a new (probably not as authentic) pipeline with
that flag is less so.
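Here is a minimal sketch of the "all the data explicitly visible in the URL" idea, in Python since that is the SDK used in Ankur's example below. The endpoint http://feedback.beam.org comes from that example and is hypothetical. The field names (sdk_version, runner), the allow-list and the function name build_feedback_url are also illustrative assumptions, not an existing Beam API. Only the standard library's urllib.parse.urlencode is used.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, taken from the example in this thread; no such
# page is assumed to exist.
FEEDBACK_BASE_URL = "http://feedback.beam.org"

# Hypothetical allow-list: only coarse, non-personal fields (no input paths,
# project ids, IPs) may appear in the URL.
ALLOWED_FIELDS = ("sdk_version", "runner")


def build_feedback_url(info):
    """Return a pre-populated feedback URL containing only ALLOWED_FIELDS.

    Any other key in `info` is silently dropped, so everything that would be
    shared is visible to the user in the printed URL.
    """
    params = [(key, info[key]) for key in ALLOWED_FIELDS if key in info]
    if not params:
        return FEEDBACK_BASE_URL
    # urlencode percent-escapes values, e.g. a space becomes "+".
    return FEEDBACK_BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Example values only; "input" is dropped because it is not allow-listed.
    info = {"sdk_version": "2.16.0", "runner": "DirectRunner", "input": "/tmp/abc"}
    print("Starting pipeline")
    print("Please use " + build_feedback_url(info))
    # prints: Please use http://feedback.beam.org?sdk_version=2.16.0&runner=DirectRunner
```

Ankur's `args=runner=DirectRunner,input=/tmp/abc` form puts the pipeline arguments straight into the URL. An allow-list instead keeps paths such as /tmp/abc out of it, in line with Robert's suggestion of sharing just version + runner.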

On Tue, Sep 24, 2019 at 11:04 AM Mikhail Gryzykhin <mig...@google.com> wrote:
>
> I'm with Luke on this. We can add a set of flags to send home stats and crash
> dumps if the user agrees. If we keep the code isolated, it will be easy enough
> for the user to check what is being sent.
>
> One more heavyweight option is to also allow the user to configure and persist
> what information they are OK with sharing.
>
> --Mikhail
>
>
> On Tue, Sep 24, 2019 at 10:02 AM Lukasz Cwik <lc...@google.com> wrote:
>>
>> Why not add a flag to the SDK that would do the phone home when specified?
>>
>> From a support perspective it would be useful to know:
>> * SDK version
>> * Runner
>> * SDK provided PTransforms that are used
>> * Features like user state/timers/side inputs/splittable dofns/...
>> * Graph complexity (# nodes, # branches, ...)
>> * Pipeline failed or succeeded
>>
>> On Mon, Sep 23, 2019 at 3:18 PM Robert Bradshaw <rober...@google.com> wrote:
>>>
>>> On Mon, Sep 23, 2019 at 3:08 PM Brian Hulette <bhule...@google.com> wrote:
>>> >
>>> > Would people actually click on that link though? I think Kyle has a point 
>>> > that in practice users would only find and click on that link when 
>>> > they're having some kind of issue, especially if the link has "feedback" 
>>> > in it.
>>>
>>> I think the idea is that we would make the link very light-weight,
>>> kind of like a survey (but even easier as it's pre-populated).
>>> Basically an opt-in phone-home. If we don't collect any personal data
>>> (not even IP/geo, just (say) version + runner, all visible in the
>>> URL), there's no need to guard/anonymize it (and this may be sufficient--I don't
>>> think we have to worry about spammers and ballot stuffers given the
>>> target audience). If we can catch people while they wait for their
>>> pipeline to start up (and/or complete), this is a great time to get
>>> some feedback.
>>>
>>> > I agree usage data would be really valuable, but I'm not sure that this 
>>> > approach would get us good data. Is there a way to get download 
>>> > statistics for the different runner artifacts? Maybe that could be a 
>>> > better metric to compare usage.
>>>
>>> This'd be useful too, but hard to get and very noisy.
>>>
>>> >
>>> > On Mon, Sep 23, 2019 at 2:57 PM Ankur Goenka <goe...@google.com> wrote:
>>> >>
>>> >> I agree, these are the questions that need to be answered.
>>> >> The data can be anonymized and stored as public data in BigQuery or some
>>> >> other place.
>>> >>
>>> >> The intent is to get usage statistics so that we know whether people
>>> >> are using Flink, Spark, etc.; it is not intended as a discussion or
>>> >> help channel.
>>> >> I also think that we don't need to monitor this actively, as it's more
>>> >> like a survey than an active channel for getting issues resolved.
>>> >>
>>> >> If we think it's useful for the community, then we can come up with a
>>> >> solution for how to do this (similar to how we released the
>>> >> container images).
>>> >>
>>> >>
>>> >>
>>> >> On Fri, Sep 20, 2019 at 4:38 PM Kyle Weaver <kcwea...@google.com> wrote:
>>> >>>
>>> >>> There are some logistics that would need to be worked out. For example,
>>> >>> where would the data go? Who would own it?
>>> >>>
>>> >>> Also, I'm not convinced we need yet another place to discuss Beam when
>>> >>> we have already discussed the challenge of simultaneously monitoring
>>> >>> mailing lists, Stack Overflow, Slack, etc. While "how do you use Beam"
>>> >>> is certainly an interesting question, and I'd be curious to know that
>>> >>> at least X people use a certain runner, I'm not sure answers to these
>>> >>> questions are as useful for guiding the future of Beam as discussions
>>> >>> on the dev/users lists, etc., as the latter likely result in more
>>> >>> in-depth, specific feedback.
>>> >>>
>>> >>> However, I do think it could be useful in general to include links 
>>> >>> directly in the console output. For example, maybe something along the 
>>> >>> lines of "Oh no, your Flink pipeline crashed! Check Jira/file a bug/ask 
>>> >>> the mailing list."
>>> >>>
>>> >>> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
>>> >>>
>>> >>>
>>> >>> On Fri, Sep 20, 2019 at 4:14 PM Ankur Goenka <goe...@google.com> wrote:
>>> >>>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> At the moment we don't really have a good way to collect usage
>>> >>>> statistics for Apache Beam, such as which runner is used, since many
>>> >>>> users don't really have a way to report their use case.
>>> >>>> How about we create a feedback page where users can add their
>>> >>>> pipeline details and use case?
>>> >>>> We could also start printing a link to this page when users launch a
>>> >>>> pipeline from the command line.
>>> >>>> Example:
>>> >>>> $ python my_pipeline.py --runner DirectRunner --input /tmp/abc
>>> >>>>
>>> >>>> Starting pipeline
>>> >>>> Please use 
>>> >>>> http://feedback.beam.org?args=runner=DirectRunner,input=/tmp/abc
>>> >>>> Pipeline started
>>> >>>> ......
>>> >>>>
>>> >>>> Using a link rather than publishing the data automatically gives users
>>> >>>> control over what they publish and what they don't. We can refine the
>>> >>>> text and usage further, but the basic idea is to ask for user feedback
>>> >>>> on each run of the pipeline.
>>> >>>> Let me know what you think.
>>> >>>>
>>> >>>>
>>> >>>> Thanks,
>>> >>>> Ankur
