[
https://issues.apache.org/jira/browse/BEAM-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410353#comment-15410353
]
Ahmet Altay commented on BEAM-536:
----------------------------------
#1 - The comment should be cleaned up.
#2 - Tracking issue: https://issues.apache.org/jira/browse/BEAM-531
#3 - BlockingDataflowPipelineRunner is being removed for Java
(https://github.com/apache/incubator-beam/pull/762). It is being replaced with
an optional set of wait...() methods on the result. We should do the same thing
in the Python SDK.
The __str__ and __repr__ methods of DataflowPipelineRunner also use the class name
(https://github.com/aaltay/incubator-beam/blob/python-sdk/sdks/python/apache_beam/runners/dataflow_runner.py#L651),
so printing the BlockingDataflowPipelineRunner object will show the wrong name.
#4 - This also needs doc improvements (related javadoc:
https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/runners/AggregatorPipelineExtractor.html#getAggregatorSteps--).
User counters are prefixed with "user-" by default; there might be aggregators
without the "user-" prefix once DataflowPipelineRunner implements this.
I believe #1 and #4 can be tracked here for documentation changes. #3 requires
a new bug of its own.
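
To illustrate the naming problem in #3, here is a minimal sketch. The classes
below are stand-ins that only borrow names from this thread; they are not the
SDK code, and the __str__ body is an assumption about the general pattern
(formatting the object's class name), not a copy of dataflow_runner.py.

```python
class PipelineRunner(object):
    """Stand-in base class (hypothetical, not the SDK implementation)."""

    def __str__(self):
        # Assumed pattern: the printed name is derived from the class name.
        return '%s instance' % type(self).__name__


class DataflowPipelineRunner(PipelineRunner):
    """Stand-in runner with a hypothetical `blocking` flag."""

    def __init__(self, blocking=False):
        self.blocking = blocking


def BlockingDataflowPipelineRunner():
    """A factory function, not a class: it returns the non-blocking type."""
    return DataflowPipelineRunner(blocking=True)


runner = BlockingDataflowPipelineRunner()
runner_name = type(runner).__name__  # name of the underlying class
printed = str(runner)                # never mentions "Blocking"
```

Because the "blocking" runner is built by a function, anything derived from the
class name (printing, error messages) reports the underlying class, which is
the confusion described in the issue.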
> Aggregator.py. More misleading documentation. More bad documentation
> ----------------------------------------------------------------------
>
> Key: BEAM-536
> URL: https://issues.apache.org/jira/browse/BEAM-536
> Project: Beam
> Issue Type: Bug
> Reporter: Frank Yellin
> Priority: Minor
>
> The last paragraph of the documentation for Aggregator is:
> You can also query the combined value(s) of an aggregator by calling
> aggregated_value() or aggregated_values() on the result object returned after
> running a pipeline.
> There are multiple problems in this one sentence!
> #1) There is no such method aggregated_value() that I can find anywhere.
> #2) DirectRunner implements aggregated_values(), but DirectPipelineRunner
> does not. The latter is the far more interesting case.
> #3) When I use a BlockingDirectPipelineRunner and ask for its
> aggregated_values(), I get an error message indicating that this is not
> implemented in DirectPipelineRunner. Very confusing since I never asked for
> a DirectPipelineRunner.
> It is clear that this is because BlockingDirectPipelineRunner is a method
> rather than a class. Is this really the right thing? Will there be other
> confusing error messages?
> #4) The documentation for aggregated_values() says "returns a dict of step
> names to values of the aggregator." I have no idea what a "step" means in
> this context. In practice, it seems to be a single-element dictionary whose
> key is 'user--' prefixed onto the aggregator name.