Frank Yellin created BEAM-536:
---------------------------------
Summary: Aggregator.py. More misleading documentation. More bad
documentation
Key: BEAM-536
URL: https://issues.apache.org/jira/browse/BEAM-536
Project: Beam
Issue Type: Bug
Reporter: Frank Yellin
Priority: Minor
The last paragraph of the documentation for Aggregator is:
"You can also query the combined value(s) of an aggregator by calling
aggregated_value() or aggregated_values() on the result object returned after
running a pipeline."
There are multiple problems in this one sentence!
#1) I cannot find a method named aggregated_value() anywhere.
#2) DirectRunner implements aggregated_values(), but DirectPipelineRunner does
not. The latter is the far more interesting case.
#3) When I use a BlockingDirectPipelineRunner and ask for its
aggregated_values(), I get an error message indicating that this is not
implemented in DirectPipelineRunner. Very confusing since I never asked for a
DirectPipelineRunner.
This is clearly because BlockingDirectPipelineRunner is a function rather
than a class. Is this really the right design? Will there be other confusing
error messages?
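The confusion in #3 can be reproduced with a minimal sketch in plain Python. The class and function bodies below are hypothetical stand-ins, not the actual Beam source. They only assume that BlockingDirectPipelineRunner is a function that returns an instance of a different class, and that the error message is built from the runtime class name.

```python
class DirectPipelineRunner(object):
    """Hypothetical stand-in; not the real Beam class."""

    def aggregated_values(self, aggregator):
        # The message names the runtime class of the object.
        raise NotImplementedError(
            'aggregated_values is not implemented in %s'
            % type(self).__name__)


def BlockingDirectPipelineRunner():
    """Hypothetical factory function that looks like a class when called."""
    return DirectPipelineRunner()


def describe_failure():
    """Return the runner's actual class name and the error text it raises."""
    runner = BlockingDirectPipelineRunner()
    try:
        runner.aggregated_values('my_counter')
    except NotImplementedError as e:
        return type(runner).__name__, str(e)
```

Under these assumptions, the caller writes BlockingDirectPipelineRunner() but the error text can only mention DirectPipelineRunner, because no object of the "Blocking" type ever exists.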
#4) The documentation for aggregated_values() says "returns a dict of step
names to values of the aggregator." I have no idea what a "step" means in this
context. In practice, it seems to be a single-element dictionary whose key is
'user--' prefixed onto the aggregator name.
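As a sketch of the shape described in #4, the helper below pulls the single value out of such a dictionary. The helper name, the 'user--' prefix constant, and the sample data are all assumptions taken from the observation above, not from any documented Beam API.

```python
# Prefix as observed in this report; an assumption, not a documented constant.
USER_PREFIX = 'user--'


def single_aggregated_value(values, aggregator_name):
    """Return the one value from a dict shaped like the one described in #4."""
    key = USER_PREFIX + aggregator_name
    if list(values) != [key]:
        raise KeyError(
            'expected exactly one key %r, got %r' % (key, sorted(values)))
    return values[key]


# Made-up example data illustrating the observed shape.
observed = {'user--my_counter': 42}
```

Calling single_aggregated_value(observed, 'my_counter') returns the lone value; any other dictionary shape raises KeyError, which makes a change in the undocumented key format fail loudly.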