[
https://issues.apache.org/jira/browse/BEAM-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17122859#comment-17122859
]
Beam JIRA Bot commented on BEAM-7026:
-------------------------------------
This issue is P2 but has been unassigned without any comment for 60 days so it
has been labeled "stale-P2". If this issue is still affecting you, we care!
Please comment and remove the label. Otherwise, in 14 days the issue will be
moved to P3.
Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed
explanation of what these priorities mean.
> Python SDK: Unable to obtain the PCollection for output tags which are not
> consumed by a downstream step.
> ---------------------------------------------------------------------------------------------------------
>
> Key: BEAM-7026
> URL: https://issues.apache.org/jira/browse/BEAM-7026
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-harness
> Reporter: Alex Amato
> Priority: P2
> Labels: stale-P2
>
> I noticed that we are not able to convert the output tag + transform to the
> PCollection name for metrics (element count/mean byte count) if the
> PCollections for the output tags are not consumed by a downstream step.
> This isn't critical, as (1) arguably there is no PCollection at all, and (2)
> PCollections that are output but not consumed are not critical to count
> metrics on, as they can be optimized away entirely (no need to do any work,
> collect metrics, etc. for an unconsumed PCollection).
> However, we are able to count this, but we are unable to assign a PCollection
> name to it, as in this case there is no information about that output tag
> defined in the bundle descriptor. The alternative fix is to make sure that
> it is always available, even if not consumed.
> Pablo and I looked into this a bit, and he believed it would be possible in
> pvalue.py's DoOutputsTuple class. This fix would require calling __getitem__
> on all tags to initialize them properly. However, I had some trouble doing
> this, as this class is a bit strange since it overrides __getattr__. I found
> weird behaviors when adding functionality to this code. I don't really get
> how the code functions today, as its own instance variable usage should
> trigger the custom __getattr__ code, yet we seem to be using these attrs
> normally with self.X usages.
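
A note on the __getattr__ question above: in Python, __getattr__ is only
invoked as a fallback when normal attribute lookup fails, which may be why
self.X usages of existing instance variables behave normally. The sketch below
is a toy stand-in, not Beam's actual DoOutputsTuple; the class name
LazyOutputs, its attributes, and the materialize_all helper are all
hypothetical and only illustrate the lookup order and the "call __getitem__ on
every tag" idea.

```python
class LazyOutputs:
    """Toy tagged-outputs container (hypothetical, not Beam's real class)."""

    def __init__(self, tags):
        # Plain instance attributes live in the instance __dict__, so normal
        # lookup finds them and __getattr__ is never consulted for them.
        self._tags = tuple(tags)
        self._pcolls = {}
        self._getattr_calls = []

    def __getattr__(self, name):
        # Reached only when normal attribute lookup has already failed.
        if name.startswith('_'):
            # Avoid recursing on private names that are genuinely missing.
            raise AttributeError(name)
        self._getattr_calls.append(name)
        return self[name]

    def __getitem__(self, tag):
        if tag not in self._tags:
            raise KeyError(tag)
        # Lazily create the per-tag entry on first access.
        if tag not in self._pcolls:
            self._pcolls[tag] = 'pcoll:%s' % tag
        return self._pcolls[tag]

    def materialize_all(self):
        # Eagerly touch every declared tag so each one is initialized,
        # whether or not anything downstream ever consumes it.
        for tag in self._tags:
            self[tag]
        return dict(self._pcolls)


outs = LazyOutputs(['even', 'odd'])
tags = outs._tags                 # normal lookup; __getattr__ not involved
even = outs.even                  # no such attribute, so __getattr__ runs
everything = outs.materialize_all()
```

Under these assumptions, only the access to the undeclared attribute `even`
goes through __getattr__, and materialize_all leaves an entry for `odd` even
though nothing ever asked for it.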
--
This message was sent by Atlassian Jira
(v8.3.4#803005)