[
https://issues.apache.org/jira/browse/BEAM-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
niklas Hansson updated BEAM-7026:
---------------------------------
Comment: was deleted
(was: I would be happy to start look at this :))
> Python SDK: Unable to obtain the PCollection for output tags which are not
> consumed by a downstream step.
> ---------------------------------------------------------------------------------------------------------
>
> Key: BEAM-7026
> URL: https://issues.apache.org/jira/browse/BEAM-7026
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-harness
> Reporter: Alex Amato
> Priority: Major
>
> I noticed that we are not able to convert the output tag + transform to the
> PCollection name for metrics (element count / mean byte count) if the
> PCollections for the output tags are not consumed by a downstream step.
> This isn't critical, as (1) arguably there is no PCollection at all, and (2)
> PCollections that are output but not consumed are not critical to count
> metrics on, as those can be optimized away entirely (no need to do any work,
> collect metrics, etc. for an unconsumed PCollection).
> However, we are able to count these elements; we are just unable to assign a
> PCollection name to the count, because in this case the bundle descriptor
> contains no information about that output tag. The alternative fix is to make
> sure that it's always available, even if not consumed.
> Pablo and I looked into this a bit, and he believed it would be possible in
> pvalue.py's
> DoOutputsTuple class. This fix would require calling __getitem__ on all tags
> to initialize them properly. However, I had some trouble doing this, as this
> class is a bit strange in that it overrides __getattr__. I saw unexpected
> behavior when adding functionality to this code. I don't really understand
> how the code functions today, as its own instance variable usage should
> trigger the custom __getattr__ code, yet we seem to be using these attrs
> normally with self.X usages.
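
For reference, here is a minimal sketch of the Python behavior in question. It is not the actual DoOutputsTuple code: TaggedOutputs, _fallback_calls and materialize_all are hypothetical names invented for illustration. It relies on one fact about Python's data model: __getattr__ is only called as a fallback when normal attribute lookup fails, so attributes assigned in __init__ are read through self.X without ever reaching the custom hook. The sketch also shows one possible way to initialize every tag eagerly by calling __getitem__ on each of them.

```python
class TaggedOutputs(object):
    """Hypothetical stand-in for a class that creates tagged outputs lazily."""

    def __init__(self, tags):
        # Plain assignments land in the instance __dict__. Reading them back
        # via self.X is resolved by normal lookup and never calls __getattr__.
        self._tags = tuple(tags)
        self._pcolls = {}
        self._fallback_calls = []

    def __getattr__(self, name):
        # Only reached when normal attribute lookup has already failed.
        if name.startswith('_'):
            # Guard against recursion if a private attribute is missing
            # (for example on a partially constructed instance).
            raise AttributeError(name)
        self._fallback_calls.append(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __getitem__(self, tag):
        if tag not in self._tags:
            raise KeyError(tag)
        if tag not in self._pcolls:
            # Placeholder for creating the real output object.
            self._pcolls[tag] = 'pcollection for %s' % tag
        return self._pcolls[tag]

    def materialize_all(self):
        # Touch every declared tag so that each output exists even if no
        # downstream step ever asks for it.
        for tag in self._tags:
            self[tag]
        return dict(self._pcolls)


outputs = TaggedOutputs(['main', 'errors'])
outputs._tags    # found by normal lookup; __getattr__ is not involved
outputs.errors   # not an instance attribute, so __getattr__ handles it
```

Under these assumptions, only the access to outputs.errors is recorded in _fallback_calls, which would explain why self.X usages inside such a class behave normally.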
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)