kennknowles opened a new issue, #19436: URL: https://github.com/apache/beam/issues/19436
I noticed that we are not able to convert the output tag and transform to the PCollection name for metrics (element count, mean byte count) if the PCollections for the output tags are not consumed by a downstream step.

This isn't critical, because (1) arguably there is no PCollection at all, and (2) metrics on PCollections that are output but never consumed matter little, since those PCollections can be optimized away entirely (no need to do any work, collect metrics, etc. for an unconsumed PCollection).

However, we are able to count this; we are just unable to assign a PCollection name to it, because in this case the bundle descriptor contains no information about that output tag.

The alternative fix is to make sure that it's always available, even if not consumed. Pablo and I looked into this a bit, and he believed it would be possible in the `DoOutputsTuple` class in `pvalue.py`. This fix would require calling `__getitem__` on all tags to initialize them properly.

However, I had some trouble doing this, as this class is a bit strange since it overrides `__getattr__`. I found weird behaviors when adding functionality to this code. I don't really get how the code functions today, as its own instance variable usage should trigger the custom `__getattr__` code, yet we seem to be using these attrs normally with `self.X` usages.

Imported from Jira [BEAM-7026](https://issues.apache.org/jira/browse/BEAM-7026). Original Jira may contain additional context. Reported by: [email protected].
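On the `__getattr__` question: Python calls `__getattr__` only after normal attribute lookup has failed (unlike `__getattribute__`, which intercepts every access), so attributes assigned in `__init__` are found in the instance `__dict__` and never reach the override. The sketch below illustrates this, along with the "call `__getitem__` on all tags" idea. `TaggedOutputs` and every name in it are hypothetical stand-ins for illustration, not Beam's actual `DoOutputsTuple` code.

```python
class TaggedOutputs:
    """Hypothetical stand-in for a tagged-outputs container (not Beam's code)."""

    def __init__(self, tags):
        # Plain assignments land in the instance __dict__, so later reads of
        # self._tags etc. succeed through normal lookup.
        self._tags = tuple(tags)
        self._created = {}        # tag -> placeholder "PCollection"
        self._getattr_calls = []  # names that actually reached __getattr__

    def __getitem__(self, tag):
        # Lazily create one placeholder per declared tag.
        if tag not in self._tags:
            raise KeyError(tag)
        if tag not in self._created:
            self._created[tag] = f"pcoll[{tag}]"
        return self._created[tag]

    def __getattr__(self, name):
        # Reached only when normal attribute lookup has already failed.
        if name.startswith("_"):
            # Guard: avoids infinite recursion if a private attribute is
            # read before __init__ has run (e.g. during copy or unpickle).
            raise AttributeError(name)
        self._getattr_calls.append(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


outputs = TaggedOutputs(["main", "errors"])

declared = outputs._tags  # normal lookup; __getattr__ is not involved
first = outputs.main      # no such real attribute, so __getattr__ runs

# Eager initialization: touch every declared tag so that each one has an
# entry even if no downstream step ever consumes it.
for tag in outputs._tags:
    outputs[tag]
```

In this sketch only the access to `outputs.main` is routed through `__getattr__`. The odd behavior usually appears when the override itself reads an instance attribute that has not been set yet, which is why the guard on underscore-prefixed names is there.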
