dpcollins-google commented on pull request #14081: URL: https://github.com/apache/beam/pull/14081#issuecomment-786683119
> The job graph is too large

Pushing this into PubsubIO will not solve the issue. As you can see, the parse function is already handled in exactly this way within the IO itself: https://github.com/apache/beam/blob/e81d12832ffc2a6ac0a87cd767f578993026a25c/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.java#L841

What I'm suggesting is to do the parsing in your user code instead:

```
// Assumes `pipeline`, `subscription`, and `MY_PARSER` are already defined.
PCollection<PubsubMessage> messages = pipeline.apply(
    PubsubIO.readMessagesWithAttributesAndMessageId().fromSubscription(subscription));
PCollection<MyType> parsed = messages.apply(
    MapElements.into(new TypeDescriptor<MyType>() {}).via(MY_PARSER));
```

This has exactly the same effect as the built-in path and does not require additions to PubsubIO.

Please see the guidance at https://cloud.google.com/dataflow/docs/guides/common-errors#job-graph-too-large for handling these errors. Most likely your parser (or another component) is capturing a large amount of data in its serialized state, and that will not be fixed by pushing the parser into PubsubIO.
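For illustration, here is a minimal sketch of that failure mode and one common fix. All names below (`BadParser`, `GoodParser`, `hugeLookupTable`, `loadLookupTable`) are hypothetical and not from this PR; `MyType` carries over from the example above, and `MyType.of` is assumed for brevity.

```
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.transforms.SerializableFunction;

// Anti-pattern: the lookup table is a non-transient field of a serialized
// function, so its entire contents are embedded in the submitted job graph.
class BadParser implements SerializableFunction<PubsubMessage, MyType> {
  private final Map<String, String> hugeLookupTable; // travels with the graph

  BadParser(Map<String, String> hugeLookupTable) {
    this.hugeLookupTable = hugeLookupTable;
  }

  @Override
  public MyType apply(PubsubMessage message) {
    return MyType.of(message, hugeLookupTable);
  }
}

// One common fix: mark the field transient and populate it lazily on the
// worker, so only the (small) class reference travels in the job graph.
class GoodParser implements SerializableFunction<PubsubMessage, MyType> {
  private transient Map<String, String> lookupTable; // not serialized

  @Override
  public MyType apply(PubsubMessage message) {
    if (lookupTable == null) {
      lookupTable = loadLookupTable(); // runs on the worker, e.g. read from GCS
    }
    return MyType.of(message, lookupTable);
  }

  private static Map<String, String> loadLookupTable() {
    // Placeholder for worker-side initialization.
    return new HashMap<>();
  }
}
```

This is the same pattern the Dataflow guidance linked above describes: keep large static data out of the serialized pipeline objects. A side input, or a `DoFn` that loads the data in `@Setup`, are other common options.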
