dpcollins-google commented on pull request #14081: URL: https://github.com/apache/beam/pull/14081#issuecomment-786683119
> The job graph is too large

Pushing this into PubsubIO will not solve the issue. As you can see, the parse function is already handled in exactly this way within the IO itself: https://github.com/apache/beam/blob/e81d12832ffc2a6ac0a87cd767f578993026a25c/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.java#L841

What I'm suggesting is to do the parsing in your user code instead:

```
// Assumes `pipeline`, `subscription`, and `MY_PARSER` are already defined.
PCollection<PubsubMessage> messages = pipeline.apply(
    PubsubIO.readMessagesWithAttributesAndMessageId().fromSubscription(subscription));
PCollection<MyType> parsed = messages.apply(
    MapElements.into(new TypeDescriptor<MyType>() {}).via(MY_PARSER));
```

This has exactly the same effect as the built-in path and does not require additions to PubsubIO.

Please see the guidance at https://cloud.google.com/dataflow/docs/guides/common-errors#job-graph-too-large for handling these errors. Most likely your parser (or another component) is capturing a large amount of data in its serialized state, and that will not be fixed by pushing the parser into PubsubIO.
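For illustration, here is a minimal sketch of that failure mode and one common fix. All names below (`BadParser`, `GoodParser`, `hugeLookupTable`, `loadLookupTable`) are hypothetical and not from this PR; `MyType` carries over from the example above, and `MyType.of` is assumed for brevity.

```
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.transforms.SerializableFunction;

// Anti-pattern: the lookup table is a non-transient field of a serialized
// function, so its entire contents are embedded in the submitted job graph.
class BadParser implements SerializableFunction<PubsubMessage, MyType> {
  private final Map<String, String> hugeLookupTable; // travels with the graph

  BadParser(Map<String, String> hugeLookupTable) {
    this.hugeLookupTable = hugeLookupTable;
  }

  @Override
  public MyType apply(PubsubMessage message) {
    return MyType.of(message, hugeLookupTable);
  }
}

// One common fix: mark the field transient and populate it lazily on the
// worker, so only the (small) class reference travels in the job graph.
class GoodParser implements SerializableFunction<PubsubMessage, MyType> {
  private transient Map<String, String> lookupTable; // not serialized

  @Override
  public MyType apply(PubsubMessage message) {
    if (lookupTable == null) {
      lookupTable = loadLookupTable(); // runs on the worker, e.g. read from GCS
    }
    return MyType.of(message, lookupTable);
  }

  private static Map<String, String> loadLookupTable() {
    // Placeholder for worker-side initialization.
    return new HashMap<>();
  }
}
```

This is the same pattern the Dataflow guidance linked above describes: keep large static data out of the serialized pipeline objects. A side input, or a `DoFn` that loads the data in `@Setup`, are other common options.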
