[ https://issues.apache.org/jira/browse/BEAM-11865?focusedWorklogId=558576&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-558576 ]
ASF GitHub Bot logged work on BEAM-11865:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Feb/21 14:33
Start Date: 26/Feb/21 14:33
Worklog Time Spent: 10m
Work Description: dpcollins-google commented on pull request #14081:
URL: https://github.com/apache/beam/pull/14081#issuecomment-786683119
> The job graph is too large
Pushing this into PubsubIO will not solve this issue. As you can see, the
IO itself handles the parse function in exactly this way:
https://github.com/apache/beam/blob/e81d12832ffc2a6ac0a87cd767f578993026a25c/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.java#L841
What I'm suggesting is to do this in your user code:
```
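// `pipeline` and `subscription` are placeholders for your own Pipeline and subscription path.
PCollection<PubsubMessage> messages =
    pipeline.apply(PubsubIO.readMessagesWithAttributesAndMessageId().fromSubscription(subscription));
// MY_PARSER is your own serializable function from PubsubMessage to MyType.
PCollection<MyType> parsed =
    messages.apply(MapElements.into(new TypeDescriptor<MyType>() {}).via(MY_PARSER));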
```
This will have the exact same effect as the above and will not require
additions to PubsubIO.
Please see the guidance at
https://cloud.google.com/dataflow/docs/guides/common-errors#job-graph-too-large
for handling these errors. Most likely your parser (or another component) is
capturing a large amount of data, and moving the parser into PubsubIO will not
fix that.
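To illustrate what "capturing a large amount of data" can look like, here is a minimal sketch in plain Java, with no Beam dependency. It assumes the parser is shipped with the job via standard Java serialization, so any non-transient field counts toward the serialized size. The names `CaptureSizeDemo`, `SerializableParser`, `HeavyParser`, and `LightParser` are hypothetical and only stand in for your own parser; the 5 MB table size is arbitrary.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

class CaptureSizeDemo {
  /** Hypothetical stand-in for a serializable parse function; not a Beam type. */
  interface SerializableParser extends Function<byte[], String>, Serializable {}

  /** Holds a large table as ordinary instance state, so the table is serialized with the parser. */
  static class HeavyParser implements SerializableParser {
    private final byte[] lookupTable = new byte[5_000_000];

    @Override
    public String apply(byte[] payload) {
      return new String(payload, StandardCharsets.UTF_8);
    }
  }

  /** Marks the table transient and rebuilds it lazily, so it is never serialized. */
  static class LightParser implements SerializableParser {
    private transient byte[] lookupTable;

    @Override
    public String apply(byte[] payload) {
      if (lookupTable == null) {
        lookupTable = new byte[5_000_000]; // built on first use, after deserialization
      }
      return new String(payload, StandardCharsets.UTF_8);
    }
  }

  /** Returns the number of bytes Java serialization produces for the given object. */
  static int serializedSize(Serializable obj) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(obj);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.size();
  }

  public static void main(String[] args) {
    System.out.println("heavy parser: " + serializedSize(new HeavyParser()) + " bytes");
    System.out.println("light parser: " + serializedSize(new LightParser()) + " bytes");
  }
}
```

Measuring your real parser the same way should show whether it is the component inflating the job graph. If it is, the size stays the same whether the parser is passed to `MapElements` or to PubsubIO.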
Issue Time Tracking
-------------------
Worklog Id: (was: 558576)
Time Spent: 50m (was: 40m)
> Add readMessagesWithAttributesWithCoderAndParseFn to the PubSubIO
> -----------------------------------------------------------------
>
> Key: BEAM-11865
> URL: https://issues.apache.org/jira/browse/BEAM-11865
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Affects Versions: 2.28.0
> Reporter: Fokko Driesprong
> Assignee: Fokko Driesprong
> Priority: P2
> Fix For: 2.29.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>