[
https://issues.apache.org/jira/browse/BEAM-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056822#comment-17056822
]
Minbo Bae commented on BEAM-9483:
---------------------------------
Actually, it has an issue with _PubsubIO.__readMessagesWithMessageId_. I see
the following error.
{quote}exception: "java.io.EOFException: reached end of stream after reading 4
bytes; 72 bytes expectedexception: "java.io.EOFException: reached end of stream
after reading 4 bytes; 72 bytes expected
at
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteStreams.readFully(ByteStreams.java:780)
at
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteStreams.readFully(ByteStreams.java:762)
at org.apache.beam.sdk.coders.ByteArrayCoder.decode(ByteArrayCoder.java:108)
at org.apache.beam.sdk.coders.ByteArrayCoder.decode(ByteArrayCoder.java:95)
at org.apache.beam.sdk.coders.ByteArrayCoder.decode(ByteArrayCoder.java:41)
at
org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageWithMessageIdCoder.decode(PubsubMessageWithMessageIdCoder.java:50)
at
org.apache.beam.sdk.io.gcp.pubsub.PubsubMessageWithMessageIdCoder.decode(PubsubMessageWithMessageIdCoder.java:33)
at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159)
at
org.apache.beam.runners.dataflow.worker.PubsubReader$PubsubReaderIterator.decodeMessage(PubsubReader.java:129)
at
org.apache.beam.runners.dataflow.worker.WindmillReaderIteratorBase.advance(WindmillReaderIteratorBase.java:57)
at
org.apache.beam.runners.dataflow.worker.WindmillReaderIteratorBase.start(WindmillReaderIteratorBase.java:43)
at
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
at
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
at
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
at
org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
at
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1324)
at
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:151)
at
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1053)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
{quote}
As mentioned in the description,
_PubsubIO.readMessagesWithAttributesAndMessageId_ is a workaround for the users
who needs Pubsub message id.
> PubsubIO.readMessagesWithAttributesWithMessageId does not work with streaming
> dataflow pipeline
> -----------------------------------------------------------------------------------------------
>
> Key: BEAM-9483
> URL: https://issues.apache.org/jira/browse/BEAM-9483
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Sam Whittle
> Assignee: Sam Whittle
> Priority: Minor
>
> https://github.com/apache/beam/blob/d84b80b58274dc1680e61d536d7bf2077f70d6c9/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L1257
> needs to indicate that attibutes are required for the dataflow runner to send
> the full pubsub message with message id.
> A work around is to use PubsubIO.readMessagesWithAttributesWithMessageId
> instead.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)