[
https://issues.apache.org/jira/browse/BEAM-2660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097476#comment-16097476
]
ASF GitHub Bot commented on BEAM-2660:
--------------------------------------
GitHub user cjmcgraw opened a pull request:
https://github.com/apache/beam/pull/3619
[BEAM-2660] Set PubsubIO batch size using builder
BEAM-2660 asks for control over the publish batch size through the
`PubsubIO.Write.Builder`.
This PR adds two values configurable through the `PubsubIO.Write.Builder`:
- `maxBatchSize` - controls the maximum number of messages in a bulk batch request
- `maxBatchByteSize` - controls the maximum total size, in bytes, of a bulk batch request
In this PR I have also modified the `PubsubIO.Write.PubsubBoundedWriter`.
The writer now dynamically tracks the number of bytes allocated for all queued
messages. If the number of bytes exceeds the threshold, it publishes the queued
batch before adding more messages.
If a single message exceeds `maxBatchByteSize`, an exception is thrown.
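The flushing behaviour described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the class `ByteAwareBatcher`, its `add` and `flush` methods, and the `Consumer` standing in for the Pub/Sub publish call are all hypothetical names. It also assumes the byte check happens before a message is queued, so a batch never goes over the threshold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of byte-aware batching; not the actual PubsubBoundedWriter. */
class ByteAwareBatcher {
    private final int maxBatchSize;      // maximum messages per publish request
    private final int maxBatchByteSize;  // maximum total payload bytes per publish request
    private final Consumer<List<byte[]>> publisher; // stand-in for the real publish call
    private final List<byte[]> pending = new ArrayList<>();
    private int pendingBytes = 0;

    ByteAwareBatcher(int maxBatchSize, int maxBatchByteSize,
                     Consumer<List<byte[]>> publisher) {
        this.maxBatchSize = maxBatchSize;
        this.maxBatchByteSize = maxBatchByteSize;
        this.publisher = publisher;
    }

    void add(byte[] payload) {
        // A single message larger than the byte limit can never fit in a batch.
        if (payload.length > maxBatchByteSize) {
            throw new IllegalArgumentException(
                "Message of " + payload.length + " bytes exceeds maxBatchByteSize "
                    + maxBatchByteSize);
        }
        // Publish what is queued if this message would push the batch over the limit.
        if (pendingBytes + payload.length > maxBatchByteSize) {
            flush();
        }
        pending.add(payload);
        pendingBytes += payload.length;
        // Publish once the message-count limit is reached.
        if (pending.size() >= maxBatchSize) {
            flush();
        }
    }

    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        publisher.accept(new ArrayList<>(pending)); // hand off a copy of the batch
        pending.clear();
        pendingBytes = 0;
    }
}
```

With `maxBatchSize = 3` and `maxBatchByteSize = 10`, adding three 4-byte messages publishes the first two as one batch when the third arrives, and a final `flush()` publishes the remaining one.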
An example use of the new parameters is:
```java
PubsubIO.writeMessages()
    .withMaxBatchSize(100)         // limit on messages per batch request
    .withMaxBatchByteSize(100000)  // limit on bytes per batch request
    .to("my-topic")
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cjmcgraw/beam update-pubsubIO
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/beam/pull/3619.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3619
----
commit 7eff3ea99da3fad85e10ac50c2b2bc6fec89a1fc
Author: Carl McGraw <[email protected]>
Date: 2017-07-22T22:30:40Z
Added maxPublishBatchSize parameter to PubsubBoundedWriter class.
commit 95f23cd98c2008e0f5712ed68036bfb71caaa144
Author: Carl McGraw <[email protected]>
Date: 2017-07-23T00:30:18Z
updated BoundedPubsubWriter to dynamically flush if queued messages exceed
a pre-defined maximum batch byte size
commit c2abeb926c71bf21bbcc9406986c340d2c9d63e0
Author: Carl McGraw <[email protected]>
Date: 2017-07-23T01:17:03Z
updated UnboundedPubsubSink to accept new parameters.
----
> Set PubsubIO batch size using builder
> -------------------------------------
>
> Key: BEAM-2660
> URL: https://issues.apache.org/jira/browse/BEAM-2660
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-gcp
> Reporter: Carl McGraw
> Assignee: Stephen Sisk
> Labels: gcp, java, pubsub, sdk
>
> PubsubIO doesn't allow users to set the publish batch size. Instead, the value
> is hard-coded in both the BoundedPubsubWriter and the UnboundedPubsubSink.
> Google's Pub/Sub limits each request to a maximum of 10 MB. My company
> has run into problems with events that are individually smaller than 1 MB but
> that, when batched at the default batch sizes of 100 or 2000, cause Pub/Sub to
> fail to send the events.