[
https://issues.apache.org/jira/browse/BEAM-8554?focusedWorklogId=339570&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339570
]
ASF GitHub Bot logged work on BEAM-8554:
----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Nov/19 20:07
Start Date: 06/Nov/19 20:07
Worklog Time Spent: 10m
Work Description: stevekoonce commented on pull request #10013:
[BEAM-8554] Use WorkItemCommitRequest protobuf fields to signal that …
URL: https://github.com/apache/beam/pull/10013
…a WorkItem needs to be broken up
This implements the improvement described in
[BEAM-8554](https://issues.apache.org/jira/browse/BEAM-8554): when the
serialized size of a WorkItemCommitRequest proto is larger than the maximum
size, the commit request will be replaced by a request for a server-side
'truncation' which will cause the WorkItem itself to be broken up and, after
reprocessing, result in multiple, smaller WorkItemCommitRequests that are each
smaller and can be successfully submitted.
I updated an existing unit test and removed a redundant one - the
StreamingDataflowWorkerTest is already configured to run all tests with and
without StreamingEngine and Windmill, so separate, otherwise-identical tests
are not necessary.
------------------------
Thank you for your contribution! Follow this checklist to help us
incorporate your contribution quickly and easily:
- [ ] [**Choose
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and
mention them in a comment (`R: @username`).
- [] Format the pull request title like `[BEAM-XXX] Fixes bug in
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA
issue, if applicable. This will automatically link the pull request to the
issue.
- [] If this contribution is large, please file an Apache [Individual
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
See the [Contributor Guide](https://beam.apache.org/contribute) for more
tips on [how to make review process
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
Post-Commit Tests Status (on master branch)
------------------------------------------------------------------------------------------------
Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
--- | --- | --- | --- | --- | --- | --- | ---
Go | [](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
| --- | --- | [](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
| --- | --- | [](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
Java | [](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
Python | [](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)
| --- | [](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/)
| --- | --- | [](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/)
XLang | --- | --- | --- | [](https://builds.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/)
| --- | --- | ---
Pre-Commit Tests Status (on master branch)
------------------------------------------------------------------------------------------------
--- |Java | Python | Go | Website
--- | --- | --- | --- | ---
Non-portable | [](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/)
Portable | --- | [](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/)
| --- | ---
See
[.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md)
for trigger phrase, status and link of all Jenkins jobs.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 339570)
Remaining Estimate: 0h
Time Spent: 10m
> Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to
> be broken up
> -----------------------------------------------------------------------------------------
>
> Key: BEAM-8554
> URL: https://issues.apache.org/jira/browse/BEAM-8554
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: Steve Koonce
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> +Background:+
> When a WorkItemCommitRequest is generated that's bigger than the permitted
> size (> ~180 MB), a KeyCommitTooLargeException is logged (_not thrown_) and
> the request is still sent to the service. The service rejects the commit,
> but breaks up input messages that were bundled together and adds them to new,
> smaller work items that will later be pulled and re-tried - likely without
> generating another commit that is too large.
> When a WorkItemCommitRequest is generated that's too large to be sent back to
> the service (> 2 GB), a KeyCommitTooLargeException is thrown and nothing is
> sent back to the service.
>
> +Proposed Improvement+
> In both cases, prevent the doomed, large commit item from being sent back to
> the service. Instead send flags in the commit request signaling that the
> current work item led to a commit that is too large and the work item should
> be broken up.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)