[
https://issues.apache.org/jira/browse/BEAM-7812?focusedWorklogId=284526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-284526
]
ASF GitHub Bot logged work on BEAM-7812:
----------------------------------------
Author: ASF GitHub Bot
Created on: 29/Jul/19 20:41
Start Date: 29/Jul/19 20:41
Worklog Time Spent: 10m
Work Description: KevinGG commented on pull request #9187: [BEAM-7812]
Stackdriver Error Reporting Workaround
URL: https://github.com/apache/beam/pull/9187
1. Added a new static method that formats an exception stack trace in a way
that preserves the stack trace information while avoiding being scanned by
Stackdriver Error Reporting.
2. Updated the existing unit tests to match this change.
3. Applied the new format to errors reported from the worker to the Dataflow
backend.
This PR applies only to the Dataflow runner. It does not change any existing
logs or log-based features on Stackdriver, and it makes errors logged to the
Dataflow backend consistently formatted between streaming and batch pipelines.
For details, see https://issues.apache.org/jira/browse/BEAM-7812.
Integration tested internally at Google for the Dataflow runner.
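The formatting idea in item 1 can be sketched roughly as follows. This is a minimal illustration, not the method added in this PR: the class name `StackTraceReport`, the method name `format`, and the eight-space frame indentation are all assumptions made for the example. It builds the report from `Throwable.getStackTrace()` instead of calling `printStackTrace`, so frames are not prefixed with the usual tab and `at`. Whether a particular layout is actually skipped by Stackdriver Error Reporting would still have to be verified against the service.

```java
/** Hypothetical helper for illustration only; not the method from the PR. */
final class StackTraceReport {
  private StackTraceReport() {}

  /**
   * Builds a report from each Throwable's StackTraceElement array rather than
   * from printStackTrace output, so frames lack the usual tab + "at " prefix.
   */
  static String format(Throwable t) {
    StringBuilder sb = new StringBuilder();
    Throwable current = t;
    while (current != null) {
      if (current != t) {
        sb.append("\nCaused by: ");
      }
      // Throwable.toString() yields "<class name>: <message>".
      sb.append(current);
      for (StackTraceElement frame : current.getStackTrace()) {
        // Assumed layout: one frame per line, indented with spaces.
        sb.append("\n        ").append(frame);
      }
      current = current.getCause();
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Throwable error =
        new IllegalStateException("boom", new RuntimeException("root cause"));
    System.out.println(format(error));
  }
}
```

All frame information (class, method, file, line) is kept because it comes straight from `StackTraceElement.toString()`; only the surrounding layout differs from what `printStackTrace` emits.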
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 284526)
Time Spent: 10m
Remaining Estimate: 0h
> Work around Stackdriver error reporting double counting worker errors
> ---------------------------------------------------------------------
>
> Key: BEAM-7812
> URL: https://issues.apache.org/jira/browse/BEAM-7812
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Ning Kang
> Assignee: Ning Kang
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> h1. *Objective*
> Work around Stackdriver Error Reporting so that worker errors are counted
> only once even when they are logged twice.
> {color:#d04437}*Only applicable to dataflow runner workers in SDK*{color}.
> h1. *Background*
> Stackdriver Error Reporting will double count worker errors logged to
> Stackdriver, because:
> # workers log errors to Stackdriver;
> # workers report the same errors to dfe, and dfe logs them again to
> Stackdriver.
> The double counting is blocking us from sending job message logs from dfe to
> Stackdriver, because we don't want to change the behavior of any existing logs
> or features.
> There happens to be an inconsistency between how the Java batch worker
> ([DataflowWorkerLoggingHandler|https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingHandler.java#L82])
> and the Java streaming worker
> ([StreamingDataflowWorker|https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java#L1747])
> report errors to dfe. As a result, errors reported from the streaming Java
> worker end up being ignored by Stackdriver Error Reporting.
> h1. *Details*
> Inspired by this inconsistency, we decided to apply the streaming Java worker
> error reporting logic to batch as well, both to fix the inconsistency and to
> work around the double counting issue in Stackdriver Error Reporting.
> The change applies when workers report errors to dfe:
> * For Java, construct the stack trace from the StackTrace object instead of
> using printStackTrace;
> * For Python, report the complete error message details exactly as they are
> sent to worker logging, instead of reporting only the traceback through the
> traceback module.
> Users will not see any change, since job message logging to Stackdriver
> hasn’t been launched yet.
> h1. *Test Plan*
> We'll add unit tests for the public methods changed in the process.
> Google has internal integration tests where we can push worker harness images
> and set the worker harness container image to test in a sandbox.
> When releasing, we also have integration tests at the different release stages.
> The workaround needs to be released completely before we can enable job
> message logging.
> We can verify the format of stack traces in the sandbox and release stages by
> executing example pipelines in our projects and directly browsing the prod
> Stackdriver logging and error reporting consoles. This should be done before
> and after enabling job message logging.
> Run any other existing and required tests before sending the PR.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)