[
https://issues.apache.org/jira/browse/BEAM-4244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17252518#comment-17252518
]
Yimin Zhu commented on BEAM-4244:
---------------------------------
We have done all the validation we can, on a best-effort basis, in the ParDo
step. The problem still happens (and will certainly happen again), and it
occurs at the interface between DF's BigQuery I/O client code and BigQuery's
streaming insert API service, over which we have no control at all. The
trouble with this issue is that if that part of the Beam code hits an error, it
stalls the whole data plane. Since there is no callback (or similar) mechanism
that lets application logic intervene, application code cannot do anything
about it, which is poor design by any measure.
If the goal is to push the responsibility to the application side, then at
least provide a mechanism that allows that to happen.
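To illustrate the kind of callback mechanism meant here, below is a minimal
sketch in plain Python. It is not Beam's API: decode_with_error_handler,
on_error, and run_example are hypothetical names, and JSON decoding stands in
for whatever step fails inside the I/O code. The point is only that a failing
record is handed to application code instead of stalling everything else.

```python
import json
from typing import Callable, Iterable, List, Tuple


def decode_with_error_handler(
    records: Iterable[bytes],
    decode: Callable[[bytes], dict],
    on_error: Callable[[bytes, Exception], None],
) -> List[dict]:
    """Decode each record; pass failures to on_error instead of raising.

    Hypothetical helper for illustration only, not part of Beam.
    """
    decoded = []
    for raw in records:
        try:
            decoded.append(decode(raw))
        except Exception as exc:
            # The application decides what to do: log, drop, or dead-letter.
            on_error(raw, exc)
    return decoded


def run_example() -> Tuple[List[dict], List[bytes]]:
    """Route undecodable records to a dead-letter list and keep going."""
    dead_letters: List[bytes] = []
    records = [b'{"id": 1}', b"not json", b'{"id": 2}']
    good = decode_with_error_handler(
        records,
        decode=lambda raw: json.loads(raw.decode("utf-8")),
        on_error=lambda raw, exc: dead_letters.append(raw),
    )
    return good, dead_letters
```

With a hook like this, one bad record ends up in dead_letters while the
remaining records continue through the pipeline.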
> Provide a better way for programmatically handling errors raised while
> encoding/decoding data
> ---------------------------------------------------------------------------------------------
>
> Key: BEAM-4244
> URL: https://issues.apache.org/jira/browse/BEAM-4244
> Project: Beam
> Issue Type: New Feature
> Components: beam-model, runner-core
> Reporter: Chamikara Madhusanka Jayalath
> Priority: P3
>
> Beam runners use coders in various stages of a pipeline to encode/decode
> data. Coders are executed directly by the runner of a pipeline and users do
> not have control over exceptions raised during encoding/decoding (could be
> either due to malformed/corrupted data provided by users or intermediate
> malformed/corrupted data generated during the system execution).
> Currently users can rely on runner-specific worker logging to detect the
> error and update the pipeline but it would be better if we can provide a way
> to programmatically handle these errors.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)