[
https://issues.apache.org/jira/browse/FLINK-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855488#comment-15855488
]
Tzu-Li (Gordon) Tai edited comment on FLINK-5583 at 2/7/17 7:57 AM:
--------------------------------------------------------------------
Yes, the collector can be made thread-safe, depending on the implementation.
However, for the synchronization to work properly, user implementations of
{{deserialize(bytes, collector)}} will need to ensure that all outputs are
added to the collector before the method returns.
Also note that the underlying implementation might differ between 0.8 and
0.9+, mainly due to the different threading models for how partitions are
consumed.
For 0.8 I think we need a separate collector for each subscribed Kafka
partition, while in 0.9+ we can have a single collector for the whole subtask.
> Support flexible error handling in the Kafka consumer
> -----------------------------------------------------
>
> Key: FLINK-5583
> URL: https://issues.apache.org/jira/browse/FLINK-5583
> Project: Flink
> Issue Type: Improvement
> Components: Kafka Connector
> Reporter: Haohui Mai
> Assignee: Haohui Mai
>
> We have found it valuable to allow applications to handle errors and
> exceptions in the Kafka consumer in order to build robust applications for
> production.
> The context is the following:
> (1) We have schematized, Avro records flowing through Kafka.
> (2) The decoder implements the DeserializationSchema to decode the records.
> (3) Occasionally there are corrupted records (e.g., schema issues). Depending
> on the application, the streaming pipeline might want to bail out (the
> current behavior) or skip the corrupted records.
> Two options are available:
> (1) Have a variant of DeserializationSchema return a FlatMap-like
> structure, as suggested in FLINK-3679.
> (2) Allow applications to catch and handle the exception by exposing APIs
> similar to the {{ExceptionProxy}}.
> Thoughts?
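One way option (1) could look, sketched with hypothetical names ({{FlatDeserializationSchema}}, {{Collector}}, {{SkippingIntSchema}} are all illustrative, not Flink's actual interfaces): a FlatMap-like schema emits zero or more records per message, so emitting nothing is how an application skips a corrupted record instead of failing the pipeline.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in collector interface for illustration; not Flink's actual API.
interface Collector<T> {
    void collect(T record);
}

// Option (1): a FlatMap-like deserialization variant. Emitting zero records
// for a message is how an application skips a corrupted record.
interface FlatDeserializationSchema<T> {
    void deserialize(byte[] message, Collector<T> out);
}

// Example schema that skips records that fail to parse instead of
// failing the whole pipeline (the current bail-out behavior).
class SkippingIntSchema implements FlatDeserializationSchema<Integer> {
    @Override
    public void deserialize(byte[] message, Collector<Integer> out) {
        try {
            out.collect(Integer.parseInt(new String(message)));
        } catch (NumberFormatException corrupted) {
            // Swallow the corrupted record: emit nothing and move on.
        }
    }
}

// Simple list-backed collector used to observe the emitted records.
class ListCollector<T> implements Collector<T> {
    final List<T> records = new ArrayList<>();

    @Override
    public void collect(T record) {
        records.add(record);
    }
}
```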
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)