[
https://issues.apache.org/jira/browse/FLINK-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014647#comment-16014647
]
Andrey commented on FLINK-6613:
-------------------------------
Hi Dmytro.
Several problems still exist:
1) I want to fail fast and kill the JVM if it spends too much time doing GC, so we
can't simply remove the "UseGCOverheadLimit" option.
2) Even if I remove this option, the root cause will still exist. The root
cause is that the Kafka integration deals incorrectly with large messages: it reads as
many messages from Kafka as it can, and that leads to OOM. GC settings are
irrelevant here, because these messages should not and will not be eligible for
GC.
3) If you recommend G1, then the default startup scripts should be changed.
Have you tried to reproduce the issue?
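One way to address the over-reading described in (2), sketched below, is to make the handover between the Kafka consumer thread and the downstream operator bounded, so the consumer thread blocks when too many records are already in flight instead of buffering unbounded batches. `BoundedHandover` and its methods are hypothetical illustration names, not the actual `org.apache.flink.streaming.connectors.kafka.internal.Handover` API:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a handover with a fixed capacity. With a bound on
// in-flight records, the consumer thread blocks in produce() instead of
// reading batch after batch while the downstream operator falls behind.
class BoundedHandover<T> {
    private final BlockingQueue<T> queue;

    BoundedHandover(int maxInFlight) {
        this.queue = new ArrayBlockingQueue<>(maxInFlight);
    }

    // Called by the Kafka consumer thread; blocks once maxInFlight
    // records are waiting downstream, providing natural backpressure.
    void produce(T record) throws InterruptedException {
        queue.put(record);
    }

    // Called by the downstream operator; frees a slot for the producer.
    T pollNext() throws InterruptedException {
        return queue.take();
    }

    int inFlight() {
        return queue.size();
    }
}
```

A byte-budget (e.g. a semaphore over message sizes) would fit large messages even better than a record count, but the blocking principle is the same.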
> OOM during reading big messages from Kafka
> ------------------------------------------
>
> Key: FLINK-6613
> URL: https://issues.apache.org/jira/browse/FLINK-6613
> Project: Flink
> Issue Type: Bug
> Components: Kafka Connector
> Affects Versions: 1.2.0
> Reporter: Andrey
>
> Steps to reproduce:
> 1) Set up a Task Manager with a 2 GB heap
> 2) Set up a job that reads messages from Kafka 0.10 (i.e. FlinkKafkaConsumer010)
> 3) Send 3300 messages of 635 KB each, so the total size is ~2 GB
> 4) OOM in the Task Manager.
> According to heap dump:
> 1) KafkaConsumerThread read messages with a total size of ~1 GB.
> 2) It passed them to the next operator using
> org.apache.flink.streaming.connectors.kafka.internal.Handover
> 3) Then it began to read another batch of messages.
> 4) The Task Manager read another ~500 MB of messages before the OOM.
> Expected:
> 1) Either have a constraint like "number of messages in flight", OR
> 2) Read the next batch of messages only when the previous batch has been processed, OR
> 3) Any other option that solves the OOM.
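As a partial mitigation while the connector behavior is unchanged, the amount of data fetched per poll can be capped through standard Kafka consumer properties, which FlinkKafkaConsumer010 passes through to the underlying client. This only shrinks each batch rather than bounding total in-flight data, and the values below are illustrative, not recommendations:

```java
import java.util.Properties;

class KafkaFetchLimits {
    // Returns consumer properties that bound how much data one poll
    // fetches. The property names are standard Kafka consumer configs;
    // the values are illustrative placeholders.
    static Properties boundedConsumerProps() {
        Properties props = new Properties();
        // At most 500 records returned by a single poll():
        props.setProperty("max.poll.records", "500");
        // At most 1 MB fetched per partition per fetch request:
        props.setProperty("max.partition.fetch.bytes", "1048576");
        return props;
    }
}
```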
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)