[
https://issues.apache.org/jira/browse/FLINK-20945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307480#comment-17307480
]
Aswin ram Sivaraman Venkataraman edited comment on FLINK-20945 at 3/24/21,
12:32 AM:
-------------------------------------------------------------------------------------
Can anybody please provide an update regarding this issue?
We are currently using the Flink Table API (Flink version 1.12.0) to stream data
from Kafka and store it in Google Cloud Storage in Parquet format. Initially the
Flink job worked perfectly fine and we were able to stream and store the data
successfully. But once we increased the cardinality of the input data and the
rate at which events are produced to Kafka, i.e. streamed more events per
second, the Flink job started failing with the following errors:
# *GC overhead limit exceeded*
# *java.lang.OutOfMemoryError: Java heap space*
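To make the setup concrete, our pipeline looks roughly like the sketch below (the schema, topic and bucket names are placeholders, not our real ones):
{code:java}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class KafkaToGcsParquet {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Parquet part files are only finalized on checkpoints, so checkpointing is enabled.
        env.enableCheckpointing(60_000L);
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Placeholder Kafka source table.
        tEnv.executeSql(
            "CREATE TABLE kafka_events (" +
            "  id STRING," +
            "  payload STRING," +
            "  event_time TIMESTAMP(3)" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'events'," +
            "  'properties.bootstrap.servers' = 'kafka:9092'," +
            "  'properties.group.id' = 'flink-gcs-writer'," +
            "  'scan.startup.mode' = 'latest-offset'," +
            "  'format' = 'json'" +
            ")");

        // Placeholder filesystem sink writing Parquet to GCS, partitioned by date and hour.
        tEnv.executeSql(
            "CREATE TABLE gcs_events (" +
            "  id STRING," +
            "  payload STRING," +
            "  event_time TIMESTAMP(3)," +
            "  dt STRING," +
            "  hr STRING" +
            ") PARTITIONED BY (dt, hr) WITH (" +
            "  'connector' = 'filesystem'," +
            "  'path' = 'gs://my-bucket/events'," +
            "  'format' = 'parquet'" +
            ")");

        // Submits the continuous insert job.
        tEnv.executeSql(
            "INSERT INTO gcs_events " +
            "SELECT id, payload, event_time, " +
            "       DATE_FORMAT(event_time, 'yyyy-MM-dd'), DATE_FORMAT(event_time, 'HH') " +
            "FROM kafka_events");
    }
}
{code}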
Initially I provided 4 GB each to the JobManager and the TaskManager. I started
the Flink YARN session with the following command:
./bin/yarn-session.sh -jm 4096m -tm 4096m -s 3
I am aware that if we give the TaskManager more memory we may not hit the
issue. What I want to know is whether there is a way for Flink to take the
overall heap usage into account and flush data once heap usage crosses a
certain threshold, instead of failing with the *Java heap space* error.
> flink hive insert heap out of memory
> ------------------------------------
>
> Key: FLINK-20945
> URL: https://issues.apache.org/jira/browse/FLINK-20945
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Ecosystem
> Environment: flink 1.12.0
> hive-exec 2.3.5
> Reporter: Bruce GAO
> Priority: Major
>
> When using Flink SQL to insert into Hive from Kafka, heap out-of-memory errors
> occur randomly.
> The Hive table is partitioned by year/month/day/hour, and the maximum heap space
> needed appears to be proportional to the number of active partitions (which grows
> when Kafka messages arrive out of order or delayed). This means that as the number
> of active partitions increases, the required heap space also increases, which may
> cause the heap to run out of memory.
> When writing a record, would it be possible to take the overall heap usage into
> account in checkBlockSizeReached, or to provide some other mechanism to avoid OOM?
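For illustration only, a heap-aware variant of the check described above might look roughly like the sketch below. None of these names exist in Flink or parquet-mr; the sketch only shows the idea of consulting overall heap usage in addition to the per-writer buffered size.
{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical policy object: flush a writer's buffered row group either when its own
// buffer is full (the existing size-based check) or when overall JVM heap usage crosses
// a configurable threshold (the behaviour this ticket asks for).
public class HeapAwareFlushPolicy {
    private final double heapUsageThreshold; // e.g. 0.8 = flush once heap is 80% used
    private final MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();

    public HeapAwareFlushPolicy(double heapUsageThreshold) {
        this.heapUsageThreshold = heapUsageThreshold;
    }

    /** Returns true if either the per-writer buffer or the whole heap is over its limit. */
    public boolean shouldFlush(long bufferedBytes, long perWriterLimitBytes) {
        if (bufferedBytes >= perWriterLimitBytes) {
            return true; // per-writer, size-based check
        }
        MemoryUsage heap = memoryBean.getHeapMemoryUsage();
        if (heap.getMax() <= 0) {
            return false; // max heap not reported; fall back to the size-based check only
        }
        double usedRatio = (double) heap.getUsed() / heap.getMax();
        return usedRatio >= heapUsageThreshold; // global, heap-based check
    }
}
{code}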