[ https://issues.apache.org/jira/browse/FLINK-20945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307480#comment-17307480 ]

Aswin ram Sivaraman Venkataraman edited comment on FLINK-20945 at 3/24/21, 
12:32 AM:
-------------------------------------------------------------------------------------

Can anybody please provide an update on this issue?

We are currently using the Flink Table API (Flink version 1.12.0) to stream data 
from Kafka and store it in Google Cloud Storage in Parquet format. Initially the 
Flink job worked fine and we were able to stream data and store it in Google 
Cloud Storage successfully. However, once we increased the cardinality of the 
input data and the rate at which events are produced to Kafka, the Flink job 
started failing with the following errors:

 # *GC overhead limit exceeded*
 # *Java heap space OutOfMemoryError*

Initially I provided 4 GB each to the JobManager and the TaskManager. I started 
the Flink YARN session with the following command:

./bin/yarn-session.sh -jm 4096m -tm 4096m -s 3
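
For reference, if I read the Flink 1.12 memory model correctly, -jm/-tm size the 
whole JobManager/TaskManager process rather than the JVM heap alone, so only part 
of those 4 GB ends up as heap for user code. The flink-conf.yaml equivalent 
(values are illustrative, not a recommendation) would be roughly:

# Equivalent of "-jm 4096m -tm 4096m": total *process* memory per container.
jobmanager.memory.process.size: 4096m
taskmanager.memory.process.size: 4096m
# Managed memory, network buffers, JVM metaspace and overhead are carved out of
# the TaskManager total first; the heap share can be raised explicitly via
# taskmanager.memory.task.heap.size if needed.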

I am aware that if we give the TaskManager more memory we may not hit this 
issue. What I want to know is whether there is a way for Flink to take the 
overall heap usage into account and flush data once heap usage crosses a certain 
threshold, instead of hitting the *Java heap space OutOfMemoryError*.


> flink hive insert heap out of memory
> ------------------------------------
>
>                 Key: FLINK-20945
>                 URL: https://issues.apache.org/jira/browse/FLINK-20945
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Ecosystem
>         Environment: flink 1.12.0 
> hive-exec 2.3.5
>            Reporter: Bruce GAO
>            Priority: Major
>
> When using Flink SQL to insert into Hive from Kafka, heap out-of-memory errors 
> occur randomly.
> The Hive table uses year/month/day/hour as its partitioning scheme, and the 
> maximum heap space needed appears to correspond to the number of active 
> partitions (Kafka messages can arrive out of order and delayed). This means 
> that as the number of active partitions grows, the heap space needed also 
> grows, which may cause the heap to run out of memory.
> When writing a record, is it possible to take the whole heap space usage into 
> account in checkBlockSizeReached, or is there some other way to avoid the OOM?


