[ 
https://issues.apache.org/jira/browse/FLINK-26718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kunghsu updated FLINK-26718:
----------------------------
    Description: 
Limitations of flink+hive dimension table

My scenario is a join between a Kafka input table and a Hive dimension table. 
The Hive dimension table holds user data, and it is very large.
When the Hive table is small, a few hundred rows, everything works as expected: 
the partition is recognized automatically and the whole job runs normally. A 
minimal sketch of the kind of job I mean follows.
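
All table, column, and connection names in this sketch (orders, users, user_id, 
kafka:9092, /opt/hive-conf) are placeholders, not my actual configuration:

{code:java}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class KafkaHiveLookupJoin {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Register a Hive catalog (name, database, and conf dir are placeholders).
        HiveCatalog hive = new HiveCatalog("myhive", "default", "/opt/hive-conf");
        tEnv.registerCatalog("myhive", hive);

        // Kafka input table with a processing-time attribute for the lookup join.
        tEnv.executeSql(
            "CREATE TABLE orders (" +
            "  user_id STRING," +
            "  amount DOUBLE," +
            "  proctime AS PROCTIME()" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'orders'," +
            "  'properties.bootstrap.servers' = 'kafka:9092'," +
            "  'format' = 'json'" +
            ")");

        // Join each Kafka record against the Hive dimension table.
        tEnv.executeSql(
            "SELECT o.user_id, o.amount, u.user_name " +
            "FROM orders AS o " +
            "JOIN myhive.`default`.users FOR SYSTEM_TIME AS OF o.proctime AS u " +
            "ON o.user_id = u.user_id").print();
    }
}
{code}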

When the Hive table reached about 1.3 million rows, the TaskManager began to 
fail. It became very difficult even to read the logs. My guess is that it 
exhausted JVM memory when it tried to load the entire table into memory: the 
TaskManager reports a heartbeat timeout (HeartbeatTimeoutException). Even 
increasing the parallelism did not help.
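
My understanding from the Hive connector docs is that each lookup-join subtask 
caches the entire Hive table in TaskManager memory and reloads it when the 
cache TTL expires, which would explain the memory pressure. Continuing the 
sketch above, the documented table hint looks like this (the TTL value is just 
an example):

{code:java}
// 'lookup.join.cache.ttl' controls how often each subtask reloads its
// in-memory copy of the dimension table; every subtask still holds a full
// copy of the table. Table/column names are the same placeholders as above.
tEnv.executeSql(
    "SELECT o.user_id, o.amount, u.user_name " +
    "FROM orders AS o " +
    "JOIN myhive.`default`.users /*+ OPTIONS('lookup.join.cache.ttl'='60 min') */ " +
    "  FOR SYSTEM_TIME AS OF o.proctime AS u " +
    "ON o.user_id = u.user_id");
{code}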

Official documentation: 
[https://nightlies.apache.org/flink/flink-docs-release-1.12/dev/table/connectors/hive/hive_read_write.html#source-parallelism-inference]
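
For reference, the configuration keys that section documents, continuing the 
same sketch (the values shown are the documented defaults as I understand 
them):

{code:java}
import org.apache.flink.configuration.Configuration;

// Source parallelism inference options from the linked page.
Configuration conf = tEnv.getConfig().getConfiguration();
conf.setBoolean("table.exec.hive.infer-source-parallelism", true);
conf.setInteger("table.exec.hive.infer-source-parallelism.max", 1000);
{code}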

So my question is: does Flink with a Hive dimension table not support joining 
large tables so far?

Is this approach unusable when the data volume is too large?


> Limitations of flink+hive dimension table
> -----------------------------------------
>
>                 Key: FLINK-26718
>                 URL: https://issues.apache.org/jira/browse/FLINK-26718
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Hive
>    Affects Versions: 1.12.7
>            Reporter: kunghsu
>            Priority: Major
>              Labels: HIVE
>



