kunghsu created FLINK-26718:
-------------------------------
Summary: Limitations of flink+hive dimension table
Key: FLINK-26718
URL: https://issues.apache.org/jira/browse/FLINK-26718
Project: Flink
Issue Type: Bug
Components: Connectors / Hive
Affects Versions: 1.12.7
Reporter: kunghsu
The scenario involves a join between a Kafka input table and a Hive dimension
table. The Hive dimension table holds user data, and the data volume is very
large.
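For concreteness, a minimal sketch of the kind of lookup join involved (table
names, columns, and connector properties are hypothetical placeholders, not
taken from the actual job):

    -- Kafka fact stream with a processing-time attribute (all names assumed)
    CREATE TABLE kafka_orders (
      user_id   STRING,
      amount    DECIMAL(10, 2),
      proc_time AS PROCTIME()
    ) WITH (
      'connector' = 'kafka',
      'topic' = 'orders',
      'properties.bootstrap.servers' = 'localhost:9092',
      'format' = 'json'
    );

    -- user_dim stands in for the Hive dimension table registered via the
    -- Hive catalog; it is joined as a temporal (lookup) table as of
    -- processing time.
    SELECT o.user_id, o.amount, u.user_name
    FROM kafka_orders AS o
    JOIN user_dim FOR SYSTEM_TIME AS OF o.proc_time AS u
      ON o.user_id = u.user_id;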
When the Hive table is small, around a few hundred rows, everything is normal:
the partitions are recognized automatically and the whole job runs fine.
When the Hive table reached about 1.3 million rows, the TaskManager stopped
working properly; it was difficult even to inspect the logs. My guess is that
it blew out the JVM memory when it tried to load the entire table into memory.
You can see a heartbeat timeout exception on the TaskManager, such as a
Heartbeat TimeoutException.
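If it is the in-memory lookup cache that is exhausting the heap, the 1.12 Hive
connector options below are what bound how much of the table is loaded. This is
a sketch, assuming a Hive table named user_dim and illustrative values; the
properties can be set via ALTER TABLE ... SET TBLPROPERTIES:

    -- Variant A: join only the latest partition instead of the whole table
    ALTER TABLE user_dim SET TBLPROPERTIES (
      'streaming-source.enable' = 'true',
      'streaming-source.partition.include' = 'latest',
      'streaming-source.monitor-interval' = '12 h'
    );

    -- Variant B: join the whole bounded table, with a TTL on the cache that
    -- each TaskManager keeps in memory (the full table must still fit)
    ALTER TABLE user_dim SET TBLPROPERTIES (
      'lookup.join.cache.ttl' = '60 min'
    );

Note that per the documentation, with the whole-table variant each joining
subtask keeps its own cache of the Hive table, so 1.3 million rows of user data
may simply not fit into the memory of a TaskManager slot.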
Relevant section of the official documentation:
https://nightlies.apache.org/flink/flink-docs-release-1.12/dev/table/connectors/hive/hive_read_write.html#source-parallelism-inference
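For reference, the two options from that section, shown being set in the SQL
client (the defaults noted are as given in the 1.12 documentation):

    -- Infer source parallelism from the number of splits/partitions (default: true)
    SET table.exec.hive.infer-source-parallelism=true;
    -- Upper bound on the inferred parallelism (default: 1000)
    SET table.exec.hive.infer-source-parallelism.max=1000;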
So my question is: does Flink + Hive still not support joins against large
dimension tables?
Is this approach unusable when the data volume is too large?
--
This message was sent by Atlassian Jira
(v8.20.1#820001)