[
https://issues.apache.org/jira/browse/FLINK-26718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kunghsu updated FLINK-26718:
----------------------------
Labels: HIVE (was: )
> Limitations of flink+hive dimension table
> -----------------------------------------
>
> Key: FLINK-26718
> URL: https://issues.apache.org/jira/browse/FLINK-26718
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Hive
> Affects Versions: 1.12.7
> Reporter: kunghsu
> Priority: Major
> Labels: HIVE
>
> Limitations of flink+hive dimension table
> My scenario is a join between a Kafka input table and a Hive dimension table. The Hive
> dimension table holds user data, and the data volume is very large.
> When the Hive table is small, around a few hundred rows, everything is normal: the partition
> is recognized automatically and the whole job runs fine.
> When the Hive table reached about 1.3 million rows, the TaskManager stopped working properly;
> it was difficult even to read the logs. My guess is that loading the entire table into memory
> blew up the JVM heap. You can then see a heartbeat timeout exception on the TaskManager, such
> as a Heartbeat TimeoutException.
> Official documentation:
> https://nightlies.apache.org/flink/flink-docs-release-1.12/dev/table/connectors/hive/hive_read_write.html#source-parallelism-inference
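> For completeness, the same page also documents dynamic table options for the Hive lookup
> join; as far as I understand, in 1.12 the cache behaviour can be tuned with a hint like the
> following (option values are only examples, and I believe the default cache TTL is 60 min):
>
>   SELECT o.order_id, u.user_name
>   FROM kafka_orders AS o
>   JOIN user_dim
>     /*+ OPTIONS('streaming-source.enable' = 'false',
>                 'streaming-source.partition.include' = 'all',
>                 'lookup.join.cache.ttl' = '12 h') */
>     FOR SYSTEM_TIME AS OF o.proc_time AS u
>   ON o.user_id = u.user_id;
>
> But even with a long TTL, the whole table is still cached in memory, which seems to be
> exactly where my job falls over.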
> So my question is: does Flink + Hive still not support joining against large dimension
> tables? Is this approach unusable when the data volume is too large?
--
This message was sent by Atlassian Jira
(v8.20.1#820001)