[
https://issues.apache.org/jira/browse/FLINK-26718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kunghsu updated FLINK-26718:
----------------------------
Labels: HIVE (was: )
> Limitations of flink+hive dimension table
> -----------------------------------------
>
> Key: FLINK-26718
> URL: https://issues.apache.org/jira/browse/FLINK-26718
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Hive
> Affects Versions: 1.12.7
> Reporter: kunghsu
> Priority: Major
> Labels: HIVE
>
> Limitations of flink+hive dimension table
> My scenario is a join between a Kafka input table and a Hive dimension table. The Hive
> dimension table holds user data, and the data volume is very large.
> When the Hive table is small, around a few hundred rows, everything is normal: the partition
> is recognized automatically and the whole job runs fine.
> When the Hive table reached about 1.3 million rows, the TaskManager stopped working properly;
> it was difficult even to read the logs. My guess is that loading the entire table into memory
> blew up the JVM heap. You can then see a heartbeat timeout exception on the TaskManager, such
> as a Heartbeat TimeoutException.
> Official documentation:
> https://nightlies.apache.org/flink/flink-docs-release-1.12/dev/table/connectors/hive/hive_read_write.html#source-parallelism-inference
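> For completeness, the same page also documents dynamic table options for the Hive lookup
> join; as far as I understand, in 1.12 the cache behaviour can be tuned with a hint like the
> following (option values are only examples, and I believe the default cache TTL is 60 min):
>
>   SELECT o.order_id, u.user_name
>   FROM kafka_orders AS o
>   JOIN user_dim
>     /*+ OPTIONS('streaming-source.enable' = 'false',
>                 'streaming-source.partition.include' = 'all',
>                 'lookup.join.cache.ttl' = '12 h') */
>     FOR SYSTEM_TIME AS OF o.proc_time AS u
>   ON o.user_id = u.user_id;
>
> But even with a long TTL, the whole table is still cached in memory, which seems to be
> exactly where my job falls over.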
> So my question is: does Flink + Hive still not support joining against large dimension
> tables? Is this approach unusable when the data volume is too large?
--
This message was sent by Atlassian Jira
(v8.20.1#820001)