kunghsu created FLINK-26718:
-------------------------------

             Summary: Limitations of flink+hive dimension table
                 Key: FLINK-26718
                 URL: https://issues.apache.org/jira/browse/FLINK-26718
             Project: Flink
          Issue Type: Bug
          Components: Connectors / Hive
    Affects Versions: 1.12.7
            Reporter: kunghsu


Limitations of flink+hive dimension table


The scenario I am dealing with is a join between a Kafka input 
table and a Hive dimension table. The Hive dimension table holds user data, 
and it is very large.
When the Hive table is small, around a few hundred rows, 
everything works: partitions are recognized automatically and the whole 
job runs normally.


When the Hive table reached about 1.3 million rows, the TaskManager stopped 
working properly. It was difficult even to read the logs. My guess is that the 
JVM ran out of memory when Flink tried to load the entire table into memory. 
The TaskManager logs show a heartbeat timeout exception, such as Heartbeat 
TimeoutException.


Official website documentation: 
https://nightlies.apache.org/flink/flink-docs-release-1.12/dev/table/connectors/hive/hive_read_write.html#source-parallelism-inference

So my question is: does flink+hive not support joining against large 
dimension tables so far?

Is this approach unusable when the data volume is too large?
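
For context, the join pattern in question looks roughly like this (table and 
column names below are hypothetical, not from my actual job). As far as I 
understand the Flink 1.12 docs, the Hive lookup source caches the dimension 
table in TaskManager memory and only reloads it when the TTL configured by 
'lookup.join.cache.ttl' expires, which would explain the memory pressure with 
1.3 million rows:

```sql
-- Hypothetical sketch of the failing pattern.
-- dim_user is a Hive table registered via a Hive catalog; its cache
-- refresh interval is controlled by the table property
-- 'lookup.join.cache.ttl' (default is documented as 60 min in 1.12).

-- orders: the Kafka-backed input table, with a processing-time attribute.
SELECT
  o.order_id,
  o.user_id,
  u.user_name
FROM orders AS o
-- Temporal (lookup) join against the Hive dimension table:
JOIN dim_user FOR SYSTEM_TIME AS OF o.proc_time AS u
  ON o.user_id = u.user_id;
```

If my understanding is right, every TaskManager slot doing this lookup join 
holds its own full copy of dim_user in heap, so the memory cost scales with 
table size rather than with the Kafka traffic.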



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
