roland created FLINK-35118:
------------------------------
Summary: StreamingHiveSource cannot track tables that have more
than 32,767 partitions
Key: FLINK-35118
URL: https://issues.apache.org/jira/browse/FLINK-35118
Project: Flink
Issue Type: Bug
Components: Connectors / Hive
Affects Versions: 1.19.0
Reporter: roland
*Description:*
The Streaming Hive Source cannot track tables that have more than 32,767
partitions.
*Root Cause:*
The Streaming Hive Source uses the following lines to get all partitions of a
table:
([git hub
link|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/read/HivePartitionFetcherContextBase.java#L130])
HivePartitionFetcherContextBase.java:
{code:java}
@Override
public List<ComparablePartitionValue> getComparablePartitionValueList()
throws Exception {
List<ComparablePartitionValue> partitionValueList = new ArrayList<>();
switch (partitionOrder) {
case PARTITION_NAME:
List<String> partitionNames =
metaStoreClient.listPartitionNames(
tablePath.getDatabaseName(),
tablePath.getObjectName(),
Short.MAX_VALUE);
for (String partitionName : partitionNames) {
partitionValueList.add(getComparablePartitionByName(partitionName));
}
break;
case CREATE_TIME:
Map<List<String>, Long> partValuesToCreateTime = new
HashMap<>();
partitionNames =
metaStoreClient.listPartitionNames(
tablePath.getDatabaseName(),
tablePath.getObjectName(),
Short.MAX_VALUE); {code}
Where the `metaStoreClient` is a wrapper of the `IMetaStoreClient`, and the
function `listPartitionNames` can only list no more than `Short.MAX_VALUE`
partitions, which is 32,767.
For tables that have more partitions, the source fails to track new partitions
and read from it.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)