[
https://issues.apache.org/jira/browse/FLINK-31400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Maximilian Michels updated FLINK-31400:
---------------------------------------
Description:
A very critical part in the scaling algorithm is setting the source processing
rate correctly such that the Flink pipeline can keep up with the ingestion
rate. The autoscaler does that by looking at the {{pendingRecords}} Flink
source metric. Even if that metric is not available, the source can still be
sized according to the busyTimeMsPerSecond metric, but there will be no backlog
information available. For Kafka, the autoscaler also determines the number of
partitions to avoid scaling higher than the maximum number of partitions.
In order to support a wider range of use cases, we should investigate an
integration with the Iceberg source. As far as I know, it does not expose the
pedingRecords metric, nor does the autoscaler know about other constraints,
e.g. the maximum number of open files.
was:
A very critical part in the scaling algorithm is setting the source processing
correctly such that the Flink pipeline can keep up with the ingestion rate. The
autoscaler does that by looking at the {{pendingRecords}} Flink source metric.
Even if that metric is not available, the source can still be sized according
to the busyTimeMsPerSecond metric, but there will be no backlog information
available. For Kafka, the autoscaler also determines the number of partitions
to avoid scaling higher than the maximum number of partitions.
In order to support a wider range of use cases, we should investigate an
integration with the Iceberg source. As far as I know, it does not expose the
pedingRecords metric, nor does the autoscaler know about other constraints,
e.g. the maximum number of open files.
> Add autoscaler integration for Iceberg source
> ---------------------------------------------
>
> Key: FLINK-31400
> URL: https://issues.apache.org/jira/browse/FLINK-31400
> Project: Flink
> Issue Type: New Feature
> Components: Autoscaler, Kubernetes Operator
> Reporter: Maximilian Michels
> Priority: Major
> Fix For: kubernetes-operator-1.5.0
>
>
> A very critical part in the scaling algorithm is setting the source
> processing rate correctly such that the Flink pipeline can keep up with the
> ingestion rate. The autoscaler does that by looking at the {{pendingRecords}}
> Flink source metric. Even if that metric is not available, the source can
> still be sized according to the busyTimeMsPerSecond metric, but there will be
> no backlog information available. For Kafka, the autoscaler also determines
> the number of partitions to avoid scaling higher than the maximum number of
> partitions.
> In order to support a wider range of use cases, we should investigate an
> integration with the Iceberg source. As far as I know, it does not expose the
> pedingRecords metric, nor does the autoscaler know about other constraints,
> e.g. the maximum number of open files.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)