[ 
https://issues.apache.org/jira/browse/FLINK-31400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-31400:
---------------------------------------
    Description: 
A critical part of the scaling algorithm is setting the source processing 
rate correctly so that the Flink pipeline can keep up with the ingestion 
rate. The autoscaler does that by looking at the {{pendingRecords}} Flink 
source metric. Even if that metric is not available, the source can still be 
sized according to the {{busyTimeMsPerSecond}} metric, but no backlog 
information will be available. For Kafka, the autoscaler also determines the 
number of partitions to avoid scaling beyond the maximum number of partitions.

To support a wider range of use cases, we should investigate an 
integration with the Iceberg source. As far as I know, it does not expose the 
{{pendingRecords}} metric, nor is the autoscaler aware of other constraints, 
e.g. the maximum number of open files.

  was:
A very critical part in the scaling algorithm is setting the source processing 
correctly such that the Flink pipeline can keep up with the ingestion rate. The 
autoscaler does that by looking at the {{pendingRecords}} Flink source metric. 
Even if that metric is not available, the source can still be sized according 
to the busyTimeMsPerSecond metric, but there will be no backlog information 
available. For Kafka, the autoscaler also determines the number of partitions 
to avoid scaling higher than the maximum number of partitions.

In order to support a wider range of use cases, we should investigate an 
integration with the Iceberg source. As far as I know, it does not expose the 
pendingRecords metric, nor does the autoscaler know about other constraints, 
e.g. the maximum number of open files.


> Add autoscaler integration for Iceberg source
> ---------------------------------------------
>
>                 Key: FLINK-31400
>                 URL: https://issues.apache.org/jira/browse/FLINK-31400
>             Project: Flink
>          Issue Type: New Feature
>          Components: Autoscaler, Kubernetes Operator
>            Reporter: Maximilian Michels
>            Priority: Major
>             Fix For: kubernetes-operator-1.5.0
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
