[
https://issues.apache.org/jira/browse/BEAM-8352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Anonymous updated BEAM-8352:
----------------------------
Status: Triage Needed (was: Resolved)
> Reading records in background may lead to OOM errors
> ----------------------------------------------------
>
> Key: BEAM-8352
> URL: https://issues.apache.org/jira/browse/BEAM-8352
> Project: Beam
> Issue Type: Bug
> Components: io-java-kinesis
> Affects Versions: 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0, 2.8.0, 2.9.0,
> 2.10.0, 2.11.0, 2.12.0, 2.13.0, 2.14.0, 2.15.0
> Reporter: Mateusz Juraszek
> Assignee: Alexey Romanenko
> Priority: P2
> Fix For: 2.18.0
>
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> We have faced a problem with OOM errors in our Dataflow job containing
> Kinesis sources. After investigation, it turned out that the issue was
> caused by the Kinesis sources consuming more records than the pipeline
> could handle in time.
> Looking into the Kinesis connector's code, the internal queue (recordsQueue)
> into which records are put in the background is set up for each
> ShardReadersPool (one per source, i.e. per Kinesis stream). The size of the
> queue is set to `queueCapacityPerShard * number of shards`, so the more
> shards, the bigger the queue. There is no way to limit the maximum capacity
> of the queue (queueCapacityPerShard is not configurable either; it is fixed
> at DEFAULT_CAPACITY_PER_SHARD=10_000). Additionally, record sizes are not
> taken into account, so the amount of data held in the queue can grow to the
> point where an OOM is thrown.
> It would be great to have the ability to limit the number of records read
> in the background to some sensible value. As a first step, a simple
> solution would be to allow configuring a maximum queue size per source at
> the creation of KinesisIO.
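The capacity math described above can be sketched as follows. This is an illustrative example, not the actual Beam code: the class and method names are hypothetical, and only DEFAULT_CAPACITY_PER_SHARD matches a value quoted in the report. It shows how the total buffered-record count grows linearly with shard count today, and how an overall cap (as proposed) would bound it:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueSizing {
    // Value quoted in the report; hard-coded, not configurable today.
    static final int DEFAULT_CAPACITY_PER_SHARD = 10_000;

    // Current behavior: total capacity is capacityPerShard * shards,
    // so it grows without bound as the stream is resharded up.
    static int perShardCapacity(int numShards) {
        return DEFAULT_CAPACITY_PER_SHARD * numShards;
    }

    // Proposed mitigation (hypothetical): cap the total queue size
    // regardless of how many shards the stream has.
    static int cappedCapacity(int numShards, int maxTotalCapacity) {
        return Math.min(perShardCapacity(numShards), maxTotalCapacity);
    }

    public static void main(String[] args) {
        int shards = 512;
        // With 512 shards, up to 5,120,000 records may be buffered.
        System.out.println(perShardCapacity(shards));
        // A configurable cap would bound that at, e.g., 100,000.
        System.out.println(cappedCapacity(shards, 100_000));

        // A bounded queue sized from the capped value: producers block
        // (or back off) instead of accumulating records until OOM.
        BlockingQueue<byte[]> recordsQueue =
            new ArrayBlockingQueue<>(cappedCapacity(shards, 100_000));
        System.out.println(recordsQueue.remainingCapacity());
    }
}
```

Note that capping the record *count* still does not bound memory exactly, since record sizes vary; a byte-based limit would address the size-differentiation point raised above.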
--
This message was sent by Atlassian Jira
(v8.20.10#820010)