[ 
https://issues.apache.org/jira/browse/BEAM-8352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous updated BEAM-8352:
----------------------------
    Status: Triage Needed  (was: Resolved)

> Reading records in background may lead to OOM errors
> ----------------------------------------------------
>
>                 Key: BEAM-8352
>                 URL: https://issues.apache.org/jira/browse/BEAM-8352
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-kinesis
>    Affects Versions: 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0, 2.8.0, 2.9.0, 
> 2.10.0, 2.11.0, 2.12.0, 2.13.0, 2.14.0, 2.15.0
>            Reporter: Mateusz Juraszek
>            Assignee: Alexey Romanenko
>            Priority: P2
>             Fix For: 2.18.0
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> We have faced a problem with OOM errors in our dataflow job containing 
> Kinesis sources. After investigation, it occurred that the issue was caused 
> by too many records being consumed by Kinesis sources that pipeline couldn't 
> handle in time.
> Looking into the Kinesis connector's code, the internal queue (recordsQueue) 
> that records are being put in the background is setup for each 
> ShardReadersPool (created for each source being Kinesis stream). The size of 
> the queue is set to `queueCapacityPerShard * number of shards`. The bigger 
> number of shards, the bigger queue size. There is no ability to limit the 
> maximum capacity of the queue (queueCapacityPerShard is also not configurable 
> and it's set to DEFAULT_CAPACITY_PER_SHARD=10_000). Additionally, there is no 
> differentiation on records size, so the size of data placed to the queue 
> might increase to the point where OOM will be thrown.
> It would be great to have ability to somehow limit the number of records that 
> are being read in the background to some sensible value. At the beginning, 
> simple solution would be to allow configuring max queue size for source at 
> the creation of KinesisIO.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to