[
https://issues.apache.org/jira/browse/BEAM-7442?focusedWorklogId=250317&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-250317
]
ASF GitHub Bot logged work on BEAM-7442:
----------------------------------------
Author: ASF GitHub Bot
Created on: 29/May/19 17:36
Start Date: 29/May/19 17:36
Worklog Time Spent: 10m
Work Description: mxm commented on pull request #8715:
[BEAM-7442][BEAM-5650] Read sequentially from bounded sources in
UnboundedSourceWrapper
URL: https://github.com/apache/beam/pull/8715
Bounded sources are converted to unbounded sources using the
BoundedToUnboundedSourceAdapter. This applies the streaming semantics for
unbounded sources, which is to read round-robin from all sources after
splitting. This doesn't work particularly well with a large number of files
and
leads to a large number of open file handles / connections.
Instead, we can simple read the bounded sources sequentially.
Post-Commit Tests Status (on master branch)
------------------------------------------------------------------------------------------------
Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
--- | --- | --- | --- | --- | --- | --- | ---
Go | [](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
| --- | --- | --- | --- | --- | ---
Java | [](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
Python | [](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/)
| --- | [](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
<br> [](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)
| --- | --- | ---
Pre-Commit Tests Status (on master branch)
------------------------------------------------------------------------------------------------
--- |Java | Python | Go | Website
--- | --- | --- | --- | ---
Non-portable | [](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/)
Portable | --- | [](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/)
| --- | ---
See
[.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md)
for trigger phrase, status and link of all Jenkins jobs.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 250317)
Time Spent: 10m
Remaining Estimate: 0h
> Bounded Reads for Flink Runner fails with OOM
> ---------------------------------------------
>
> Key: BEAM-7442
> URL: https://issues.apache.org/jira/browse/BEAM-7442
> Project: Beam
> Issue Type: Bug
> Components: runner-flink
> Reporter: Akshay Iyangar
> Assignee: Maximilian Michels
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When Flink runner is reading from a bounded source and if the total number of
> files are huge and the count is more. FlinkRunner throws an OOM error. This
> is happening because the current implementation doesn't read them
> sequentially but simultaneously thus causing all of the files to be in memory
> which quickly breaks the cluster.
> Solution : To wrap `UnboundedReadFromBoundedSource` class by a wrapper to see
> that when the stream is a bounded source we make it read it sequentially
> using a queue.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
