[
https://issues.apache.org/jira/browse/SAMZA-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Peter Georgantas updated SAMZA-2044:
------------------------------------
Description:
The contract of the handleEndOfStream method requires that a collection of
results be returned. For WindowOperatorImpl, which is backed by a RocksDB
store, this effectively pulls the entire store into memory. In many cases this
will cause out-of-memory exceptions (otherwise, why not keep the store in
memory in the first place?).
Since this is a protected API, I propose a relatively simple change that would
allow data to be consumed downstream as it is read out of the store:
{{protected void handleEndOfStream(Consumer<WindowPane<K, Object>> consumer,
MessageCollector collector, TaskCoordinator coordinator)}}
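To illustrate the difference between the two contracts, here is a minimal,
self-contained sketch. It does not use any Samza or RocksDB classes: the
class EndOfStreamSketch, the storeIterator helper, and the Integer entries are
hypothetical stand-ins for a store-backed iterator and its window panes. The
point is only that the collection-returning shape has to materialize every
entry before the caller sees one, while the consumer-based shape hands each
entry downstream as soon as it is read.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

class EndOfStreamSketch {

    // Hypothetical stand-in for a store-backed iterator (for example a range
    // scan over a key-value store): entries are produced lazily, one at a time.
    static Iterator<Integer> storeIterator(final int size) {
        return new Iterator<Integer>() {
            private int position = 0;

            @Override
            public boolean hasNext() {
                return position < size;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return position++;
            }
        };
    }

    // Collection-returning shape: every entry is held in memory at once
    // before the caller can process any of them.
    static Collection<Integer> collectAll(Iterator<Integer> store) {
        List<Integer> results = new ArrayList<>();
        while (store.hasNext()) {
            results.add(store.next());
        }
        return results;
    }

    // Consumer-based shape: each entry is passed downstream as soon as it is
    // read, so this method itself never buffers more than one entry.
    static void forEachEntry(Iterator<Integer> store, Consumer<Integer> consumer) {
        while (store.hasNext()) {
            consumer.accept(store.next());
        }
    }

    public static void main(String[] args) {
        // Aggregate a large number of entries without building a list.
        long[] sum = {0};
        forEachEntry(storeIterator(1_000_000), v -> sum[0] += v);
        System.out.println("sum of streamed entries: " + sum[0]);
    }
}
```

With the consumer-based shape, memory use is governed by what the downstream
consumer chooses to retain rather than by the size of the store.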
> EOSMessage causes out of memory Exceptions related to WindowOperatorImpl
> ------------------------------------------------------------------------
>
> Key: SAMZA-2044
> URL: https://issues.apache.org/jira/browse/SAMZA-2044
> Project: Samza
> Issue Type: Bug
> Components: container
> Affects Versions: 0.14.0, 1.0
> Reporter: Peter Georgantas
> Priority: Major
>