[
https://issues.apache.org/jira/browse/BEAM-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318891#comment-17318891
]
Beam JIRA Bot commented on BEAM-11629:
--------------------------------------
This issue is assigned but has not received an update in 30 days so it has been
labeled "stale-assigned". If you are still working on the issue, please give an
update and remove the label. If you are no longer working on the issue, please
unassign so someone else may work on it. In 7 days the issue will be
automatically unassigned.
> Optimize the cache storage for InteractiveRunner
> ------------------------------------------------
>
> Key: BEAM-11629
> URL: https://issues.apache.org/jira/browse/BEAM-11629
> Project: Beam
> Issue Type: Improvement
> Components: runner-py-interactive
> Reporter: Dmytro Kozhevin
> Assignee: Dmytro Kozhevin
> Priority: P2
> Labels: stale-assigned
> Time Spent: 4h
> Remaining Estimate: 0h
>
> Currently, InteractiveRunner wraps every record of the cached PCollection in a
> WindowedValue. There are two problems with this:
> 1) The windowing information is unnecessary for batch-mode runs
> (everything is in the same global window).
> 2) Since the cache is stored as text, we pickle the WindowedValue, which adds
> ~500 bytes of data to every record (e.g. a cache of just 1,000,000 integers
> would take ~500MB instead of ~4MB).
> These issues significantly slow down interactive runs on data with lots
> of small rows.
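
The per-record overhead described in the issue can be illustrated with a rough sketch. It does not use Beam: `wrap_record` and its field names are hypothetical stand-ins for the windowing metadata carried by a WindowedValue, so the absolute sizes will differ from the ~500 bytes reported above. Only the direction of the effect is meant to carry over.

```python
import pickle


def wrap_record(value):
    """Hypothetical stand-in for a windowed record (NOT Beam's WindowedValue).

    The field names below are assumptions chosen for illustration only.
    """
    return {
        "value": value,
        "timestamp_micros": 0,
        "windows": ("GlobalWindow",),
        "pane_info": {"is_first": True, "is_last": True, "index": 0},
    }


def pickled_size(obj):
    """Number of bytes produced by pickling a single record."""
    return len(pickle.dumps(obj))


def cache_sizes(n):
    """Return (raw_bytes, wrapped_bytes) when n integers are pickled one by one."""
    raw = sum(pickled_size(i) for i in range(n))
    wrapped = sum(pickled_size(wrap_record(i)) for i in range(n))
    return raw, wrapped


raw_bytes, wrapped_bytes = cache_sizes(1000)
print("raw:", raw_bytes, "bytes; wrapped:", wrapped_bytes, "bytes")
print("overhead per record:", (wrapped_bytes - raw_bytes) / 1000, "bytes")
```

Because the metadata is identical for every record in a single global window, the extra bytes scale linearly with the number of rows, which is why caches with many small rows are hit hardest.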