[ 
https://issues.apache.org/jira/browse/BEAM-11629?focusedWorklogId=564989&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-564989
 ]

ASF GitHub Bot logged work on BEAM-11629:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Mar/21 22:27
            Start Date: 11/Mar/21 22:27
    Worklog Time Spent: 10m 
      Work Description: aaltay commented on pull request #13739:
URL: https://github.com/apache/beam/pull/13739#issuecomment-797091519


   > At this point I'm not sure what to do with it - I hoped for some
   > suggestions. To summarize my points:
   >
   > 1. We need some way to prevent attaching windowing when the user doesn't
   > ever need it - I'm still not sure what's the right way to determine this,
   > hence I'm using the setting now, so it shouldn't break anyone
   > 2. Windowing information is excessively large - that's not a part of this
   > PR. I'm also not familiar with it enough to come up with a schema for
   > compact storage.
   
   Got it. 
   
   I will defer to @rohdesamuel or other reviewers on this thread for your
   questions.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 564989)
    Time Spent: 4h  (was: 3h 50m)

> Optimize the cache storage for InteractiveRunner
> ------------------------------------------------
>
>                 Key: BEAM-11629
>                 URL: https://issues.apache.org/jira/browse/BEAM-11629
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-py-interactive
>            Reporter: Dmytro Kozhevin
>            Assignee: Dmytro Kozhevin
>            Priority: P2
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> Currently InteractiveRunner wraps every record of the cached PCollection into 
> WindowedValue. There are two problems with this:
> 1) The windowing information is unnecessary for the batch-mode runs 
> (everything is in the same global window).
> 2) Since the cache is stored as text, we pickle the WindowedValue, which adds 
> ~500 bytes of data to every record (e.g. a cache of just 1000000 integers 
> would take ~500MB instead of ~4MB).
> These issues significantly slow down the interactive runs for data with lots 
> of small rows.
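
The per-record overhead described above can be illustrated with a rough
sketch. It does not use Beam at all: wrap_with_window_metadata is a
hypothetical stand-in (a plain dict) for a windowed wrapper, not Beam's
WindowedValue, so the byte counts it prints will differ from the ~500 bytes
quoted in the issue. It only shows how a fixed metadata cost per record
scales with the number of small records.

```python
import pickle


def wrap_with_window_metadata(value):
    # Hypothetical stand-in for a windowed wrapper. This is NOT Beam's
    # WindowedValue; the field names here are assumptions for illustration.
    return {
        "value": value,
        "timestamp_micros": 0,
        "windows": ["GlobalWindow"],
        "pane_info": {"is_first": True, "is_last": True, "index": 0},
    }


def pickled_size(obj):
    # Number of bytes produced by pickling a single object.
    return len(pickle.dumps(obj))


raw_size = pickled_size(42)
wrapped_size = pickled_size(wrap_with_window_metadata(42))
overhead_per_record = wrapped_size - raw_size

# Project the fixed per-record overhead onto a cache of many small records.
num_records = 1_000_000
projected_overhead_mb = overhead_per_record * num_records / 1e6

print(f"raw record:      {raw_size} bytes")
print(f"wrapped record:  {wrapped_size} bytes")
print(f"overhead/record: {overhead_per_record} bytes")
print(f"projected overhead for {num_records} records: "
      f"{projected_overhead_mb:.1f} MB")
```

Because the metadata is identical for every record in a batch run (a single
global window), the overhead grows linearly with the record count while
carrying no extra information, which is the motivation for skipping it.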



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
