[ 
https://issues.apache.org/jira/browse/BEAM-7949?focusedWorklogId=355980&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-355980
 ]

ASF GitHub Bot logged work on BEAM-7949:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Dec/19 07:38
            Start Date: 09/Dec/19 07:38
    Worklog Time Spent: 10m 
      Work Description: sunjincheng121 commented on issue #10246: [BEAM-7949] 
Add time-based cache threshold support in the data service of the Python SDK 
harness
URL: https://github.com/apache/beam/pull/10246#issuecomment-563105146
 
 
   Hi @mxm, I definitely agree with you that the bundle timeout could be 
lowered to reduce latency. However, I'm not sure it's the best approach for all 
use cases / users because, as I understand it, finishing a bundle has some 
overhead: all the state cached in the SDK harness is flushed back to the runner 
when a bundle finishes. The lower the bundle timeout is set, the more 
unnecessary state traffic is introduced between the runner and the SDK harness.
   
   The solution proposed in this PR (periodic flush) can avoid such problems 
while still lowering the latency.
   
   A time-based cache threshold was already added to the Java data service 
in #9949. This PR tries to add similar functionality to the data service of 
the Python SDK harness.
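   
   To illustrate the idea, here is a minimal sketch of a buffer that flushes 
on either a size threshold or a time threshold. It is not the code in this PR 
or the Beam API: the class name `TimeBasedBuffer`, its parameters, and the 
`flush_callback` hook are all hypothetical and chosen for illustration only.

```python
import threading


class TimeBasedBuffer:
    """Hypothetical buffer: flushes when full OR when a timer fires."""

    def __init__(self, flush_callback, size_threshold=1 << 20,
                 time_threshold_ms=0):
        # flush_callback is an assumed hook that sends bytes to the runner.
        self._flush_callback = flush_callback
        self._size_threshold = size_threshold
        self._interval = time_threshold_ms / 1000.0
        self._lock = threading.Lock()
        self._chunks = []
        self._size = 0
        self._closed = False
        self._timer = None
        # A non-positive time threshold disables the periodic flush.
        if self._interval > 0:
            self._schedule()

    def _schedule(self):
        self._timer = threading.Timer(self._interval, self._on_timer)
        self._timer.daemon = True
        self._timer.start()

    def _on_timer(self):
        # Periodic flush: emit buffered data even if the size threshold
        # has not been reached, then re-arm the timer.
        with self._lock:
            if self._closed:
                return
            self._flush_locked()
            self._schedule()

    def _flush_locked(self):
        # Caller must hold self._lock. Empty buffers are not flushed.
        if self._chunks:
            data = b"".join(self._chunks)
            self._chunks = []
            self._size = 0
            self._flush_callback(data)

    def write(self, data):
        with self._lock:
            self._chunks.append(data)
            self._size += len(data)
            # Size-based flush, as in the existing behaviour.
            if self._size >= self._size_threshold:
                self._flush_locked()

    def close(self):
        with self._lock:
            self._closed = True
            if self._timer is not None:
                self._timer.cancel()
            self._flush_locked()
```

   With this shape, the bundle itself stays open, so cached state does not 
have to be flushed back to the runner; only the buffered output data is sent 
early when the timer fires.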
   
   What do you think?
   
   Best,
   Jincheng
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 355980)
    Time Spent: 50m  (was: 40m)

> Add time-based cache threshold support in the data service of the Python SDK 
> harness
> ------------------------------------------------------------------------------------
>
>                 Key: BEAM-7949
>                 URL: https://issues.apache.org/jira/browse/BEAM-7949
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-harness
>            Reporter: sunjincheng
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, only a size-based cache threshold is supported in the data service 
> of the Python SDK harness. It should also support a time-based cache 
> threshold. This is very important, especially for streaming jobs, which are 
> sensitive to delay. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
