[ https://issues.apache.org/jira/browse/STORM-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933387#comment-16933387 ]
Christopher Johnson commented on STORM-3510:
--------------------------------------------

My code for solution 2 is in this pull request: [https://github.com/apache/storm/pull/3131]

> WorkerState.transferLocalBatch backpressure resend logic fix
> ------------------------------------------------------------
>
>                 Key: STORM-3510
>                 URL: https://issues.apache.org/jira/browse/STORM-3510
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-client
>    Affects Versions: 2.2.0
>            Reporter: Christopher Johnson
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> WorkerState.transferLocalBatch uses an int lastOverflowCount to track the
> size of the overflow queue, and periodically resends the backpressure status
> to remote workers if the queue continues to grow.
>
> The current implementation has two problems:
> * The single variable tracks the receive queue of every executor in the
> worker, meaning it will be overwritten as tuples are sent to different
> executors.
> * The variable is locally scoped, and so is not carried over between
> mini-batches.
>
> This only comes into effect when the overflow queue grows beyond 10000,
> which shouldn't happen unless a backpressure signal isn't received by an
> upstream worker, but if it does happen then a backpressure signal is going
> to be sent for every mini-batch processed. I do not know if this is the
> intended behaviour, but the way the code is written seems to indicate that
> it isn't.
>
> I have thought of two redesigns to fix these problems and make the
> behaviour align with how one would interpret the code:
>
> # *Change the lastOverflowCount variable to a map of taskId to overflow
> count* - This will retain the behaviour of resending the backpressure
> update every mini-batch once over the threshold, if that behaviour is
> intended. However, it will increase garbage by creating a new map every
> time WorkerState.transferLocalBatch is called by the NettyWorker thread.
> # *Change the lastOverflowCount variable to a map of taskId to overflow
> count and move it to the BackPressureTracker class* - This will retain the
> counter between batches, and so only resend the backpressure status every
> 10000 received tuples per task.
>
> My preference is for the second option, as if the intended behaviour is to
> resend every mini-batch it should be rewritten so the intent is explicit
> from the code.
>
> It is also possible that doing it the second way could run into concurrency
> issues I didn't think of, but as far as I can tell the
> topology.worker.receiver.thread.count config option isn't used at all. If
> that's the case and there is only one NettyWorker thread per worker then it
> should be fine.
>
> I have implemented both methods and attempted to benchmark them with
> [https://github.com/yahoo/storm-perf-test] but as I am running all workers
> on one machine I couldn't get it to the point that the relevant code was
> ever called.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
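The per-task, cross-batch counter from option 2 could be sketched roughly as follows. This is a hypothetical illustration, not the code from the pull request: the class name OverflowTrackerSketch and method shouldResend are invented here, standing in for the state that would live on BackPressureTracker.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of option 2: the overflow count is tracked per taskId
// in a long-lived object (as it would be on BackPressureTracker), so the
// counter survives between mini-batches instead of being a local variable
// in WorkerState.transferLocalBatch.
class OverflowTrackerSketch {
    // Resend threshold from the issue description.
    static final int RESEND_THRESHOLD = 10000;

    // taskId -> overflow count recorded at the last backpressure resend.
    // A concurrent map keeps this safe even if more than one receiver
    // thread were ever in play.
    private final Map<Integer, Integer> lastOverflowCount = new ConcurrentHashMap<>();

    // Returns true if the backpressure status should be resent for this
    // task, i.e. the task's overflow queue has grown by at least
    // RESEND_THRESHOLD tuples since the last resend for that task.
    boolean shouldResend(int taskId, int currentOverflow) {
        int last = lastOverflowCount.getOrDefault(taskId, 0);
        if (currentOverflow - last >= RESEND_THRESHOLD) {
            lastOverflowCount.put(taskId, currentOverflow);
            return true;
        }
        return false;
    }
}
```

Because the map is keyed by taskId and outlives any one call, growth in one executor's queue no longer overwrites the count for another, which is the behaviour the original local int could not provide.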