[ https://issues.apache.org/jira/browse/STORM-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318317#comment-14318317 ]

ASF GitHub Bot commented on STORM-329:
--------------------------------------

Github user clockfly commented on the pull request:

    https://github.com/apache/storm/pull/429#issuecomment-74084548
  
    +1
    
    On Thu, Feb 12, 2015 at 6:44 PM, Michael G. Noll <[email protected]>
    wrote:
    
    > Thanks for your feedback, Nathan.
    >
    > As far as I understand, this patch does not enable backpressure. But
    > because there is no backpressure (yet) that we can rely on, this patch
    > will at least improve the situation during the startup phase of a
    > topology by ensuring that a) an unacked topology does not lose messages
    > during startup, and b) we do not unnecessarily replay messages in the
    > case of acked topologies during their startup. This is achieved by
    > checking that all worker connections are ready before the topology
    > starts processing data.
    >
    > So backpressure is still an open feature. Backpressure was, IIRC,
    > mentioned in the initial PR because there was a deficiency (dating back
    > to a ZMQ-related TODO) that caused problems related to this PR and the
    > Storm tickets (327, 404, and one more). However, this patch does make
    > the best of the current situation even in the absence of backpressure.
    > But first and foremost, this patch fixes a (critical) cascading failure
    > that can bring Storm clusters to a halt.
    >
    > Please correct me if I'm mistaken in my summary.
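
The startup gate described in the comment above ("checking that all worker
connections are ready before the topology starts processing data") can be
sketched roughly as follows. This is a minimal illustration under assumed
names, not Storm's actual code: the Connection interface, its isReady()
probe, and the polling loop are all assumptions.

{code:java}
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical startup gate: block until every worker connection reports
// ready, so the topology does not start sending data into dead links.
final class ConnectionReadinessGate {

    /** Minimal connection abstraction assumed by this sketch. */
    interface Connection {
        boolean isReady();
    }

    /** Blocks until all connections are ready, polling periodically. */
    static void awaitAllReady(List<Connection> connections, long pollMillis)
            throws InterruptedException {
        while (true) {
            boolean allReady = true;
            for (Connection c : connections) {
                if (!c.isReady()) {   // assumed readiness probe
                    allReady = false;
                    break;
                }
            }
            if (allReady) {
                return;               // safe to begin processing
            }
            TimeUnit.MILLISECONDS.sleep(pollMillis);
        }
    }
}
{code}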



> Add Option to Config Message handling strategy when connection timeout
> ----------------------------------------------------------------------
>
>                 Key: STORM-329
>                 URL: https://issues.apache.org/jira/browse/STORM-329
>             Project: Apache Storm
>          Issue Type: Improvement
>    Affects Versions: 0.9.2-incubating
>            Reporter: Sean Zhong
>            Priority: Minor
>              Labels: Netty
>         Attachments: storm-329.patch, worker-kill-recover3.jpg
>
>
> This is to address a [concern brought up|https://github.com/apache/incubator-storm/pull/103#issuecomment-43632986] during the work on STORM-297:
> {quote}
> [~revans2] wrote: Your logic makes sense to me on why these calls are 
> blocking. My biggest concern around the blocking is the case of a worker 
> crashing. If a single worker crashes, this can block the entire topology 
> from executing until that worker comes back up. In some cases I can see 
> that being something you would want. In other cases I can see speed being 
> the primary concern, and some users would like to get partial data fast 
> rather than accurate data later.
> Could we make it configurable in a follow-up JIRA, where we can have a max 
> limit to the buffering that is allowed before we block or throw data away 
> (which is what ZeroMQ does)?
> {quote}
> If a worker crashes suddenly, how should we handle the messages that were 
> supposed to be delivered to that worker?
> 1. Should we buffer all messages indefinitely?
> 2. Should we block message sending until the connection is restored?
> 3. Should we configure a buffer limit, buffer messages up to that limit, 
> and then block once the limit is reached? (A sketch of this strategy 
> follows below.)
> 4. Should we neither block nor buffer too much, but instead drop the 
> messages and rely on Storm's built-in failover mechanism?
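
To make option 3 concrete, here is a minimal sketch of a bounded
buffer-then-block outbox built on java.util.concurrent.ArrayBlockingQueue,
whose put() blocks once the queue is full. The BoundedOutbox class, the raw
byte[] message representation, and the Connection interface are illustrative
assumptions, not Storm's actual transport API.

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical outbox for option 3: buffer outgoing messages up to a
// configurable limit, then block the sending thread until space frees up.
final class BoundedOutbox {

    /** Minimal connection abstraction assumed by this sketch. */
    interface Connection {
        void write(byte[] payload);
    }

    private final BlockingQueue<byte[]> buffer;

    BoundedOutbox(int bufferLimit) {
        this.buffer = new ArrayBlockingQueue<>(bufferLimit);
    }

    /** Buffers the message; blocks once bufferLimit messages are queued. */
    void send(byte[] message) throws InterruptedException {
        buffer.put(message); // blocks when the buffer is full
    }

    /** Drains buffered messages once the connection is (re)established. */
    void drainTo(Connection connection) throws InterruptedException {
        while (!buffer.isEmpty()) {
            connection.write(buffer.take());
        }
    }
}
{code}

Dropping messages instead (option 4) would amount to replacing the blocking
put() with a non-blocking offer() and relying on Storm's acking/replay
mechanism to recover the discarded tuples.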



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
