I have provided a patch that will hopefully allow your production to run
more robustly and also allow collection of debug info that may help explain
the nature of the problem.

On Mon, Jul 19, 2021 at 12:02 AM Wiggelinkhuizen J (Jaap) <
jaap.wiggelinkhui...@intraffic.nl> wrote:

> Dear Cliff,
>
>
>
> Thank you for your response. I have created PROTON-2411
> <https://issues.apache.org/jira/browse/PROTON-2411> with the information
> from my mail plus some additional information. Unfortunately, we have not
> been able to reproduce the issue at our test facilities so far, and without
> a clue about the cause we do not know how to trigger it either.
>
>
>
> Indeed, we build our own Proton libraries from source, so your offer to
> create a patch that helps reduce the impact and gathers more information
> is very much appreciated.
>
>
>
> P.S.: I am on holiday from tomorrow. Could you reply to everyone in CC
> when responding to this mail?
>
>
>
> Thanks again!
>
>
>
> With kind regards,
>
>
>
> *Jaap Wiggelinkhuizen*
>
>
>
> *From:* Cliff Jansen <cliffjan...@gmail.com>
> *Sent:* Friday, 16 July 2021 18:23
> *To:* Wiggelinkhuizen J (Jaap) <jaap.wiggelinkhui...@intraffic.nl>;
> users@qpid.apache.org
> *Subject:* Re: local-idle-timeout and idle timeout sequencing errors on
> several instances
>
>
>
> This is not a known bug. Despite your helpful and detailed account, I am
> unable to see how a second “earlier” deadline could occur in the life of
> an AMQP connection, even allowing for an off-by-one.
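
For readers following the thread, the invariant under discussion can be pictured with a toy model. This is only a sketch, assuming (from the description of the comments in epoll_timer.c quoted further down) that a connection's deadline may move backwards once and that a second backward move is treated as a sequencing error. The class `ConnectionDeadline` and its members are hypothetical and are not Proton's code.

```cpp
#include <cstdint>

// Hypothetical toy model of a per-connection deadline; NOT Proton's epoll_timer.c.
// Assumption: one backward move is tolerated, a second one is a sequencing error.
class ConnectionDeadline {
public:
    explicit ConnectionDeadline(std::uint64_t initial_ms) : deadline_ms_(initial_ms) {}

    // Try to set a new deadline (milliseconds on some monotonic scale).
    // Returns false, leaving the deadline unchanged, if this would be the
    // second backward move; returns true otherwise.
    bool set(std::uint64_t new_ms) {
        if (new_ms < deadline_ms_) {
            if (moved_back_) {
                return false;  // the "idle timeout sequencing error" case
            }
            moved_back_ = true;
        }
        deadline_ms_ = new_ms;
        return true;
    }

    std::uint64_t deadline_ms() const { return deadline_ms_; }
    bool moved_back() const { return moved_back_; }

private:
    std::uint64_t deadline_ms_;
    bool moved_back_ = false;
};
```

In this model, a patched build could log the old and new deadlines when `set` returns false instead of aborting, which is the kind of information that might help narrow down the failure.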
>
>
>
> Please raise a JIRA including any additional information you can think of.
>
>
>
> Obviously a reproducer would be ideal, but may be hard to provide.
>
>
>
> Are you building your own Proton libraries from source? If so I could try
> to put together a patch that would be more resilient in the abort case and
> gather some additional bread crumbs to help analyze the circumstances of
> the failure.
>
>
>
> Cliff
>
> On Thu, Jul 15, 2021 at 3:31 AM Wiggelinkhuizen J (Jaap) <
> jaap.wiggelinkhui...@intraffic.nl> wrote:
>
> Dear Qpid users,
>
>
>
> In our mission-critical software for the Dutch government we use Qpid
> Proton 0.34.0 in our C++ client software together with Qpid Dispatch
> Router 1.16.0. We updated to these versions not long ago; before that we
> used Proton 0.25.0 and Dispatch 1.3.0. Our application runs on several VMs
> with a router on each VM. All clients connect to the local router only, and
> the routers connect to each other in a hub-and-spoke pattern. In both the
> client configuration and the router configuration we have configured an
> idle timeout of 30 seconds.
>
>
>
> Two weeks ago we were confronted with an incident in production where many
> of our client processes reported problems regarding the idle timeouts.
> These client processes had already been running stably for more than 3
> weeks. The problem appeared in two flavors:
>
>    1. Transport error “error: amqp:resource-limit-exceeded:
>    local-idle-timeout expired”
>    2. epoll proactor failure in epoll_timer.c:263: “idle timeout
>    sequencing error”
>
> On each VM at least 3 processes showed one of these problems within a time
> window of less than a minute. So far we have not found any cause in the
> underlying hardware, hypervisor, network or operating system.
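
As background for the first flavor: a local idle timeout generally expires when no frame has arrived from the peer within the configured interval. The check can be sketched roughly as below. This is a simplified illustration, not Proton's transport logic; the function `local_idle_expired` and its parameters are hypothetical, and the 30-second value in the usage note is the one mentioned in this mail.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;  // monotonic, unaffected by wall-clock changes

// Hypothetical helper: true when no inbound frame has been seen for at least
// idle_timeout. A non-positive timeout is treated as "disabled" in this sketch.
bool local_idle_expired(Clock::time_point last_frame_in,
                        Clock::time_point now,
                        std::chrono::milliseconds idle_timeout) {
    if (idle_timeout.count() <= 0) {
        return false;  // no idle timeout configured
    }
    return now - last_frame_in >= idle_timeout;
}
```

With an idle timeout of `std::chrono::seconds(30)`, the check stays false as long as the peer's frames or heartbeats keep arriving less than 30 seconds apart.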
>
>
>
> Although we don’t know the root cause of the problems, we can solve the
> first situation by using proper reconnect settings. However, the second
> situation is odder because it results in an abort within Proton itself.
> The comments in epoll_timer.c explain that this error occurs when a
> connection timer is moved backwards a second time. We don’t understand how
> this can suddenly happen.
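
To make "reconnect settings" concrete: the Proton C++ binding exposes `proton::reconnect_options` (with `delay`, `delay_multiplier`, `max_delay` and `max_attempts`), attached through `proton::connection_options::reconnect`. The standalone sketch below only illustrates the exponential backoff idea behind such settings. The `BackoffSettings` struct, the `backoff_schedule` function and the default values are hypothetical, and whether Proton applies its parameters in exactly this way is an assumption.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Hypothetical settings, loosely named after proton::reconnect_options.
// The default values here are arbitrary illustration values.
struct BackoffSettings {
    std::chrono::milliseconds delay{10};         // wait before the first retry
    double multiplier = 2.0;                     // growth factor per retry
    std::chrono::milliseconds max_delay{10000};  // upper bound for any wait
    int max_attempts = 5;                        // number of retries to plan
};

// Compute the wait before each reconnect attempt: exponential growth,
// capped at max_delay.
std::vector<std::chrono::milliseconds> backoff_schedule(const BackoffSettings& s) {
    std::vector<std::chrono::milliseconds> waits;
    double next = static_cast<double>(s.delay.count());
    const double cap = static_cast<double>(s.max_delay.count());
    for (int i = 0; i < s.max_attempts; ++i) {
        waits.emplace_back(
            static_cast<std::chrono::milliseconds::rep>(std::min(next, cap)));
        next = std::min(next * s.multiplier, cap);
    }
    return waits;
}
```

A schedule like this lets a client recover from a "local-idle-timeout expired" transport error without hammering the local router with immediate retries.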
>
>
>
> Has anyone experienced similar problems using recent Proton versions (the
> epoll_timer.c module was introduced in version 0.33.0)? And, even more
> importantly, is there a solution or workaround?
>
>
>
> Looking forward to any reaction. Thanks in advance!
>
>
>
> With kind regards,
>
>
>
> *Jaap Wiggelinkhuizen*
>
> Software architect & system integrator
>
>
> *E*    jaap.wiggelinkhui...@intraffic.nl
>
> *W*   intraffic.nl <https://www.intraffic.nl/>
>
>
>
> *Visiting address: Iepenhoeve 11, 3438 MR Nieuwegein*
