Re: [Babel-users] [BUG] Route "deadlocks" under load due to non-atomic kernel route updates

2016-07-01 Thread Kirill Smelkov
On Fri, Jul 01, 2016 at 01:25:00AM +0200, Juliusz Chroboczek wrote: > >> atomic updates in babel, > > > Patches gladly accepted. Or I'll do it at some point, but don't hold your > > breath > > Anyone working on that? Or are you all holding your breath? On my side I'm holding my breath with

Re: [Babel-users] [BUG] Route "deadlocks" under load due to non-atomic kernel route updates

2016-06-30 Thread Juliusz Chroboczek
>> atomic updates in babel, > Patches gladly accepted. Or I'll do it at some point, but don't hold your > breath Anyone working on that? Or are you all holding your breath? -- Juliusz

Re: [Babel-users] [BUG] Route "deadlocks" under load due to non-atomic kernel route updates

2016-06-16 Thread Kirill Smelkov
On Thu, Jun 16, 2016 at 01:47:36PM -0700, Dave Taht wrote: > On Thu, Jun 16, 2016 at 1:40 PM, Kirill Smelkov wrote: > > On Thu, Jun 16, 2016 at 08:38:49AM -0700, Dave Taht wrote: > >> On Thu, Jun 16, 2016 at 4:17 AM, Kirill Smelkov wrote: > >> > On Wed, Jun 15,

Re: [Babel-users] [BUG] Route "deadlocks" under load due to non-atomic kernel route updates

2016-06-16 Thread Dave Taht
On Thu, Jun 16, 2016 at 1:40 PM, Kirill Smelkov wrote: > On Thu, Jun 16, 2016 at 08:38:49AM -0700, Dave Taht wrote: >> On Thu, Jun 16, 2016 at 4:17 AM, Kirill Smelkov wrote: >> > On Wed, Jun 15, 2016 at 12:56:34PM +0200, Juliusz Chroboczek wrote: >> >> >> If I

Re: [Babel-users] [BUG] Route "deadlocks" under load due to non-atomic kernel route updates

2016-06-16 Thread Kirill Smelkov
On Thu, Jun 16, 2016 at 08:38:49AM -0700, Dave Taht wrote: > On Thu, Jun 16, 2016 at 4:17 AM, Kirill Smelkov wrote: > > On Wed, Jun 15, 2016 at 12:56:34PM +0200, Juliusz Chroboczek wrote: > >> >> If I read you correctly, this looks like a kernel bug: incorrect > >> >>

Re: [Babel-users] [BUG] Route "deadlocks" under load due to non-atomic kernel route updates

2016-06-16 Thread Dave Taht
On Thu, Jun 16, 2016 at 4:17 AM, Kirill Smelkov wrote: > On Wed, Jun 15, 2016 at 12:56:34PM +0200, Juliusz Chroboczek wrote: >> >> If I read you correctly, this looks like a kernel bug: incorrect >> >> invalidation of the route cache. >> >> [...] >> >> > What we have here is of

Re: [Babel-users] [BUG] Route "deadlocks" under load due to non-atomic kernel route updates

2016-06-16 Thread Kirill Smelkov
On Wed, Jun 15, 2016 at 12:56:34PM +0200, Juliusz Chroboczek wrote: > >> If I read you correctly, this looks like a kernel bug: incorrect > >> invalidation of the route cache. > > [...] > > > What we have here is of another kind - it is inherent race condition > > inside kernel > > Perhaps I'm

Re: [Babel-users] [BUG] Route "deadlocks" under load due to non-atomic kernel route updates

2016-06-16 Thread Juliusz Chroboczek
> I have been running this script on four different machines for hours > now without reproducing your bug on the 4.4 or later kernels. It does > trigger on a 3.14 kernel. (it helps to do a killall fping6 before > exiting!) > It does not seem to be happening on 4.4 or later. Excellent news.

Re: [Babel-users] [BUG] Route "deadlocks" under load due to non-atomic kernel route updates

2016-06-15 Thread Dave Taht
> https://lab.nexedi.com/kirr/iproute2/blob/bd480e66/t/rtcache-torture > (also attached to this email) > > which reproduces the problem in several minutes just on one computer and > retested it locally: I can reliably reproduce the issue on pristine > Debian 3.16.7-ckt25-2 (on both Atom

Re: [Babel-users] [BUG] Route "deadlocks" under load due to non-atomic kernel route updates

2016-06-15 Thread Juliusz Chroboczek
>> If I read you correctly, this looks like a kernel bug: incorrect >> invalidation of the route cache. [...] > What we have here is of another kind - it is inherent race condition > inside kernel Perhaps I'm confused, but it still looks like a kernel bug to me. Perhaps it would make sense to

Re: [Babel-users] [BUG] Route "deadlocks" under load due to non-atomic kernel route updates

2016-06-12 Thread Kirill Smelkov
( +iv, Nicolas's address corrected ) Dear Juliusz, Dave, thanks for reply. First of all I'd like to say I'm new to routing & friends, but I'll try to provide feedback: On Fri, Jun 10, 2016 at 08:47:34PM +0200, Juliusz Chroboczek wrote: > Dear Kirill, > > Thank you very much for the detailed

Re: [Babel-users] [BUG] Route "deadlocks" under load due to non-atomic kernel route updates

2016-06-11 Thread Dave Taht
On Fri, Jun 10, 2016 at 11:47 AM, Juliusz Chroboczek wrote: > Dear Kirill, > > Thank you very much for the detailed analysis. > > If I read you correctly, this looks like a kernel bug: incorrect > invalidation of the route cache. While we have seen some similar

Re: [Babel-users] [BUG] Route "deadlocks" under load due to non-atomic kernel route updates

2016-06-10 Thread Juliusz Chroboczek
Dear Kirill, Thank you very much for the detailed analysis. If I read you correctly, this looks like a kernel bug: incorrect invalidation of the route cache. While we have seen some similar bugs in earlier kernel versions, they were not triggered by something that simple -- you needed to do

[Babel-users] [BUG] Route "deadlocks" under load due to non-atomic kernel route updates

2016-06-10 Thread Kirill Smelkov
Hello Babel world. First of all, let me please introduce myself. My name is Kirill. I'm one of the guys behind lab.nexedi.com team. As Babel community probably already knows, Nexedi is doing overlay networks. The site structure is backend -- frontends; frontends are located around the world.

[Babel-users] [BUG] Route "deadlocks" under load due to non-atomic kernel route updates

2016-06-10 Thread Kirill Smelkov
( resending after subscribing to mailing list ) Hello Babel world. First of all, let me please introduce myself. My name is Kirill. I'm one of the guys behind lab.nexedi.com team. As Babel community probably already knows, Nexedi is doing overlay networks. The site structure is backend --