Re: FreeBSD10.3-RELEASE. Kernel panic.

Cassiano Peixoto Thu, 20 Oct 2016 12:06:48 -0700

Hi Kubilay,

Yes, I think all of these issues are related to this one. I have already
attached a lot of information about these crashes there, but it looks like
nobody cares about it.


This PR is assigned to Ermal, but he does not seem to be working on it: his
last change to the PR was in January.

If you or someone else is interested in helping to fix it, please take this
PR. I can provide any information that you need.

Thanks.

On Thu, Oct 20, 2016 at 2:22 AM, Kubilay Kocak <[email protected]> wrote:

> On 19/10/2016 3:23 AM, Cassiano Peixoto wrote:
> > Hi guys,
> >
> > I have an update on this issue. After my last email I had 3 crashes.
> > Two of them produced the same output in the kernel debugger:
> >
> > (kgdb) list *0xffffffff8228c918
> > 0xffffffff8228c918 is in trim_map_seg_compare
> > (/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/trim_map.c:108).
> > 103    trim_map_seg_compare(const void *x1, const void *x2)
> > 104    {
> > 105        const trim_seg_t *s1 = x1;
> > 106        const trim_seg_t *s2 = x2;
> > 107
> > 108        if (s1->ts_start < s2->ts_start) {
> > 109            if (s1->ts_end > s2->ts_start)
> > 110                return (0);
> > 111            return (-1);
> > 112        }
> > Current language:  auto; currently minimal
> > (kgdb) bt
> > #0  doadump (textdump=<value optimized out>) at pcpu.h:221
> > #1  0xffffffff80ad8e69 in kern_reboot (howto=260) at
> > /usr/src/sys/kern/kern_shutdown.c:366
> > #2  0xffffffff80ad941b in vpanic (fmt=<value optimized out>, ap=<value
> > optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
> > #3  0xffffffff80ad9253 in panic (fmt=0x0) at
> > /usr/src/sys/kern/kern_shutdown.c:690
> > #4  0xffffffff80fa0d31 in trap_fatal (frame=0xfffffe02374957f0,
> > eva=4294967343) at /usr/src/sys/amd64/amd64/trap.c:841
> > #5  0xffffffff80fa0f23 in trap_pfault (frame=0xfffffe02374957f0,
> > usermode=0) at /usr/src/sys/amd64/amd64/trap.c:691
> > #6  0xffffffff80fa04cc in trap (frame=0xfffffe02374957f0) at
> > /usr/src/sys/amd64/amd64/trap.c:442
> > #7  0xffffffff80f84141 in calltrap () at
> > /usr/src/sys/amd64/amd64/exception.S:236
> > #8  0xffffffff8228c918 in trim_map_seg_compare (x1=0xfffffe0237495920,
> > x2=0x100000007) at
> > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/trim_map.c:108
> > #9  0xffffffff821a98e1 in avl_find (tree=<value optimized out>,
> > value=<value optimized out>, where=0x0) at
> > /usr/src/sys/cddl/contrib/opensolaris/common/avl/avl.c:268
> > #10 0xffffffff8228ce9e in trim_map_write_start (zio=<value optimized out>)
> > at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/trim_map.c:363
> > #11 0xffffffff822592df in zio_vdev_io_start (zio=0xfffff802191ea000) at
> > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:2866
> > #12 0xffffffff82255b26 in zio_execute (zio=<value optimized out>) at
> > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1556
> > #13 0xffffffff822551e9 in zio_nowait (zio=0xfffff802191ea000) at
> > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1610
> > #14 0xffffffff8223c738 in vdev_queue_io_done (zio=<value optimized out>) at
> > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:884
> > #15 0xffffffff822594a9 in zio_vdev_io_done (zio=0xfffff8006daad000) at
> > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:2895
> > #16 0xffffffff82255b26 in zio_execute (zio=<value optimized out>) at
> > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1556
> > #17 0xffffffff80b363ca in taskqueue_run_locked (queue=<value optimized
> > out>) at /usr/src/sys/kern/subr_taskqueue.c:449
> > #18 0xffffffff80b372d8 in taskqueue_thread_loop (arg=<value optimized out>)
> > at /usr/src/sys/kern/subr_taskqueue.c:703
> > #19 0xffffffff80a90055 in fork_exit (callout=0xffffffff80b371f0
> > <taskqueue_thread_loop>, arg=0xfffff8001006b920, frame=0xfffffe0237495c00)
> > at /usr/src/sys/kern/kern_fork.c:1038
> > #20 0xffffffff80f8467e in fork_trampoline () at
> > /usr/src/sys/amd64/amd64/exception.S:611
> > #21 0x0000000000000000 in ?? ()
> > (kgdb) up 8
> > #8  0xffffffff8228c918 in trim_map_seg_compare (x1=0xfffffe0237495920,
> > x2=0x100000007) at
> > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/trim_map.c:108
> > 108        if (s1->ts_start < s2->ts_start) {
> >
> > But my last crash had a different message:
> >
> > (kgdb) list *0xffffffff80b3a89c
> > 0xffffffff80b3a89c is in turnstile_broadcast
> > (/usr/src/sys/kern/subr_turnstile.c:837).
> > 832
> > 833        /*
> > 834         * Transfer the blocked list to the pending list.
> > 835         */
> > 836        mtx_lock_spin(&td_contested_lock);
> > 837        TAILQ_CONCAT(&ts->ts_pending, &ts->ts_blocked[queue], td_lockq);
> > 838        mtx_unlock_spin(&td_contested_lock);
> > 839
> > 840        /*
> > 841         * Give a turnstile to each thread.  The last thread gets
> > Current language:  auto; currently minimal
> > (kgdb) bt
> > #0  doadump (textdump=<value optimized out>) at pcpu.h:221
> > #1  0xffffffff80ad8e69 in kern_reboot (howto=260) at
> > /usr/src/sys/kern/kern_shutdown.c:366
> > #2  0xffffffff80ad941b in vpanic (fmt=<value optimized out>, ap=<value
> > optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
> > #3  0xffffffff80ad9253 in panic (fmt=0x0) at
> > /usr/src/sys/kern/kern_shutdown.c:690
> > #4  0xffffffff80fa0d31 in trap_fatal (frame=0xfffffe0237384870, eva=48) at
> > /usr/src/sys/amd64/amd64/trap.c:841
> > #5  0xffffffff80fa0f23 in trap_pfault (frame=0xfffffe0237384870,
> > usermode=0) at /usr/src/sys/amd64/amd64/trap.c:691
> > #6  0xffffffff80fa04cc in trap (frame=0xfffffe0237384870) at
> > /usr/src/sys/amd64/amd64/trap.c:442
> > #7  0xffffffff80f84141 in calltrap () at
> > /usr/src/sys/amd64/amd64/exception.S:236
> > #8  0xffffffff80b3a89c in turnstile_broadcast (ts=0x0, queue=1) at
> > /usr/src/sys/kern/subr_turnstile.c:837
> > #9  0xffffffff80ad48cf in __rw_wunlock_hard (c=0xfffff8024f3c2960,
> > tid=<value optimized out>, file=<value optimized out>, line=<value
> > optimized out>)
> >     at /usr/src/sys/kern/kern_rwlock.c:1027
> > #10 0xffffffff80e1a75c in vm_map_delete (map=<value optimized out>,
> > start=<value optimized out>, end=<value optimized out>) at
> > /usr/src/sys/vm/vm_map.c:2960
> > #11 0xffffffff80e1828e in vmspace_exit (td=<value optimized out>) at
> > /usr/src/sys/vm/vm_map.c:3077
> > #12 0xffffffff80a88686 in exit1 (td=0xfffff80015533a00, rval=268849920,
> > signo=0) at /usr/src/sys/kern/kern_exit.c:398
> > #13 0xffffffff80a87e1d in sys_sys_exit (td=0x0, uap=<value optimized out>)
> > at /usr/src/sys/kern/kern_exit.c:178
> > #14 0xffffffff80fa168e in amd64_syscall (td=<value optimized out>,
> > traced=0) at subr_syscall.c:135
> > #15 0xffffffff80f8442b in Xfast_syscall () at
> > /usr/src/sys/amd64/amd64/exception.S:396
> > #16 0x0000000800b661aa in ?? ()
> > Previous frame inner to this frame (corrupt stack?)
> > (kgdb) up 8
> > #8  0xffffffff80b3a89c in turnstile_broadcast (ts=0x0, queue=1) at
> > /usr/src/sys/kern/subr_turnstile.c:837
> > 837        TAILQ_CONCAT(&ts->ts_pending, &ts->ts_blocked[queue], td_lockq);
> >
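One way to read the two traces above: in both, the fault address (eva)
reported by trap_fatal sits a small distance past the pointer that frame #8
dereferences. The Python sketch below only redoes that arithmetic with the
values printed by kgdb. The helper name fault_offset is made up for
illustration, and reading the result as a structure-member offset is an
assumption, not something the dumps state.

```python
# Values copied from the two kgdb backtraces quoted above.
CRASH_1 = {"eva": 4294967343, "pointer": 0x100000007}  # x2 in trim_map_seg_compare
CRASH_2 = {"eva": 48, "pointer": 0x0}                  # ts in turnstile_broadcast


def fault_offset(eva: int, pointer: int) -> int:
    """Distance between the faulting address and the pointer argument."""
    return eva - pointer


for name, crash in (("crash 1", CRASH_1), ("crash 2", CRASH_2)):
    offset = fault_offset(crash["eva"], crash["pointer"])
    print(f"{name}: eva={crash['eva']:#x} pointer={crash['pointer']:#x} "
          f"offset={offset:#x}")
```

The offsets come out as 0x28 and 0x30. Under that reading, the first panic
looks like the comparator being handed a value (0x100000007) that is not a
kernel address at all, and the second looks like a plain NULL dereference of
ts.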
> > As you can see, we are dealing with seemingly random crashes. I feel I'm
> > not moving forward here. It's not a hardware problem, because I have 3
> > different servers with the same issue.
> >
> > Donald, did you have a chance to try 11-RELEASE? Any different behavior?
> >
> > Does anyone have an idea that could help?
> >
> > Thanks.
> >
> >
> > On Thu, Oct 13, 2016 at 12:24 PM, Cassiano Peixoto <
> > [email protected]> wrote:
> >
> >> Hi guys,
> >>
> >> First of all, thanks for sharing your thoughts about this issue. I think
> >> it’s really important to find a solution for this issue together.
> >>
> >> I can see two related behaviors, but for me the root cause is the same:
> >>
> >> 1- mpd5 process stuck with umtxn flag
> >> 2- system crash
> >>
> >> I’ve tested recently on FreeBSD 10.3 and FreeBSD-11-RC3. I’ve tried all
> >> suggested tunings with no success.
> >>
> >> My environment is:
> >> - About 430 clients connected (but I can add more)
> >> - Using ZFS
> >> - igb NICs.
> >> - Generic kernel
> >>
> >> Two days ago I updated my system to FreeBSD 11-RELEASE-p1, and my system
> >> has seemed stable for almost 3 days now. No more crashes. I need more
> >> days to feel confident that something has changed. But anyway, my
> >> crashes used to happen every day.
> >>
> >> If it crashes again, I’ll apply Donald's recommendation and let you guys
> >> know.
> >>
> >> Let’s keep in touch and try to finally fix it.
> >>
> >> Thanks.
> >>
> >> On Wed, Oct 12, 2016 at 8:24 PM, Donald Baud via freebsd-net <
> >> [email protected]> wrote:
> >>
> >>> On 10/12/16 3:24 PM, Zaphod Beeblebrox wrote:
> >>>
> >>>> While my mpd5 servers are possibly less busy (I haven't had frequent
> >>>> crashes), I have noticed a "group" of problems.
> >>>>
> >>>> 1. The carrier dropping communication (i.e. fiber cut or L2 switch
> >>>> breakage) of the L2TP streams can leave mpd5 in a state where it will
> >>>> not die and will not destroy interfaces (requires reboot to clear).
> >>>>
> >>> I encountered that once on 10.3, and I had tweaked some sysctl values
> >>> while monitoring:
> >>>> vmstat -z | head -1; vmstat -z | grep -i netgraph
> >>>
> >>> You might want to search for other people's experience with the
> >>> following values:
> >>> # net.graph.maxdgram   #this is set in /etc/sysctl.conf
> >>> # net.graph.recvspace    #this is set in /etc/sysctl.conf
> >>> # net.graph.maxdata  #this is set in /boot/loader.conf
> >>> # net.graph.maxalloc #this is set in /boot/loader.conf
> >>>
> >>> I'll leave it to others to comment on the best values to set, based on
> >>> their experience with FreeBSD 10.3.
> >>> In my case, as I explained, one of the recipes that worked for me is to
> >>> comment out those settings and leave the kernel values at their defaults.
> >>>
> >>> I've read on the mpd5 mailing list that FreeBSD-11 has had upgrades to
> >>> the netgraph modules.
> >>> I am now using FreeBSD-11 and it looks like I don't need any of the
> >>> kernel tweaks that I've described.
> >>>
> >>> Also, may I suggest that you troubleshoot the fiber cut or L2 switch
> >>> breakage by using an ipfw rule to simulate a fiber cut,
> >>> e.g.: ipfw add 100 deny ip from 10.10.10.10 to me
> >>>
> >>>> 2. There are race conditions between quagga and mpd5 for
> >>>> adding/dropping routes.
> >>>>
> >>> While troubleshooting the mpd5 crashes, I removed net/quagga and
> >>> installed net/bird instead.
> >>> I am now using net/bird, and I've written a little howto to get you
> >>> started with it;
> >>> see: https://forums.freebsd.org/threads/56988/
> >>>
> >>>> 3. If A is a PPPoE client and B is the mpd5 server, A cannot access TCP
> >>>> services on B. It can access TCP services _beyond_ B, but not on B.
> >>>> (There is a ticket open for this.)
> >>>>
> >>>> On Wed, Oct 12, 2016 at 10:51 AM, Donald Baud via freebsd-net <
> >>>> [email protected]> wrote:
> >>>>
> >>>>
> >>>>     On 10/12/16 1:13 AM, Julian Elischer wrote:
> >>>>
> >>>>         On 11/10/2016 8:56 PM, Donald Baud via freebsd-net wrote:
> >>>>
> >>>>             I was plagued with these =daily= panics until I tried
> >>>>             the following recipes, and the server has been up for 30
> >>>>             days so far:
> >>>>
> >>>>             Normally I should experiment more to see which one of the
> >>>>             recipes is really the fix, but I'm just glad that the
> >>>>             server is stable for now.
> >>>>
> >>>>
> >>>>         this is really great information.
> >>>>         It makes debugging a lot more possible.
> >>>>         I know it is a hard question, but do you have a way to
> >>>>         simulate this workload?
> >>>>
> >>>>         I have no real way to simulate this kind of workload
> >>>>
> >>>>
> >>>>     Sadly, I don't have a way to simulate the workload, but I am very
> >>>>     interested in helping to fix these crashes since, as Cassiano said,
> >>>>     this makes mpd5/freebsd useless for pppoe/l2tp termination.
> >>>>
> >>>>     At this point, I would suggest that Cassiano and Андрей confirm
> >>>>     that they don't get panics when they apply the recipes that I am
> >>>>     using.
> >>>>
> >>>>     I am still running many other cisco-vpdn gateways that I would
> >>>>     like to convert to mpd5/freebsd, but my plan was stalled by the
> >>>>     daily crashes.
> >>>>     I'll wait a couple of weeks to be sure that my recipes are a valid
> >>>>     workaround before converting my remaining cisco gateways to mpd5.
> >>>>
> >>>>     -Dbaud
> >>>>
> >>>>
> >>>>
> >>>>             recipe-1: Don't let mpd5 start automatically when server
> >>>>             boots:
> >>>>             i.e. in: /etc/rc.conf
> >>>>             mpd5_enable="NO"
> >>>>             and wait about 5 minutes after server boots then issue:
> >>>>             /usr/local/etc/rc.d/mpd5 onestart
> >>>>
> >>>>
> >>>>             recipe-2: recompile the kernel with the NETGRAPH_DEBUG
> >>>>             option:
> >>>>             options         NETGRAPH
> >>>>             options         NETGRAPH_DEBUG
> >>>>             options         NETGRAPH_KSOCKET
> >>>>             options         NETGRAPH_L2TP
> >>>>             options         NETGRAPH_SOCKET
> >>>>             options         NETGRAPH_TEE
> >>>>             options         NETGRAPH_VJC
> >>>>             options         NETGRAPH_PPP
> >>>>             options         NETGRAPH_IFACE
> >>>>             options         NETGRAPH_MPPC_COMPRESSION
> >>>>             options         NETGRAPH_MPPC_ENCRYPTION
> >>>>             options         NETGRAPH_TCPMSS
> >>>>             options         IPFIREWALL
> >>>>
> >>>>             recipe-3: recompile the kernel and disable the IPv6 and
> >>>>             SCTP options:
> >>>>             nooptions       INET6
> >>>>             nooptions       SCTP
> >>>>
> >>>>             recipe-4: Don't use any of the sysctl optimizations
> >>>>             in other words I commented out all values in sysctl.conf:
> >>>>             # net.graph.maxdgram=20480  (this is the default)
> >>>>             # net.graph.recvspace=20480  (this is the default)
> >>>>
> >>>>             recipe-5: Don't use any of the loader.conf optimizations
> >>>>             in other words I commented out all values in loader.conf
> >>>>             # net.graph.maxdata=4096  (this is the default)
> >>>>             # net.graph.maxalloc=4096 (this is the default)
> >>>>
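Recipes 4 and 5 boil down to making sure that no net.graph.* line is active
in /etc/sysctl.conf or /boot/loader.conf. A minimal Python sketch of such a
check follows. The function name active_netgraph_settings is hypothetical,
the sample values are illustrative only, and the sketch assumes the simple
name=value line format shown above, with '#' starting a comment.

```python
def active_netgraph_settings(text: str) -> dict:
    """Return the net.graph.* settings that are NOT commented out."""
    active = {}
    for line in text.splitlines():
        # Drop everything after '#', so fully commented lines become empty.
        setting = line.split("#", 1)[0].strip()
        if not setting.startswith("net.graph.") or "=" not in setting:
            continue
        name, value = setting.split("=", 1)
        # loader.conf values are commonly quoted; strip the quotes.
        active[name.strip()] = value.strip().strip('"')
    return active


# Illustrative file content, not taken from any real server.
sample = """\
# net.graph.maxdgram=20480
net.graph.recvspace=20480
kern.ipc.maxsockbuf=16777216
"""
print(active_netgraph_settings(sample))
```

To check a real file, pass in its contents, for example
open("/etc/sysctl.conf").read(). An empty result would mean the recipe is
being followed for that file.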
> >>>>             ================================
> >>>>             In my case, I had the panics with 10.3 and 11-PRERELEASE
> >>>>             11.0-PRERELEASE FreeBSD 11.0-PRERELEASE #2 r305587
> >>>>
> >>>>             With those recipes, I have been running without any crash
> >>>>             for a month and counting. That's 300 l2tp tunnels and
> >>>>             1400 l2tp sessions generating 700Mbit/s.
> >>>>
> >>>>
> >>>>             -DBaud
> >>>>
> >>>>
> >>>>             On Tuesday, October 11, 2016 7:30 AM, Cassiano Peixoto
> >>>>             <[email protected]> wrote:
> >>>>             Hi,
> >>>>
> >>>>             There are many users complaining about this:
> >>>>
> >>>>             https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=186114
> >>>>
> >>>>             I've been dealing with this issue for one year with no
> >>>>             solution. mpd5 as a PPPoE server on FreeBSD is useless
> >>>>             with this bug.
> >>>>
> >>>>             I would really like to see it working again; I think it's
> >>>>             quite important to both the project and many users.
> >>>>
> >>>>             Thanks.
> >>>>
> >>>>             On Tue, Oct 11, 2016 at 3:24 AM, Eugene Grosbein
> >>>>             <[email protected]> wrote:
> >>>>
> >>>>                 11.10.2016 11:02, Андрей Леушкин wrote:
> >>>>
> >>>>                     Hello. I have a problem with "FreeBSD nas
> >>>>                     10.3-RELEASE FreeBSD 10.3-RELEASE
> >>>>                     #0: Fri Oct  7 21:12:56 YEKT 2016
> >>>>                     nas@nas:/usr/obj/usr/src/sys/nasv3
> >>>>                        amd64"
> >>>>
> >>>>                     The kernel panic recurs at intervals of 2-3 days.
> >>>>                     At first I thought that the problem was in the
> >>>>                     hardware, but it did not go away after replacing
> >>>>                     the server platform.
> >>>>
> >>>>                     Core dumps and more info are at this link:
> >>>>                     https://drive.google.com/open?id=0BxciMy2q7ZjTTkIxem9wTE1tM2M
> >>>>
> >>>>                     Sorry for my English.
> >>>>                     I'll wait for an answer.
> >>>>
> >>>>                 This is a known and long-standing problem in the FreeBSD
> >>>>                 network stack.
> >>>>                 It shows up when you have lots of network interfaces
> >>>>                 created/removed frequently,
> >>>>                 as in your case of a Network Access Server (PPTP,
> >>>>                 PPPoE etc).
> >>>>
> >>>>                 Generally, people run into this problem using the mpd5
> >>>>                 network daemon.
> >>>>                 mpd5 uses the NETGRAPH kernel subsystem to process
> >>>>                 traffic, and if an interface disappears (e.g., a user
> >>>>                 disconnects) while the kernel is still processing
> >>>>                 traffic obtained from this interface, it panics.
> >>>>
> >>>>                 There have been lots of reports of this problem. No one
> >>>>                 seems to be working on it at the moment.
> >>>>                 You should file a PR using Bugzilla and attach your
> >>>>                 logs to it.
> >>>>
> >>>>                 Eugene Grosbein
> >>>>
> >>>>
> >>> _______________________________________________
> >>> [email protected] mailing list
> >>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> >>> To unsubscribe, send any mail to "[email protected]"
> >>>
> >>
> >>
> >
>
> For anyone experiencing these mpd hangs/crashes, if you believe your
> issue is the same as that described in Issue 186114 [1], please add your
> comments there including full system version information and crash
> backtraces (*as attachments*) if experiencing panics.
>
> Resolution of this problem is contingent on clear test/reproduction
> cases (ideally as reduced as possible).
>
> [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=186114
>
> ./koobs
>
