Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-27 Thread Mark Millard
On 2017-Mar-21, at 7:21 PM, Mark Millard wrote:

> On 2017-Mar-18, at 9:10 PM, Mark Millard wrote:
> 
>> 
>> On 2017-Mar-18, at 5:53 PM, Mark Millard wrote:
>> 
>>> A new, significant discovery follows. . .
>>> 
>>> While checking out use of procstat -v I ran
>>> into the following common property for the 3
>>> programs that I looked at:
>>> 
>>> A) My small test program that fails for
>>> a dynamically allocated space.
>>> 
>>> B) sh reporting Failed assertion: "tsd_booted".
>>> 
>>> C) su reporting Failed assertion: "tsd_booted".
>>> 
>>> Here are example addresses from the area of
>>> incorrectly zeroed memory (A then B then C):
>>> 
>>> (lldb) print dyn_region
>>> (region *volatile) $0 = 0x40616000
>>> 
>>> (lldb) print &__je_tsd_booted
>>> (bool *) $0 = 0x40618520
>>> 
>>> (lldb) print &__je_tsd_booted
>>> (bool *) $0 = 0x40618520
>> 
>> That last above was a copy/paste error. Correction:
>> 
>> (lldb) print &__je_tsd_booted
>> (bool *) $0 = 0x4061d520
>> 
>>> The first is from dynamic allocation ending up
>>> in the area. The other two are from libc.so.7
>>> globals/statics ending up in the general area.
>>> 
>>> It looks like something is trashing a specific
>>> memory area for some reason, rather independently
>>> of what the program specifics are.
> 
> I probably should have noted that the processes
> involved were: child/parent then grandparent
> and then great grandparent. The grandparent
> was sh and the great grandparent was su.
> 
> The ancestors in the process tree are being
> damaged, not just the instances of the
> program that demonstrates the problem.
> 
>>> Other notes:
>>> 
>>> At least for my small program showing failure:
>>> 
>>> Being explicit about the combined conditions for failure
>>> for my test program. . .
>>> 
>>> Both tcache enabled and allocations fitting in SMALL_MAXCLASS
>>> are required in order to make the program fail.
>>> 
>>> Note:
>>> 
>>> (lldb) print __je_tcache_maxclass
>>> (size_t) $0 = 32768
>>> 
>>> which is larger than SMALL_MAXCLASS. I've not observed
>>> failures for sizes above SMALL_MAXCLASS but not exceeding
>>> __je_tcache_maxclass.
>>> 
>>> Thus tcache use by itself does not seem sufficient for
>>> my program to get corruption of its dynamically allocated
>>> memory: the small allocation size also matters.
>>> 
>>> 
>>> Be warned that I can not eliminate the possibility that
>>> the trashing changed what region of memory it trashed
>>> for larger allocations or when tcache is disabled.
>> 
>> The Pine64+ 2GB eventually got into a state where:
>> 
>> /etc/malloc.conf -> tcache:false
>> 
>> made no difference and the failure kept occurring
>> with that symbolic link in place.
>> 
>> But after a reboot of the Pine64+ 2GB
>> /etc/malloc.conf -> tcache:false was again effective
>> for my test program. (It was still present from
>> before the reboot.)
>> 
>> I checked the .core files and the allocated address
>> assigned to dyn_region was the same in the tries
>> before and after the reboot. (I had put in an
>> additional raise(SIGABRT) so I'd always have
>> a core file to look at.)
>> 
>> Apparently /etc/malloc.conf -> tcache:false was
>> being ignored before the reboot for some reason?
> 
> I have also discovered that if the child process
> in an example like my program does a:
> 
> (void) posix_madvise(dyn_region, region_size, POSIX_MADV_WILLNEED);
> 
> after the fork but before the sleep/swap-out/wait
> then the problem does not happen. This is without
> any read or write access to the memory between the
> fork and sleep/swap-out/wait.
> 
> By contrast such POSIX_MADV_WILLNEED use in the parent
> process does not change the failure behavior.

I've added another test program to bugzilla
217239 and 217138, one with thousands of 14
KiByte allocations.

The test program usually ends up with them all being
zeroed in the parent and child of the fork.

But I've had a couple of runs where a much smaller
prefix was messed up and then there were normal,
expected values.

#define region_size (14u*1024u)
. . .
#define num_regions (256u*1024u*1024u/region_size)

So num_regions==18724, using up most of 256 MiBytes.

Note: each region has its own 14 KiByte allocation.

But dyn_regions[1296].array[0] in one example was
the first normal value.

In another example dyn_regions[2180].array[4096] was
the first normal value.

The last is interesting for being part way through
an allocation's space. That, together with the offset
lining up with a 4 KiByte page size, would seem odd
for a pure-jemalloc issue.

===
Mark Millard
markmi at dsl-only.net

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


[Bug 213903] Kernel crashes from turnstile_broadcast (/usr/src/sys/kern/subr_turnstile.c:837)

2017-03-27 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213903

--- Comment #23 from Ben Woods  ---
I have also been running a week with this patch, with no more crashes. It
appears to have solved it - thank you!

-- 
You are receiving this mail because:
You are on the CC list for the bug.


Re: Panic in nvidia module

2017-03-27 Thread David Marec
On Mon, Mar 27, 2017 at 11:39:28AM -0700, Larry Rosenman wrote:
> On 3/27/17, 11:34 AM, "Jonathan Chen"  behalf of j...@chen.org.nz> wrote:

Thanks Larry & Jonathan, rebuilding the nvidia driver against the new kernel
solved the issue.


-- 
David Marec
https://lapinbilly.eu


Re: Panic in nvidia module

2017-03-27 Thread Larry Rosenman
On 3/27/17, 11:34 AM, "Jonathan Chen"  wrote:

On 28 March 2017 at 07:12, David Marec  wrote:
> greeting,
>
> Tracking 11-Stable,
> - now : 316014, -
> kernel panics on  'page fault' within nvidia module.
> The system boots and  works well ( 'vt' in graphic mode) until xorg starts.

Every time you sync up STABLE, you have to rebuild your nvidia-driver
port, as it produces kernel modules that are tied closely to the current
kernel. I also had a panic when I moved up my STABLE-11/amd
installation yesterday, but after a de-install and rebuild of the
nvidia-driver port, I'm up and running again.

Cheers.
-- 
Jonathan Chen 

I fixed the issue by adding:
PORTS_MODULES+=x11/nvidia-driver
To my /etc/make.conf so it gets rebuilt on every kernel build.

-- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281


Re: Panic in nvidia module

2017-03-27 Thread Jonathan Chen
On 28 March 2017 at 07:12, David Marec  wrote:
> greeting,
>
> Tracking 11-Stable,
> - now : 316014, -
> kernel panics on  'page fault' within nvidia module.
> The system boots and  works well ( 'vt' in graphic mode) until xorg starts.

Every time you sync up STABLE, you have to rebuild your nvidia-driver
port, as it produces kernel modules that are tied closely to the current
kernel. I also had a panic when I moved up my STABLE-11/amd
installation yesterday, but after a de-install and rebuild of the
nvidia-driver port, I'm up and running again.

Cheers.
-- 
Jonathan Chen 


Panic in nvidia module

2017-03-27 Thread David Marec

greeting,

Tracking 11-Stable,
- now : 316014, -
kernel panics on  'page fault' within nvidia module.
The system boots and  works well ( 'vt' in graphic mode) until xorg starts.
-

I had to roll back to an r315900 kernel to make xorg run again.

Here is a dump:


=== kgdb kernel.debug /var/crash/vmcore.last ==
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.

Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x4
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x82c76964
stack pointer   = 0x28:0xfe0235b8e4b0
frame pointer   = 0x28:0xfe0235b8e4b0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 3
current process = 876 (Xorg)
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0x80a8c397 at kdb_backtrace+0x67
#1 0x80a496c6 at vpanic+0x186
#2 0x80a49533 at panic+0x43
#3 0x80eadf52 at trap_fatal+0x322
#4 0x80eae11c at trap_pfault+0x1bc
#5 0x80ead7d0 at trap+0x280
#6 0x80e92681 at calltrap+0x8
#7 0x82c434ef at _nv017563rm+0x1f
Uptime: 1h3m54s
Dumping 489 out of 8082 MB:..4%..14%..23%..33%..43%..53%..63%..72%..82%..92%

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/linux.ko...Reading symbols from /usr/lib/debug//boot/kernel/linux.ko.debug...done.
done.
Loaded symbols for /boot/kernel/linux.ko
Reading symbols from /boot/kernel/linux_common.ko...Reading symbols from /usr/lib/debug//boot/kernel/linux_common.ko.debug...done.
done.
Loaded symbols for /boot/kernel/linux_common.ko
Reading symbols from /boot/kernel/linux64.ko...Reading symbols from /usr/lib/debug//boot/kernel/linux64.ko.debug...done.
done.
Loaded symbols for /boot/kernel/linux64.ko
Reading symbols from /boot/modules/nvidia-modeset.ko...done.
Loaded symbols for /boot/modules/nvidia-modeset.ko
Reading symbols from /boot/modules/nvidia.ko...done.
Loaded symbols for /boot/modules/nvidia.ko
#0  doadump (textdump=) at pcpu.h:222
222 __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) backtrace
#0  doadump (textdump=) at pcpu.h:222
#1  0x80a49256 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:366
#2  0x80a49700 in vpanic (fmt=, ap=optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
#3  0x80a49533 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:690
#4  0x80eadf52 in trap_fatal (frame=0xfe0235b8e3f0, eva=4) at /usr/src/sys/amd64/amd64/trap.c:801
#5  0x80eae11c in trap_pfault (frame=0xfe0235b8e3f0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:658
#6  0x80ead7d0 in trap (frame=0xfe0235b8e3f0) at /usr/src/sys/amd64/amd64/trap.c:421
#7  0x80e92681 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236

#8  0x82c76964 in os_get_euid () from /boot/modules/nvidia.ko
#9  0x82c434ef in _nv017563rm () from /boot/modules/nvidia.ko
#10 0xf80021881400 in ?? ()
#11 0x82ba3330 in _nv004904rm () from /boot/modules/nvidia.ko
#12 0x in ?? ()
Current language:  auto; currently minimal

=== kgdb kernel.debug /var/crash/vmcore.last ==


Thanks


--
David Marec
https://lapinbilly.eu/


[Bug 213903] Kernel crashes from turnstile_broadcast (/usr/src/sys/kern/subr_turnstile.c:837)

2017-03-27 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213903

--- Comment #22 from Cassiano Peixoto  ---
(In reply to Franco Fichtner from comment #21)
Hi Franco, I agree with you. It has now been 6 days with no more crashes. Mateusz,
can you take a look please?



Open positions at CMP Group

2017-03-27 Thread HR Department
http://www.teamcmp.com


** CMP Group is hiring!


We're a digital entertainment business (leader in the industry, with over 10M 
unique requests/day) with headquarters in Barcelona and an office in New York.

We are hiring talents across the business to help us with creative product 
solutions and to design new applications from scratch, using cutting edge 
technologies.


Our Referral Program :
Get your friend a new job and we will give you 500€! *

We have lots of open positions, here are a few examples:
* PHP Developer (https://cmp-group.workable.com/jobs/431564)  - Webpack, Web 
Push Notifications
* QA Engineer (https://cmp-group.workable.com/jobs/438991)  - Selenium, 
Jenkins, Java
* DevOps Engineer (https://cmp-group.workable.com/jobs/440239)  -  Terraform, 
Go, Elastic Search

Check out all our open positions! (http://teamcmp.com/jobs/)
We're CMP Group (http://www.teamcmp.com) . Want to know more about our 
technical philosophies? Check out the tech team's manifesto 
(http://teamcmp.com/manifesto/) .
*Terms & Conditions
1. Candidates must be placed and employed within 15 weeks of a referral being 
submitted
2. The payment will be done through a Paypal transfer
3. CMP group reserves the right to amend or withdraw the referral scheme 
without further notice
4. To receive credit, your friend must enter your email address when they apply 
for the role  so we know who should be paid for the referral


Copyright © 2017 CMP Group, All rights reserved.
** www.teamcmp.com (http://www.teamcmp.com)


Re: Opteron 6100-series "Magny-Cours"

2017-03-27 Thread Pete French



On 03/27/17 11:09, Andrea Venturoli wrote:

On 03/25/17 19:02, Andriy Gapon wrote:


Does anyone [still] use Opteron 6100-series / "Magny-Cours" processors 
with FreeBSD?


Will an equivalent Athlon do or is this Opteron specific?

What would that Athlon be?


Opteron 6100 was a K10 core, like the Phenom II, but they were 8 core,
so I would hazard a guess that the FX-8350 or similar is the same. But
it's usually a good idea to test on the exact CPU exhibiting the issue.


-pete. (FWIW, this is being typed on a K10 machine and it works fine.)


Re: Opteron 6100-series "Magny-Cours"

2017-03-27 Thread Andrea Venturoli

On 03/25/17 19:02, Andriy Gapon wrote:


Does anyone [still] use Opteron 6100-series / "Magny-Cours" processors with 
FreeBSD?


Will an equivalent Athlon do or is this Opteron specific?

What would that Athlon be?

 bye
av.


Re: Opteron 6100-series "Magny-Cours"

2017-03-27 Thread Andriy Gapon
On 03/25/2017 23:26, Jack L. wrote:
> I have a few still sitting in a corner with FreeBSD 7 or 8 on them. Someday I
> might put them back on with FreeBSD but not anytime soon

Apologies for not qualifying my question.
I would like to obtain some information from such a system and possibly ask
for a patch to be tested.
Looks like you won't be able to help with that.  At least, until that some day :-).

>> On Mar 25, 2017, at 11:02 AM, Andriy Gapon  wrote:
>>
>>
>> Does anyone [still] use Opteron 6100-series / "Magny-Cours" processors with 
>> FreeBSD?

-- 
Andriy Gapon