Re: stable/13, vm page counts do not add up

2021-04-14 Thread Mark Johnston
On Wed, Apr 14, 2021 at 02:21:44PM +0300, Andriy Gapon wrote:
> On 14/04/2021 00:18, Mark Johnston wrote:
> > fbt::vm_page_unwire:entry
> > /args[0]->oflags & 0x4/
> > {
> > @unwire[stack()] = count();
> > }
> 
> Unrelated report, dtrace complains about this probe on my stable/13 system:
>  failed to resolve translated type for args[0]
> 
> And I do not have any idea why...

There was a regression, see PR 253440.  I think you have the fix
already, but perhaps not.  Could you show output from
"dtrace -lv -n fbt::vm_page_unwire:entry"?

> 
>  From ctfdump:
> [27290] FUNC (vm_page_unwire) returns: 38 args: (1463, 3)
> 
> <1463> TYPEDEF vm_page_t refers to 778
> <778> POINTER (anon) refers to 3575
> <3575> STRUCT vm_page (104 bytes)
>  plinks type=3563 off=0
>  listq type=3558 off=128
>  object type=3564 off=256
>  pindex type=3565 off=320
>  phys_addr type=42 off=384
>  md type=3571 off=448
>  ref_count type=31 off=640
>  busy_lock type=31 off=672
>  a type=3573 off=704
>  order type=3 off=736
>  pool type=3 off=744
>  flags type=3 off=752
>  oflags type=3 off=760
>  psind type=2167 off=768
>  segind type=2167 off=776
>  valid type=3574 off=784
>  dirty type=3574 off=792
> 
> -- 
> Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: stable/13, vm page counts do not add up

2021-04-13 Thread Mark Johnston
On Tue, Apr 13, 2021 at 05:01:49PM +0300, Andriy Gapon wrote:
> On 07/04/2021 23:56, Mark Johnston wrote:
> > I don't know what might be causing it then.  It could be a page leak.
> > The kernel allocates wired pages without adjusting the v_wire_count
> > counter in some cases, but the ones I know about happen at boot and
> > should not account for such a large disparity.  I do not see it on a few
> > systems that I have access to.
> 
> Mark or anyone,
> 
> do you have a suggestion on how to approach hunting for the potential page 
> leak?
> It's been a long while since I worked with that code and it changed a lot.
> 
> Here is some additional info.
> I had approximately 2 million unaccounted pages.
> I rebooted the system and that number became 20 thousand which is more
> reasonable and could be explained by those boot-time allocations that you 
> mentioned.
> After 30 hours of uptime the number became 60 thousand.
> 
> I monitored the number and so far I could not correlate it with any activity.
> 
> P.S.
> I have not been running any virtual machines.
> I do use nvidia graphics driver.

My guess is that something is allocating pages without VM_ALLOC_WIRED and
either they're managed and something is failing to place them in page
queues, or they're unmanaged and should likely be counted as wired.

It is also possible that something is allocating wired, unmanaged
pages and unwiring them without freeing them.  For managed pages,
vm_page_unwire() ensures they get placed in a queue.
vm_page_unwire_noq() does not, but it is typically only used with
unmanaged pages. 

The nvidia drivers do not appear to call any vm_page_* functions, at
least based on the kld symbol tables.

So you might try using DTrace to collect stacks for these functions,
leaving it running for a while and comparing stack counts with the
number of pages leaked while the script is running.  Something like:

fbt::vm_page_alloc_domain_after:entry
/(args[3] & 0x20) == 0/
{
@alloc[stack()] = count();
}

fbt::vm_page_alloc_contig_domain:entry
/(args[3] & 0x20) == 0/
{
@alloc[stack()] = count();
}

fbt::vm_page_unwire_noq:entry
{
@unwire[stack()] = count();
}

fbt::vm_page_unwire:entry
/args[0]->oflags & 0x4/
{
@unwire[stack()] = count();
}

It might be that the count of leaked pages does not relate directly to
the counts collected by the script, e.g., because there is some race
that results in a leak.  But we can try to rule out some easier cases
first.
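As a rough sketch of how the leaked-page count could be recomputed while the
script runs (the counter values below are made up for illustration; a live run
would pipe in `sysctl vm.stats | fgrep count` instead, and would also need to
exclude overlapping counters such as v_user_wire_count and v_cache_count from
the sum):

```shell
# Recompute the "unaccounted pages" figure with awk instead of bc.
# The printf below substitutes made-up counter values for real sysctl output.
unaccounted=$(printf '%s\n' \
    'vm.stats.vm.v_laundry_count: 10000' \
    'vm.stats.vm.v_inactive_count: 500000' \
    'vm.stats.vm.v_active_count: 200000' \
    'vm.stats.vm.v_wire_count: 230000' \
    'vm.stats.vm.v_free_count: 0' \
    'vm.stats.vm.v_page_count: 1000000' |
awk -F': ' '
    /v_page_count/ { total = $2; next }   # every managed page in the system
    /_count/       { sum += $2 }          # pages accounted for by some state
    END            { print total - sum }')
echo "$unaccounted"
```

This mirrors the bc arithmetic shown earlier in the thread; sampling the value
before and after a DTrace run gives the delta to compare against the stack
counts.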

I tried to look for possible causes of the KTLS page leak mentioned
elsewhere in this thread but can't see any obvious problems.  Does your
affected system use sendfile() at all?  I also wonder if you see much
mbuf usage on the system.


Re: stable/13, vm page counts do not add up

2021-04-07 Thread Mark Johnston
On Wed, Apr 07, 2021 at 11:22:41PM +0300, Andriy Gapon wrote:
> On 07/04/2021 22:54, Mark Johnston wrote:
> > On Wed, Apr 07, 2021 at 10:42:57PM +0300, Andriy Gapon wrote:
> >>
> >> I regularly see that the top's memory line does not add up (and by a lot).
> >> That can be seen with vm.stats as well.
> >>
> >> For example:
> >> $ sysctl vm.stats | fgrep count
> >> vm.stats.vm.v_cache_count: 0
> >> vm.stats.vm.v_user_wire_count: 3231
> >> vm.stats.vm.v_laundry_count: 262058
> >> vm.stats.vm.v_inactive_count: 3054178
> >> vm.stats.vm.v_active_count: 621131
> >> vm.stats.vm.v_wire_count: 1871176
> >> vm.stats.vm.v_free_count: 18
> >> vm.stats.vm.v_page_count: 8134982
> >>
> >> $ bc
> >>>>> 18 + 1871176 + 621131 + 3054178 + 262058
> >> 5996320
> >>>>> 8134982 - 5996320
> >> 2138662
> >>
> >> As you can see, it's not a small number of pages either.
> >> Approximately 2 million pages, 8 gigabytes or 25% of the whole memory on 
> >> this
> >> system.
> >>
> >> This is 47c00a9835926e96, 13.0-STABLE amd64.
> >> I do not think that I saw anything like that when I used (much) older 
> >> FreeBSD.
> > 
> > One relevant change is that vm_page_wire() no longer removes pages from
> > LRU queues, so the count of pages in the queues can include wired pages.
> > If the page daemon runs, it will dequeue any wired pages that are
> > encountered.
> 
> Maybe I misunderstand how that works, but I would expect that the sum of all
> counters could be greater than v_page_count at times.  But in my case it's 
> less.

I misread, sorry.  You're right, what I described would cause double
counting.

I don't know what might be causing it then.  It could be a page leak.
The kernel allocates wired pages without adjusting the v_wire_count
counter in some cases, but the ones I know about happen at boot and
should not account for such a large disparity.  I do not see it on a few
systems that I have access to.

> > This was done to reduce queue lock contention, operations like
> > sendfile() which transiently wire pages would otherwise trigger two
> > queue operations per page.  Now that queue operations are batched this
> > might not be as important.
> > 
> > We could perhaps add a new flavour of vm_page_wire() which is not lazy
> > and would be suited for e.g., the buffer cache.  What is the primary
> > source of wired pages in this case?
> 
> It should be ZFS, I guess.
> 
> -- 
> Andriy Gapon


Re: stable/13, vm page counts do not add up

2021-04-07 Thread Mark Johnston
On Wed, Apr 07, 2021 at 10:42:57PM +0300, Andriy Gapon wrote:
> 
> I regularly see that the top's memory line does not add up (and by a lot).
> That can be seen with vm.stats as well.
> 
> For example:
> $ sysctl vm.stats | fgrep count
> vm.stats.vm.v_cache_count: 0
> vm.stats.vm.v_user_wire_count: 3231
> vm.stats.vm.v_laundry_count: 262058
> vm.stats.vm.v_inactive_count: 3054178
> vm.stats.vm.v_active_count: 621131
> vm.stats.vm.v_wire_count: 1871176
> vm.stats.vm.v_free_count: 18
> vm.stats.vm.v_page_count: 8134982
> 
> $ bc
> >>> 18 + 1871176 + 621131 + 3054178 + 262058
> 5996320
> >>> 8134982 - 5996320
> 2138662
> 
> As you can see, it's not a small number of pages either.
> Approximately 2 million pages, 8 gigabytes or 25% of the whole memory on this
> system.
> 
> This is 47c00a9835926e96, 13.0-STABLE amd64.
> I do not think that I saw anything like that when I used (much) older FreeBSD.

One relevant change is that vm_page_wire() no longer removes pages from
LRU queues, so the count of pages in the queues can include wired pages.
If the page daemon runs, it will dequeue any wired pages that are
encountered.

This was done to reduce queue lock contention, operations like
sendfile() which transiently wire pages would otherwise trigger two
queue operations per page.  Now that queue operations are batched this
might not be as important.

We could perhaps add a new flavour of vm_page_wire() which is not lazy
and would be suited for e.g., the buffer cache.  What is the primary
source of wired pages in this case?


Re: current make world breaks if HESIOD enabled

2021-04-05 Thread Mark Johnston
On Sat, Apr 03, 2021 at 09:18:29AM +0300, Daniel Braniss wrote:
> I must be the last person on earth to use Hesiod :-)
> these are the diffs:

Thanks, this was committed earlier today.

> diff --git a/lib/libc/gen/getgrent.c b/lib/libc/gen/getgrent.c
> index afb89cab3..5832cb8c6 100644
> --- a/lib/libc/gen/getgrent.c
> +++ b/lib/libc/gen/getgrent.c
> @@ -971,7 +971,7 @@ dns_group(void *retval, void *mdata, va_list ap)
>   hes = NULL;
>   name = NULL;
>   gid = (gid_t)-1;
> - how = (enum nss_lookup_type)mdata;
> + how = (enum nss_lookup_type)(uintptr_t)mdata;
>   switch (how) {
>   case nss_lt_name:
>   name = va_arg(ap, const char *);
> diff --git a/lib/libc/gen/getpwent.c b/lib/libc/gen/getpwent.c
> index a07ee109e..bc1d341fd 100644
> --- a/lib/libc/gen/getpwent.c
> +++ b/lib/libc/gen/getpwent.c
> @@ -1108,7 +1108,7 @@ dns_passwd(void *retval, void *mdata, va_list ap)
>   hes = NULL;
>   name = NULL;
>   uid = (uid_t)-1;
> - how = (enum nss_lookup_type)mdata;
> + how = (enum nss_lookup_type)(uintptr_t)mdata;
>   switch (how) {
>   case nss_lt_name:
>   name = va_arg(ap, const char *);
> 
> 


Re: FreeBSD 13.0-RC4 and Nginx process "stuck" during restart

2021-03-30 Thread Mark Johnston
On Mon, Mar 29, 2021 at 10:57:09PM +0300, Christos Chatzaras wrote:
> Hello,
> 
> 
> I upgraded from 12.2 to 13.0-RC4 and I noticed a strange issue with Nginx.
> 
> When I run "service nginx restart" in some (random) servers it doesn't 
> complete the restart and it "stucks" at "Waiting for PIDS: 20536." .
> 
> I can kill the 20536 process and then restart completes.
> 
> 
> procstat -kk 20536:
> 
>   PIDTID COMMTDNAME  KSTACK
> 63094 100505 nginx   -   mi_switch+0xc1 
> sleepq_catch_signals+0x2e6 sleepq_wait_sig+0x9 _sleep+0x1be 
> kern_sigsuspend+0x164 sys_sigsuspend+0x31 amd64_syscall+0x10c 
> fast_syscall_common+0xf8 
> 
> 
> I found this commit:
> 
> https://cgit.freebsd.org/src/commit/?id=dbec10e08808e375365fb2a2462f306e0cdfda32
>  
> 
> 
> Could this be related? If yes can we have the patch in releng/13.0 ?

I think it is hard to say without some testing.  Are you able to verify
that backporting the patch fixes the hangs?


Re: dtrace issue on releng/13.0

2021-02-23 Thread Mark Johnston
On Tue, Feb 23, 2021 at 02:36:04PM -0600, Dean E. Weimer via freebsd-stable wrote:
> I just built and installed FreeBSD 13.0 Beta 3 from source checked out 
> with last commit of 4b737a9c58cac69008f189cc44e7d1a81a0b601c after the 
> install I was installing a few ports, and perl5.32 failed to build 
> stating dtrace -h not available.
> 
> root@fbsd13-devel:/ # dtrace -h
> dtrace: failed to establish error handler: "/usr/lib/dtrace/psinfo.d", 
> line 1: failed to copy type of 'pr_uid': Type information is in parent 
> and unavailable
> 
> The initial install from the FreeBSD-BETA3 iso image does work as 
> expected, so the problem was introduced since then.
> 
> root@fbsd13:/jails/devel # dtrace -h
> dtrace: -h requires one or more scripts or enabling options
> 
> It does appear that the commit
> ae093a0614f30d4cdffb853e4eba93322e8ed8f4 
> references changes to dtrace.

Are you using GENERIC, or some custom kernel configuration?  Could you
show output from "ctfdump -S /boot/kernel/kernel"?  The error you
reported is typically the result of some problems with the C type
metadata used by dtrace.  The commit you referenced ought to be
unrelated.


Re: Microcode update prevents boot

2021-02-14 Thread Mark Johnston
On Sun, Feb 14, 2021 at 02:01:14PM +0100, Leon Dietrich wrote:
> Hi there,
> 
> I already worked around the issue myself. I'm just writing this here in
> case someone else may have the same issue and is seeking an answer.
> 
> 
> I recently upgraded the Intel CPU microcode update package. Since then
> the boot process hung at the stage where the other CPU cores were
> enabled (shortly after enabling ACPI). In order to resolve the issue one
> has to boot in safe mode (not single user mode!) and comment (or remove)
> the lines enabling the cpu microcode update on boot in
> /boot/loader.conf. One can and should reboot then.
> 
> After making these changes the system boots again and all cores are
> started and SMT works as well. One should note that one is then not
> running the newer microcode (including some security fixes). Having
> microcode_update_enable="YES" in /etc/rc.conf doesn't prevent booting
> and does not cause noticeable instability.
> 
> For reference: I'm running FreeBSD 12.1 on a Supermicro embedded board
> with Intel Xeon E3-1585L v5 CPUs.
> 
> 
> I hope someone will find this info useful.

I see that r347931 was not merged to the stable/12 branch, but the lockless
delayed invalidation changes were indeed present in 12.1.  Could you see
if the hang persists when boot-time ucode loading is enabled and
vm.pmap.di_locked=1 is configured?  Note that you could apply both
configurations at the loader prompt, i.e., without having to edit
loader.conf and boot in safe mode to revert the change.
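For example, both settings could be applied at the loader OK prompt before
booting.  This is a sketch only: the cpu_microcode_* variable names are the
ones documented in cpu_microcode(4), and the firmware path is a placeholder
for whatever file the package actually installed.

```
OK set cpu_microcode_load=YES
OK set cpu_microcode_name=/boot/firmware/intel-ucode.bin
OK set vm.pmap.di_locked=1
OK boot
```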


Re: Page fault in _mca_init during startup

2021-02-08 Thread Mark Johnston
On Mon, Feb 08, 2021 at 05:33:22PM +0200, Konstantin Belousov wrote:
> On Mon, Feb 08, 2021 at 10:03:59AM -0500, Mark Johnston wrote:
> > On Mon, Feb 08, 2021 at 12:18:12AM +0200, Konstantin Belousov wrote:
> > > On Sun, Feb 07, 2021 at 02:33:11PM -0700, Alan Somers wrote:
> > > > Upgrading the BIOS fixed the problem, by clearing the MCG_CMCI_P bit on 
> > > > all
> > > > processors.  I don't have strong opinions about whether we should commit
> > > > kib's patch too.  Kib, what do you think?
> > > 
> > > The patch causes some memory over-use.
> > > 
> > > If this issue is not too widely experienced, I prefer to not commit the 
> > > patch.
> > 
> > Couldn't we short-circuit cmci_monitor() if the BSP did not allocate
> > anything?
> > 
> > diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> > index 03100e77d45..0619a41b128 100644
> > --- a/sys/x86/x86/mca.c
> > +++ b/sys/x86/x86/mca.c

> I think something should be printed in this case, at least once.
> I believe printf() already works, because spin locks do.

Indeed, the printf() below should only fire on an AP during SI_SUB_SMP.
Access to the static flag is synchronized by mca_lock.

diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
index 03100e77d45..8098bcfb4bd 100644
--- a/sys/x86/x86/mca.c
+++ b/sys/x86/x86/mca.c
@@ -1065,11 +1065,26 @@ mca_setup(uint64_t mcg_cap)
 static void
 cmci_monitor(int i)
 {
+   static bool first = true;
struct cmc_state *cc;
uint64_t ctl;
 
KASSERT(i < mca_banks, ("CPU %d has more MC banks", PCPU_GET(cpuid)));
 
+   /*
+* It is possible for some APs to report CMCI support even if the BSP
+* does not, apparently due to a BIOS bug.
+*/
+   if (cmc_state == NULL) {
+   if (first) {
+   printf(
+   "AP %d reports CMCI support but the BSP does not\n",
+   PCPU_GET(cpuid));
+   first = false;
+   }
+   return;
+   }
+
ctl = rdmsr(MSR_MC_CTL2(i));
if (ctl & MC_CTL2_CMCI_EN)
/* Already monitored by another CPU. */
@@ -1114,6 +1129,10 @@ cmci_resume(int i)
 
KASSERT(i < mca_banks, ("CPU %d has more MC banks", PCPU_GET(cpuid)));
 
+   /* See cmci_monitor(). */
+   if (cmc_state == NULL)
+   return;
+
/* Ignore banks not monitored by this CPU. */
if (!(PCPU_GET(cmci_mask) & 1 << i))
return;


Re: Page fault in _mca_init during startup

2021-02-08 Thread Mark Johnston
On Mon, Feb 08, 2021 at 12:18:12AM +0200, Konstantin Belousov wrote:
> On Sun, Feb 07, 2021 at 02:33:11PM -0700, Alan Somers wrote:
> > Upgrading the BIOS fixed the problem, by clearing the MCG_CMCI_P bit on all
> > processors.  I don't have strong opinions about whether we should commit
> > kib's patch too.  Kib, what do you think?
> 
> The patch causes some memory over-use.
> 
> If this issue is not too widely experienced, I prefer to not commit the patch.

Couldn't we short-circuit cmci_monitor() if the BSP did not allocate
anything?

diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
index 03100e77d45..0619a41b128 100644
--- a/sys/x86/x86/mca.c
+++ b/sys/x86/x86/mca.c
@@ -1070,6 +1070,13 @@ cmci_monitor(int i)
 
KASSERT(i < mca_banks, ("CPU %d has more MC banks", PCPU_GET(cpuid)));
 
+   /*
+* It is possible for some APs to report CMCI support even if the BSP
+* does not, apparently due to a BIOS bug.
+*/
+   if (cmc_state == NULL)
+   return;
+
ctl = rdmsr(MSR_MC_CTL2(i));
if (ctl & MC_CTL2_CMCI_EN)
/* Already monitored by another CPU. */
@@ -1114,6 +1121,10 @@ cmci_resume(int i)
 
KASSERT(i < mca_banks, ("CPU %d has more MC banks", PCPU_GET(cpuid)));
 
+   /* See cmci_monitor(). */
+   if (cmc_state == NULL)
+   return;
+
/* Ignore banks not monitored by this CPU. */
if (!(PCPU_GET(cmci_mask) & 1 << i))
return;


Re: Page fault in _mca_init during startup

2021-02-04 Thread Mark Johnston
On Fri, Feb 05, 2021 at 12:58:34AM +0200, Konstantin Belousov wrote:
> On Thu, Feb 04, 2021 at 01:34:13PM -0800, Matthew Macy wrote:
> > On Thu, Feb 4, 2021 at 1:31 PM Alan Somers  wrote:
> > >
> > > After upgrading a machine to FreeBSD, 12.2, it hit the following panic on
> > > its first reboot.  I suspect that a few other servers have hit this too,
> > > but since it happens before swap is mounted there are no core dumps, and
> > > they usually reboot immediately.  The code in question hasn't changed 
> > > since
> > > 2018.  The panic happened in cmci_monitor at line 930.  Does anybody have
> > > any suggestions for how I could debug further?  I can't readily reproduce
> > > it, and I can't dump core, but I'd like to investigate it any way I can.
> > > The server in question has dual Xeon Gold 6142 CPUs.
> > >
> Try this.
> 
> I think that there is no other dependencies in the startup order, but
> cannot know it for sure.
> 
> commit 19584e3d3e9606d591fa30999b370ed758960e8c
> Author: Konstantin Belousov 
> Date:   Fri Feb 5 00:56:09 2021 +0200
> 
> x86: init mca before APs are started

APs only call mca_init() after they have been released by the BSP
though, and that happens later in SI_SUB_SMP.

> diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> index 03100e77d455..e2bf2673cf69 100644
> --- a/sys/x86/x86/mca.c
> +++ b/sys/x86/x86/mca.c
> @@ -1371,7 +1371,7 @@ mca_init_bsp(void *arg __unused)
>  
>   mca_init();
>  }
> -SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_ANY, mca_init_bsp, NULL);
> +SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_SECOND, mca_init_bsp, NULL);
>  
>  /* Called when a machine check exception fires. */
>  void


Re: zfs panic RELENG_12

2020-12-22 Thread Mark Johnston
On Tue, Dec 22, 2020 at 09:05:01AM -0500, mike tancsa wrote:
> Hmmm, another one. Not sure if this is hardware as it seems different ?
> 
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 11; apic id = 0b
> fault virtual address   = 0x0
> fault code  = supervisor write data, page not present
> instruction pointer = 0x20:0x80ca0826
> stack pointer   = 0x28:0xfe00bc0f8540
> frame pointer   = 0x28:0xfe00bc0f8590
> code segment    = base 0x0, limit 0xf, type 0x1b
>     = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags    = interrupt enabled, resume, IOPL = 0
> current process = 33 (dom0)
> trap number = 12
> panic: page fault
> cpuid = 11
> time = 1608641071
> KDB: stack backtrace:
> #0 0x80a3fe85 at kdb_backtrace+0x65
> #1 0x809f406b at vpanic+0x17b
> #2 0x809f3ee3 at panic+0x43
> #3 0x80e3fe71 at trap_fatal+0x391
> #4 0x80e3fecf at trap_pfault+0x4f
> #5 0x80e3f516 at trap+0x286
> #6 0x80e19318 at calltrap+0x8
> #7 0x80ca47d4 at bucket_cache_drain+0x134
> #8 0x80c9e302 at zone_drain_wait+0xa2
> #9 0x80ca2bbd at uma_reclaim_locked+0x6d
> #10 0x80ca2af4 at uma_reclaim+0x34
> #11 0x80cc5321 at vm_pageout_worker+0x421
> #12 0x80cc4ee3 at vm_pageout+0x193
> #13 0x809b55be at fork_exit+0x7e
> #14 0x80e1a34e at fork_trampoline+0xe
> Uptime: 5d20h37m16s
> Dumping 16057 out of 65398
> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> 
> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> 55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
> (offsetof(struct pcpu,
> (kgdb) bt

Could you go to frame 11 and print zone->uz_name and
bucket->ub_bucket[18]?  I'm wondering if the item pointer was mangled
somehow.
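For reference, a sketch of that kgdb session (the frame number may need
adjusting until the selected frame is inside bucket_cache_drain(), since
kgdb's frame numbering can differ from the panic backtrace):

```
(kgdb) frame 11
(kgdb) p zone->uz_name
(kgdb) p bucket->ub_bucket[18]
```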


Re: How to free used Swap-Space?

2020-09-22 Thread Mark Johnston
On Tue, Sep 22, 2020 at 07:31:07PM +0200, Peter wrote:
> On Tue, Sep 22, 2020 at 12:33:19PM -0400, Mark Johnston wrote:
> 
> ! On Tue, Sep 22, 2020 at 06:08:01PM +0200, Peter wrote:
> 
> ! >  my machine should use about 3-4, maybe 5 GB swapspace. Today I found
> ! > it suddenly uses 8 GB (which is worryingly near the configured 10G).
> ! > 
> ! > I stopped all the big suckers - nothing found.
> ! > I stopped all the jails - no success.
> ! > I brought it down to singleuser: it tried to swapoff, but failed.
> ! > 
> ! > I unmounted all filesystems, exported all pools, detached all geli,
> ! > and removed most of the netgraphs. Swap is still occupied.
> ! > 
> ! > Machine is now running only the init and a shell processes, has
> ! > almost no filesystems mounted, has mostly native networks only, and
> ! > this still occupies 3 GB of swap which cannot be released.
> ! > 
> ! > What is going on, what is doing this, and how can I get this swapspace
> ! > released??
> ! 
> ! Do you have any shared memory segments lingering?  ipcs -a will show
> ! SysV shared memory usage.
> 
> I have four small shmem segments from four postgres clusters running.
> These should cleanly disappear when the clusters are stopped, and
> they are very small.
> 
> Shared Memory:
> T   ID  KEY MODEOWNERGROUPCREATOR  CGROUP 
> NATTCHSEGSZ CPID LPID ATIMEDTIMECTIME   
> m65536  5432001 --rw--- postgres postgres postgres postgres   
>  7   48 4793 4793  6:09:34 18:00:31  6:09:34
> m655370 --rw--- postgres postgres postgres postgres   
> 11   48 6268 6268  6:09:42 10:48:27  6:09:42
> m655380 --rw--- postgres postgres postgres postgres   
>  5   48 6968 6968  6:09:46 18:28:36  6:09:46
> m655390 --rw--- postgres postgres postgres postgres   
>  6   48 6992 6992  6:09:47  3:38:34  6:09:47
> 
> ! For POSIX shared memory, in 11.4 we do not
> ! have any good way of listing objects, but "vmstat -m | grep shmfd" will
> ! at least show whether any are allocated.
> 
> There is something, and I don't know who owns that:
> $ vmstat -m | grep shmfd
> shmfd1314K   -  473  64,256,1024,8192
> 
> But that doesn't look big either.

That is just the amount of kernel memory used to track a set of objects,
not the actual object sizes.  Unfortunately, in 11 I don't think there's
any way to enumerate them other than running kgdb and examining the
shm_dictionary hash table.

> Furthermore, this machine is running for quite some time already; it
> was running as i386 (with ZFS) until very recently, and I know quite
> well what is using much memory: these 3 GB were illegitimate; they
> came from nothing I did install. And they are new; this has not
> happened before.
> 
> ! If those don't turn anything
> ! up then it's possible that there's a swap leak.  Do you use any DRM
> ! graphics drivers on this system?
> 
> Probably yes. There is no graphics used at all; it just uses "device
> vt" in text mode, but it uses i5-3570T CPU (IvyBridge HD2500) graphics
> for that, and the driver is "drm2" and "i915drm" from /usr/src/sys (not
> those from ports).
> Not sure how that would account for 3 GB, unless there is indeed some
> leak.

I think I see a possible problem in i915, though I'm not sure if you'd
trigger it just by using vt(4).  It should be fixed in later FreeBSD
versions, but is still a problem in 11.  Here's a (untested) patch:

Index: sys/dev/drm2/i915/i915_gem.c
===
--- sys/dev/drm2/i915/i915_gem.c(revision 365772)
+++ sys/dev/drm2/i915/i915_gem.c(working copy)
@@ -1863,6 +1863,8 @@ i915_gem_object_truncate(struct drm_i915_gem_objec
vm_obj = obj->base.vm_obj;
VM_OBJECT_WLOCK(vm_obj);
vm_object_page_remove(vm_obj, 0, 0, false);
+   if (vm_obj->type == OBJT_SWAP)
+   swap_pager_freespace(vm_obj, 0, vm_obj->size);
VM_OBJECT_WUNLOCK(vm_obj);
i915_gem_object_free_mmap_offset(obj);
 


Re: How to free used Swap-Space?

2020-09-22 Thread Mark Johnston
On Tue, Sep 22, 2020 at 06:08:01PM +0200, Peter wrote:
> Hi all,
> 
>  my machine should use about 3-4, maybe 5 GB swapspace. Today I found
> it suddenly uses 8 GB (which is worryingly near the configured 10G).
> 
> I stopped all the big suckers - nothing found.
> I stopped all the jails - no success.
> I brought it down to singleuser: it tried to swapoff, but failed.
> 
> I unmounted all filesystems, exported all pools, detached all geli,
> and removed most of the netgraphs. Swap is still occupied.
> 
> Machine is now running only the init and a shell processes, has
> almost no filesystems mounted, has mostly native networks only, and
> this still occupies 3 GB of swap which cannot be released.
> 
> What is going on, what is doing this, and how can I get this swapspace
> released??

Do you have any shared memory segments lingering?  ipcs -a will show
SysV shared memory usage.  For POSIX shared memory, in 11.4 we do not
have any good way of listing objects, but "vmstat -m | grep shmfd" will
at least show whether any are allocated.  If those don't turn anything
up then it's possible that there's a swap leak.  Do you use any DRM
graphics drivers on this system?

> 
> It is 11.4-RELEASE-p3 amd64.
> 
> 
> Script started on Mon Sep 21 05:43:20 2020
> root@edge# ps axlww
> UID   PID  PPID CPU PRI NI  VSZ  RSS MWCHAN   STAT TT TIME COMMAND
>   0 0 0   0 -16  00  752 swapin   DLs   -291:32.41 [kernel]
>   0 1 0   0  20  0 5416  248 wait ILs   -  0:00.22 /sbin/init 
> --
>   0 2 0   0 -16  00   16 ftcl DL-  0:00.00 [ftcleanup]
>   0 3 0   0 -16  00   16 crypto_w DL-  0:00.00 [crypto]
>   0 4 0   0 -16  00   16 crypto_r DL-  0:00.00 [crypto 
> returns]
>   0 5 0   0 -16  00   32 -DL- 11:41.94 [cam]
>   0 6 0   0  -8  00   80 t->zthr_ DL- 13:07.13 [zfskern]
>   0 7 0   0 -16  00   16 waiting_ DL-  0:00.00 
> [sctp_iterator]
>   0 8 0   0 -16  00   16 -DL-  2:05.20 
> [rand_harvestq]
>   0 9 0   0 -16  00   16 -DL-  0:00.04 [soaiod1]
>   010 0   0 155  00   64 -RNL   -  17115:06.48 [idle]
>   011 0   0 -52  00  352 -WL- 49:05.30 [intr]
>   012 0   0 -16  00   64 sleepDL- 16:28.51 [ng_queue]
>   013 0   0  -8  00   48 -DL- 23:10.60 [geom]
>   014 0   0 -16  00   16 seqstate DL-  0:00.00 [sequencer 
> 00]
>   015 0   0 -68  00  160 -DL-  0:23.64 [usb]
>   016 0   0 -16  00   16 -DL-  0:00.04 [soaiod2]
>   017 0   0 -16  00   16 -DL-  0:00.04 [soaiod3]
>   018 0   0 -16  00   16 -DL-  0:00.04 [soaiod4]
>   019 0   0 -16  00   16 idle DL-  0:00.83 
> [enc_daemon0]
>   020 0   0 -16  00   48 psleep   DL- 12:07.72 
> [pagedaemon]
>   021 0   0  20  00   16 psleep   DL-  4:12.41 [vmdaemon]
>   022 0   0 155  00   16 pgzero   DNL   -  0:00.00 [pagezero]
>   023 0   0 -16  00   64 psleep   DL-  0:23.50 [bufdaemon]
>   024 0   0  20  00   16 -DL-  0:04.21 
> [bufspacedaemon]
>   025 0   0  16  00   16 syncer   DL-  0:32.48 [syncer]
>   026 0   0 -16  00   16 vlruwt   DL-  0:02.31 [vnlru]
>   027 0   0 -16  00   16 -DL-  7:11.58 [racctd]
>   0   157 0   0  20  00   16 geli:w   DL-  0:22.03 [g_eli[0] 
> ada1p2]
>   0   158 0   0  20  00   16 geli:w   DL-  0:22.77 [g_eli[1] 
> ada1p2]
>   0   159 0   0  20  00   16 geli:w   DL-  0:31.08 [g_eli[2] 
> ada1p2]
>   0   160 0   0  20  00   16 geli:w   DL-  0:29.41 [g_eli[3] 
> ada1p2]
>   0 70865 1   0  20  0 7076 3104 wait Ss   v0  0:00.21 -sh (sh)
>   0 71135 70865   0  20  0 6392 2308 select   S+   v0  0:00.00 script
>   0 71136 71135   0  23  0 7076 3068 wait Ss0  0:00.00 /bin/sh -i
>   0 71142 71136   0  23  0 6928 2584 -R+0  0:00.00 ps axlww
> 
> root@edge# df
> Filesystem  512-blocksUsed   Avail Capacity  Mounted on
> /dev/ada3p31936568  860864  92078448%/
> devfs2   2   0   100%/dev
> procfs   8   8   0   100%/proc
> /dev/ada3p43099192 1184896 166636842%/usr
> /dev/ada3p5 5803448112  525808 2%/var
> 
> root@edge# pstat -s
> Device  512-blocks UsedAvail Capacity
> /dev/ada1p2.eli   10485760  5839232  464652856%
> 
> root@edge# top | cat
> last pid: 71147;  load averages:  0.19,  0.08,  0.09  up 3+03:21:00
> 05:44:12
> 5 processes:  1 running, 4 sleeping
> 
> Mem: 9732K Active, 10M Inact, 882M Laundry, 1920M 

Re: Traffic "corruption" in 12-stable

2020-08-04 Thread Mark Johnston
On Mon, Aug 03, 2020 at 05:22:37PM -0400, Joe Clarke wrote:
> > On Jul 27, 2020, at 15:41, Joe Clarke  wrote:
> >> On Jul 27, 2020, at 15:01, Mark Johnston  wrote:
> >> There are some fixes for vmx not present in stable/12 (yet).  I did a
> >> merge of a number of outstanding revisions.  Would you be able to test
> >> the patch?  I haven't observed any problems with it on a host using igb,
> >> but I have no ability to test vmx at the moment.
> > 
> > I’m down to test anything.  I did notice quite a few vmxnet3 changes around 
> > performance that appealed to me.  I tried a few of them on my last kernel.  
> > That took much longer to exhibit the problem, but eventually did.
> > 
> > I can tell you I don’t have all of these patches in, though.  I’ll build 
> > with this diff and start running it now.  I’ll let you know how it goes.
> 
> So it’s been just over a week of runtime with this full patch set.  I have 
> seen no further issues with ingress packet “truncation”, and performance has 
> been what I expect.  I’m going to keep running, but I think this seems like a 
> good set to MFC.

Done in r363844, thanks.
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Traffic "corruption" in 12-stable

2020-07-27 Thread Mark Johnston
On Sun, Jul 26, 2020 at 06:16:07PM -0400, Joe Clarke wrote:
> About two weeks ago, I upgraded from the latest 11-stable to the latest 
> 12-stable.  After that, I periodically see the network throughput come to a 
> near standstill.  This FreeBSD machine is an ESXi VM with two interfaces.  It 
> acts as a router.  It uses vmxnet3 interfaces for both LAN and WAN.  It runs 
> ipfw with in-kernel NAT.  The LAN side uses a bridge with vmx0 and a tap0 L2 
> VPN interface.  My LAN side uses an MTU of 9000, and my vmx1 (WAN side) uses 
> the default 1500.
> 
> Besides seeing massive packet loss and huge latency (~ 200 ms for on-LAN ping 
> times), I know the problem has occurred because my lldpd reports:
> 
> Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on 
> bridge0
> 
> And if I turn on ipfw verbose messages, I see tons of:
> 
> Jul 26 16:02:23 namale kernel: ipfw: pullup failed
> 
> This leads me to believe packets are being corrupted on ingress. I’ve 
> applied all the recent iflib changes, but the problem persists. What causes 
> it, I don’t know.
> 
> The only thing that changed (and yes, it’s a big one) is I upgraded to 
> 12-stable.  Meaning, the rest of the network infra and topology has remained 
> the same.  This did not happen at all in 11-stable.
> 
> I’m open to suggestions.

There are some fixes for vmx not present in stable/12 (yet).  I did a
merge of a number of outstanding revisions.  Would you be able to test
the patch?  I haven't observed any problems with it on a host using igb,
but I have no ability to test vmx at the moment.

https://people.freebsd.org/~markj/patches/iflib-stable12.diff


Re: [FreeBSD-Announce] FreeBSD Errata Notice FreeBSD-EN-20:14.linuxkpi

2020-07-09 Thread Mark Johnston
On Thu, Jul 09, 2020 at 11:13:01AM +0200, Andrea Venturoli wrote:
> On 2020-07-09 10:31, Niclas Zeising wrote:
> 
> > I am unsure, but it seems mostly related to using X forwarding, when 
> > looking at the errata notice.
> 
> Which I'm using heavily...
> 
> 
> 
> > What issues are you seeing, on which hardware?
> 
> I've written extensively about this on x11@, e.g.:
> https://lists.freebsd.org/pipermail/freebsd-x11/2020-May/025989.html

This is a different bug, unfortunately.  The one fixed by the patch is
described in PR 242913.


Re: kldunload geom_journal hangs with jfini

2020-04-03 Thread Mark Johnston
On Thu, Mar 05, 2020 at 01:53:42AM +, Johannes Totz wrote:
> Hi everyone,
> 
> a recent 12-stable rev 358557 amd64 build hangs on kldunload of
> geom_journal.
> To reproduce:
>  kldload geom_journal
>  kldunload geom_journal
> 
> kldunload just hangs indefinitely. top says the state is "jfini:". It's
> using some cpu time though.
> procstat -k hangs as well.
> 
> Very reproducible, on both virtualbox and real hardware.
> 
> Any extra info that could help?

Hi Johannes,

This should be fixed by r359595.  It'll be merged to stable/12 in a few
days.


Re: wrong value from DTRACE (uint32 for int64)

2019-12-02 Thread Mark Johnston
On Mon, Dec 02, 2019 at 08:44:59PM +0100, Peter wrote:
> Hi @all,
> 
> I felt the need to look into my ZFS ARC, but DTRACE provided misleading  
> (i.e., wrong) output (on i386, 11.3-RELEASE):
> 
> # dtrace -Sn 'arc-available_memory { printf("%x %x", arg0, arg1); }'
> DIFO 0x286450a0 returns D type (integer) (size 8)
> OFF OPCODE  INSTRUCTION
> 00: 29010601ldgs DT_VAR(262), %r1   ! DT_VAR(262) = "arg0"
> 01: 2301ret  %r1
> 
> NAME ID   KND SCP FLAG TYPE
> arg0 262  scl glb rD type (integer) (size 8)
> 
> DIFO 0x286450f0 returns D type (integer) (size 8)
> OFF OPCODE  INSTRUCTION
> 00: 29010701ldgs DT_VAR(263), %r1   ! DT_VAR(263) = "arg1"
> 01: 2301ret  %r1
> 
> NAME ID   KND SCP FLAG TYPE
> arg1 263  scl glb rD type (integer) (size 8)
> dtrace: description 'arc-available_memory ' matched 1 probe
>0 14none:arc-available_memory 2fb000 2
>0 14none:arc-available_memory 4e000 2
>1 14none:arc-available_memory b000 2
>1 14none:arc-available_memory b000 2
>1 14none:arc-available_memory b000 2
>1 14none:arc-available_memory 19000 2
>0 14none:arc-available_memory d38000 2
> 
> # dtrace -n 'arc-available_memory { printf("%d %d", arg0, arg1); }'
>1 14none:arc-available_memory 81920 5
>1 14none:arc-available_memory 69632 5
>1 14none:arc-available_memory 4294955008 5
>1 14none:arc-available_memory 4294955008 5
> 
> 
> The arg0 variable is obviously shown here as an unsigned int32 value. But  
> in fact, the probe in the source code in arc.c is a signed int64:
> 
>  DTRACE_PROBE2(arc__available_memory, int64_t, lowest, int, r);
> 
> 
> User @shkhin in the forum pointed me to check the bare dtrace program,  
> unattached to the kernel code:
> https://forums.freebsd.org/threads/dtrace-treats-int64_t-as-uint32_t-on-i386.73223/post-446517
> 
> And there everything appears correct.
> 
> So two questions:
> 1. can anybody check and confirm this happening?
> 2. any idea what could be wrong here? (The respective variable in arc.c  
> bears the correct 64bit negative value, I checked that - and otherwise the  
> ARC couldn't shrink.)

The DTRACE_PROBE* macros cast their parameters to uintptr_t, which
will be 32 bits wide on i386.  You might be able to work around the
problem by casting arg0 to uint32_t in the script.


Re: Memory management changes after kernel update on 6-Aug

2019-08-09 Thread Mark Johnston
On Fri, Aug 09, 2019 at 01:05:50PM -0700, Kevin Oberman wrote:
> On Fri, Aug 9, 2019 at 11:35 AM Mark Johnston  wrote:
> 
> > On Fri, Aug 09, 2019 at 11:09:24AM -0700, Kevin Oberman wrote:
> > > Since I updated my 12.0-STABLE system on 6-Aug I have been seeing issues
> > > resuming my Win7 VM on VirtualBox. My prior kernel was built on 24-Jul.
> > If
> > > there is not sufficient memory available to reload the system (4 Meg.),
> > the
> >
> > Where does this number come from?  What memory usage stats do you see in
> > top(1) when the error occurs?
> >
> 
> I am monitoring memory usage with gkrellm. It appears to define "Free" as
> the sum of "Inactive" and "Free". If you are referring to the size of the VM, it
> was supposed to be the memory specified when I created the VM, but my
> fingers got ahead of my brain and it should have been 4G, not 4M. Hey!
> What's a few orders of magnitude?
> 
> Oddly, when I watch memory space closely I note that, as the VM loads, I
> started seeing swap utilization increase as free space was exhausted at
> about 80% loaded. Loading continued to 98%. at that point loading stopped
> and swap use continued to grow for a bit. Then free space started to
> increase from about 300M to about 700M before the error window popped up.
> 
> 
> > > resume fails with a message that memory was exhausted. Usually I can try
> > > resuming again and it will work. Sometimes I get the error two or three
> > > times before the system resumes.
> >
> > What exactly is the error message?
> >
> Failed to open a session for the virtual machine Win7.
> 
> Failed to load unit 'pgm' (VERR_EM_NO_MEMORY).
> 
> Result Code: NS_ERROR_FAILURE (0x80004005)
> Component: ConsoleWrap
> Interface: IConsole {872da645-4a9b-1727-bee2-5585105b9eed}
> 
> 
> >
> > > Since I have not touched VirtualBox other than to rebuild the kmod after
> > > the kernel build, it looks like something in the OS triggered this. Since
> > > the system frees up some memory each time so that the VM eventually
> > > resumes, it looks like the memory request is made to the OS, but VB is
> > not
> > > waiting or not enough memory is freed to allow the VB to complete the
> > > resume.
> > >
> > > Any clue what might have changed over those 13 days? I am running GENERIC
> > > except that I run the 4BSD scheduler.
> >
> > Possible culprits are r350374 and r350375, but I can't really see how.
> >
> 
> This started after the 6-Aug build (r350664). My prior build was r350292,
> so just before these two commits.
> 
> Can I try just reverting these two? Once I do, it will need to run for a
> while or do something to tie up a lot of memory before the error will
> recur. In normal use it is a matter of firefox increasing resident memory
> until there is not enough free memory to load the VM without swapping.
> (These days I often see the sum of all firefox process resident memory
> exceeding 3G after it's been up for a day or two. Still, not worse than
> chromium.)

Those commits can simply be reverted, but I am skeptical that they will
help.  You should also verify that these same conditions don't lead to
errors on your prior build, if you haven't already.


Re: Memory management changes after kernel update on 6-Aug

2019-08-09 Thread Mark Johnston
On Fri, Aug 09, 2019 at 11:09:24AM -0700, Kevin Oberman wrote:
> Since I updated my 12.0-STABLE system on 6-Aug I have been seeing issues
> resuming my Win7 VM on VirtualBox. My prior kernel was built on 24-Jul. If
> there is not sufficient memory available to reload the system (4 Meg.), the

Where does this number come from?  What memory usage stats do you see in
top(1) when the error occurs?

> resume fails with a message that memory was exhausted. Usually I can try
> resuming again and it will work. Sometimes I get the error two or three
> times before the system resumes.

What exactly is the error message?

> Since I have not touched VirtualBox other than to rebuild the kmod after
> the kernel build, it looks like something in the OS triggered this. Since
> the system frees up some memory each time so that the VM eventually
> resumes, it looks like the memory request is made to the OS, but VB is not
> waiting or not enough memory is freed to allow the VB to complete the
> resume.
> 
> Any clue what might have changed over those 13 days? I am running GENERIC
> except that I run the 4BSD scheduler.

Possible culprits are r350374 and r350375, but I can't really see how.


Re: ICMP timestamps

2019-03-26 Thread Mark Johnston
On Tue, Mar 26, 2019 at 05:31:58PM -0400, Hans Fiedler wrote:
> I did a freebsd-update from 11.2 to 12.0.  I had set the sysctl variable
> net.inet.icmp.tstamprepl to 0 in 11.2 to block the ICMP timestamp
> responses, but doesn't seem to work on 12.0.  I wanted to ask in I was
> missing something else after searching for any other values that should
> effect it.

This was a problem in the tstamprepl definition that was exposed when
VNET was enabled by default in the kernel.  It's now fixed in head by
r345560 and will be merged back to the stable branches in the next
several days.


Re: 11.2-STABLE kernel wired memory leak

2019-02-12 Thread Mark Johnston
On Wed, Feb 13, 2019 at 01:48:21AM +0700, Eugene Grosbein wrote:
> 13.02.2019 1:42, Mark Johnston wrote:
> 
> >> Yes, I have debugger compiled into running kernel and have console access.
> >> What commands should I use?
> > 
> > I meant kgdb(1).  If you can run that, try:
> > 
> > (kgdb) p time_uptime
> > (kgdb) p lowmem_uptime
> > 
> > If you are willing to drop the system into DDB, do so and run
> > 
> > db> x time_uptime
> > db> x lowmem_uptime
> 
> I will only reach the console the next day. Is it wise to use kgdb over ssh on 
> a running remote system? :-)

It should be fine.  kgdb opens /dev/(k)mem read-only.


Re: 11.2-STABLE kernel wired memory leak

2019-02-12 Thread Mark Johnston
On Wed, Feb 13, 2019 at 01:40:06AM +0700, Eugene Grosbein wrote:
> 13.02.2019 1:18, Mark Johnston wrote:
> 
> > On Wed, Feb 13, 2019 at 01:03:37AM +0700, Eugene Grosbein wrote:
> >> 12.02.2019 23:34, Mark Johnston wrote:
> >>
> >>> I suspect that the "leaked" memory is simply being used to cache UMA
> >>> items.  Note that the values in the FREE column of vmstat -z output are
> >>> quite large.  The cached items are reclaimed only when the page daemon
> >>> wakes up to reclaim memory; if there are no memory shortages, large
> >>> amounts of memory may accumulate in UMA caches.  In this case, the sum
> >>> of the product of columns 2 and 5 gives a total of roughly 4GB cached.
> >>>
> >>>> as well as "sysctl hw": 
> >>>> http://www.grosbein.net/freebsd/leak/sysctl-hw.txt
> >>>> and "sysctl vm": http://www.grosbein.net/freebsd/leak/sysctl-vm.txt
> >>
> >> It seems the page daemon is broken somehow, as it did not reclaim several 
> >> gigs of wired memory
> >> despite a long period of vm thrashing:
> > 
> > Depending on the system's workload, it is possible for the caches to
> > grow quite quickly after a reclaim.  If you are able to run kgdb on the
> > kernel, you can find the time of the last reclaim by comparing the
> > values of lowmem_uptime and time_uptime.
> 
> Yes, I have debugger compiled into running kernel and have console access.
> What commands should I use?

I meant kgdb(1).  If you can run that, try:

(kgdb) p time_uptime
(kgdb) p lowmem_uptime

If you are willing to drop the system into DDB, do so and run

db> x time_uptime
db> x lowmem_uptime


Re: 11.2-STABLE kernel wired memory leak

2019-02-12 Thread Mark Johnston
On Wed, Feb 13, 2019 at 01:03:37AM +0700, Eugene Grosbein wrote:
> 12.02.2019 23:34, Mark Johnston wrote:
> 
> > I suspect that the "leaked" memory is simply being used to cache UMA
> > items.  Note that the values in the FREE column of vmstat -z output are
> > quite large.  The cached items are reclaimed only when the page daemon
> > wakes up to reclaim memory; if there are no memory shortages, large
> > amounts of memory may accumulate in UMA caches.  In this case, the sum
> > of the product of columns 2 and 5 gives a total of roughly 4GB cached.
> > 
> >> as well as "sysctl hw": http://www.grosbein.net/freebsd/leak/sysctl-hw.txt
> >> and "sysctl vm": http://www.grosbein.net/freebsd/leak/sysctl-vm.txt
> 
> It seems the page daemon is broken somehow, as it did not reclaim several gigs of 
> wired memory
> despite a long period of vm thrashing:

Depending on the system's workload, it is possible for the caches to
grow quite quickly after a reclaim.  If you are able to run kgdb on the
kernel, you can find the time of the last reclaim by comparing the
values of lowmem_uptime and time_uptime.

> $ sed 's/:/,/' vmstat-z.txt | awk -F, '{printf "%10s %s\n", $2*$5/1024/1024, 
> $1}' | sort -k1,1 -rn | head
>   1892 abd_chunk
>454.629 dnode_t
> 351.35 zio_buf_512
>228.391 zio_buf_16384
>173.968 dmu_buf_impl_t
> 130.25 zio_data_buf_131072
>93.6887 VNODE
>81.6978 arc_buf_hdr_t_full
>74.9368 256
>57.4102 4096
> 


Re: 11.2-STABLE kernel wired memory leak

2019-02-12 Thread Mark Johnston
On Tue, Feb 12, 2019 at 11:14:31PM +0700, Eugene Grosbein wrote:
> Hi!
> 
> Long story short: 11.2-STABLE/amd64 r335757 leaked over 4600MB of kernel wired 
> memory over 81 days of uptime,
> out of 8GB total RAM.
> 
> Details follow.
> 
> I have a workstation running Xorg, Firefox, Thunderbird, LibreOffice and 
> occasionally VirtualBox for single VM.
> 
> It has two identical 320GB HDDs combined with single graid-based array with 
> "Intel"
> on-disk format having 3 volumes:
> - one "RAID1" volume /dev/raid/r0 occupies the first 10GB of each HDD;
> - two "SINGLE" volumes /dev/raid/r1 and /dev/raid/r2 that utilize "tails" of 
> HDDs (310GB each).
> 
> /dev/raid/r0 (10GB) has MBR partitioning and two slices:
> - /dev/raid/r0s1 (8GB) is used for swap;
> - /dev/raid/r0s2 (2GB) is used by non-redundant ZFS pool named "os" that 
> contains only
> root file system (177M used) and /usr file system (340M used).
> 
> There is also second pool (ZMIRROR) named "z" built directly on top of 
> /dev/raid/r[12] volumes,
> this pool contains all other file systems including /var, /home, /usr/ports, 
> /usr/local, /usr/{src|obj} etc.
> 
> # zpool list
> NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAGCAP  DEDUP  HEALTH  
> ALTROOT
> os1,98G   520M  1,48G- -55%25%  1.00x  ONLINE  -
> z  288G  79,5G   209G- -34%27%  1.00x  ONLINE  -
> 
> This way I have swap outside of ZFS, boot blocks and partitioning mirrored by 
> means of GEOM_RAID and
> I can use the local console to break to single-user mode to unmount all file systems 
> other than root and /usr,
> and can even export bigger ZFS pool "z". And I did that to see that ARC usage
> (limited with vfs.zfs.arc_max="3G" in /boot/loader.conf) dropped from over 
> 2500MB 
> down to 44MB but "Wired" stays high. Now after I imported "z" back and booted 
> to multiuser mode
> top(1) shows:
> 
> last pid: 51242;  load averages:  0.24,  0.16,  0.13  up 81+02:38:38  22:59:18
> 104 processes: 1 running, 103 sleeping
> CPU:  0.0% user,  0.0% nice,  0.4% system,  0.2% interrupt, 99.4% idle
> Mem: 84M Active, 550M Inact, 4K Laundry, 4689M Wired, 2595M Free
> ARC: 273M Total, 86M MFU, 172M MRU, 64K Anon, 1817K Header, 12M Other
>  117M Compressed, 333M Uncompressed, 2.83:1 Ratio
> Swap: 8192M Total, 940K Used, 8191M Free
> 
> I have KDB and DDB in my custom kernel also. How do I debug the leak further?
> 
> I use nvidia-driver-340-340.107 driver for GK208 [GeForce GT 710B] video card.
> Here are outputs of "vmstat -m": 
> http://www.grosbein.net/freebsd/leak/vmstat-m.txt
> and "vmstat -z": http://www.grosbein.net/freebsd/leak/vmstat-z.txt

I suspect that the "leaked" memory is simply being used to cache UMA
items.  Note that the values in the FREE column of vmstat -z output are
quite large.  The cached items are reclaimed only when the page daemon
wakes up to reclaim memory; if there are no memory shortages, large
amounts of memory may accumulate in UMA caches.  In this case, the sum
of the product of columns 2 and 5 gives a total of roughly 4GB cached.

> as well as "sysctl hw": http://www.grosbein.net/freebsd/leak/sysctl-hw.txt
> and "sysctl vm": http://www.grosbein.net/freebsd/leak/sysctl-vm.txt


Re: Trap 12 in vm_page_alloc_after()

2018-11-28 Thread Mark Johnston
On Sun, Nov 25, 2018 at 11:35:30PM -0500, Garrett Wollman wrote:
> <  said:
> 
> > On Sun, Nov 18, 2018 at 08:24:38PM -0500, Garrett Wollman wrote:
> >> Has anyone seen this before?  It's on a busy NFS server, but hasn't
> >> been observed on any of our other NFS servers.
> >> 
> >> 
> >> Fatal trap 12: page fault while in kernel mode
> 
> >> --- trap 0xc, rip = 0x809a903d, rsp = 0xfe17eb8d0710, rbp = 
> >> 0xfe17eb8d0750 ---
> >> vm_page_alloc_after() at vm_page_alloc_after+0x15d/frame 0xfe17eb8d0750
> 
> > What is the line number for vm_page_alloc_after+0x15d ?
> > Do you have NUMA enabled on 11 ?
> 
> If gdb is to be believed, the trap is at line 1687:
> 
> /*
>  *  At this point we had better have found a good page.
>  */
> KASSERT(m != NULL, ("missing page"));
> free_count = vm_phys_freecnt_adj(m, -1);
> >>  if ((m->flags & PG_ZERO) != 0)
> vm_page_zero_count--;
> mtx_unlock(&vm_page_queue_free_mtx);
> vm_page_alloc_check(m);
> 
> The faulting instruction is:
> 
> 0x809a903d <vm_page_alloc_after+349>:   testb  $0x8,0x5a(%r14)
> 
> There are no options matching /numa/i in the configuration.  (This is
> a non-debugging configuration so the KASSERT is inoperative, I
> assume.)  I have about a dozen other servers with the same kernel and
> they're not crashing, but obviously they all have different loads and
> sets of active clients.

If you're using a Skylake, I suspect that you can set the
hw.skz63_enable tunable to 0 as a workaround, assuming you're not using
any code that relies on Intel TSX.  (I don't think there's anything in
the base system that does.)  There are some details in
https://reviews.freebsd.org/D18374
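For reference, the workaround described above would be a one-line loader.conf entry (tunable name taken from the message; whether it is appropriate depends on the caveat about Intel TSX):

```
# /boot/loader.conf
# Disable the kernel's handling of the Skylake SKZ63 erratum.
# Per the message above, only safe if nothing on the system uses Intel TSX.
hw.skz63_enable=0
```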


Re: Where is my memory on 'fresh' 11-STABLE? It should be used by ARC, but it is not used for it anymore.

2018-11-20 Thread Mark Johnston
On Tue, Nov 20, 2018 at 03:42:24PM +0300, Lev Serebryakov wrote:
> 
>  I have a server which is mostly a torrent box. It uses ZFS and is equipped
> with 16GiB of physical memory. It is running 11-STABLE (r339914 now).
> 
>  I've updated it to r339914 from some 11.1-STABLE revision 3 weeks ago.
> 
>  I was used to seeing 13-14GiB of memory in the ZFS ARC, and it was OK.
> Sometimes it "locked" under heavy disk load due to ARC memory pressure,
> but it was bearable, and as ZFS is the main reason this server exists, I
> didn't limit the ARC.
> 
>  But the new revision (r339914) shows very strange behavior: the ARC is no more
> than 4GiB, but the kernel has 15GiB wired:
> 
> Mem: 22M Active, 656M Inact, 62M Laundry, 15G Wired, 237M Free
> ARC: 4252M Total, 2680M MFU, 907M MRU, 3680K Anon, 15M Header, 634M Other
>  2789M Compressed, 3126M Uncompressed, 1.12:1 Ratio
> 
>  These are typical numbers for the last week: 15G wired, 237M Free, but only
> 4252M ARC!
> 
>  Where is other 11G of memory?!
> 
> I've checked USED and FREE in "vmstat -z" output and got this:
> 
> $ vmstat -z | tr : , | awk -F , '1{print $2*$4,$2*$5,$1}' | sort -n |
> tail -20
> 23001088 9171456 MAP ENTRY
> 29680800 8404320 VM OBJECT
> 34417408 10813952 256
> 36377964 2665656 S VFS Cache
> 50377392 53856 sa_cache
> 50593792 622985216 zio_buf_131072
> 68913152 976896 mbuf_cluster
> 73543680 7225344 mbuf_jumbo_page
> 92358552 67848 zfs_znode_cache
> 95731712 51761152 4096
> 126962880 159581760 dmu_buf_impl_t
> 150958080 233920512 mbuf_jumbo_9k
> 165164600 92040 VNODE
> 192701120 30350880 UMA Slabs
> 205520896 291504128 zio_data_buf_1048576
> 222822400 529530880 zio_data_buf_524288
> 259143168 293476864 zio_buf_512
> 352485376 377061376 zio_buf_16384
> 376109552 346474128 dnode_t
> 2943016960 5761941504 abd_chunk
> $
> 
>  And the total USED/FREE numbers are very strange to me:
> 
> $ vmstat -z | tr : , | awk -F , '1{u+=$2*$4; f+=$2*$5} END{print u,f}'
> 5717965420 9328951088
> $
> 
>  So, only ~5.7G is used and 9.3G is free! But why is this memory no longer
> used by the ARC, and why is it wired and not free?

Could you show the output of "vmstat -s" when in this state?


Re: FreeBSD 11.2 kernel crash when dd

2018-10-19 Thread Mark Johnston
On Fri, Oct 19, 2018 at 01:10:15PM +0200, Sebastian Wojtczak wrote:
> Hi,
> 
> I would like to report a kernel crash while running dd on an SSD drive.
> 
> I just found that my PC crashed several times during the command below:
> dd if=/dev/ada2 of=file_name bs=10m.
> 
> I was trying to make an image of my SSD drive. Once the dump file hits
> 41G or 52G, the kernel crashes and reboots the system.
> 
> Oct 18 12:30:11 username syslogd: kernel boot file is /boot/kernel/kernel
> Oct 18 12:30:11 username kernel:
> Oct 18 12:30:11 username kernel:
> Oct 18 12:30:11 username kernel: Fatal trap 12: page fault while in kernel
> mode
> Oct 18 12:30:11 username kernel: cpuid = 1; apic id = 01
> Oct 18 12:30:11 username kernel: fault virtual address  = 0x5a
> Oct 18 12:30:11 username kernel: fault code = supervisor read
> data, page not present
> Oct 18 12:30:11 username kernel: instruction pointer=
> 0x20:0x80e67f6d
> Oct 18 12:30:11 username kernel: stack pointer  =
> 0x28:0xfe084b408f40
> Oct 18 12:30:11 username kernel: frame pointer  =
> 0x28:0xfe084b408f80
> Oct 18 12:30:11 username kernel: code segment   = base 0x0, limit
> 0xf, type 0x1b
> Oct 18 12:30:11 username kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
> Oct 18 12:30:11 username kernel: processor eflags   = interrupt
> enabled, resume, IOPL = 0
> Oct 18 12:30:11 username kernel: current process= 0
> (zio_write_issue_8)
> Oct 18 12:30:11 username kernel: trap number= 12
> Oct 18 12:30:11 username kernel: panic: page fault
> Oct 18 12:30:11 username kernel: cpuid = 1
> Oct 18 12:30:11 username kernel: KDB: stack backtrace:
> Oct 18 12:30:11 username kernel: #0 0x80b50087 at kdb_backtrace+0x67
> Oct 18 12:30:11 username kernel: #1 0x80b099f7 at vpanic+0x177
> Oct 18 12:30:11 username kernel: #2 0x80b09873 at panic+0x43
> Oct 18 12:30:11 username kernel: #3 0x80fe105f at trap_fatal+0x35f
> Oct 18 12:30:11 username kernel: #4 0x80fe10b9 at trap_pfault+0x49
> Oct 18 12:30:11 username kernel: #5 0x80fe0887 at trap+0x2c7
> Oct 18 12:30:11 username kernel: #6 0x80fc04cc at calltrap+0x8
> Oct 18 12:30:11 username kernel: #7 0x80e56df2 at kmem_back+0xf2
> Oct 18 12:30:11 username kernel: #8 0x80e56cd0 at kmem_malloc+0x60
> Oct 18 12:30:11 username kernel: #9 0x80e4e752 at
> keg_alloc_slab+0xe2
> Oct 18 12:30:11 username kernel: #10 0x80e5118e at
> keg_fetch_slab+0x14e
> Oct 18 12:30:11 username kernel: #11 0x80e509a4 at
> zone_fetch_slab+0x64
> Oct 18 12:30:11 username kernel: #12 0x80e50a7f at zone_import+0x3f
> Oct 18 12:30:11 username kernel: #13 0x80e4d199 at
> uma_zalloc_arg+0x3d9
> Oct 18 12:30:11 username kernel: #14 0x832d2ab2 at
> zio_write_compress+0x1e2
> Oct 18 12:30:11 username kernel: #15 0x832d174c at zio_execute+0xac
> Oct 18 12:30:11 username kernel: #16 0x80b617e4 at
> taskqueue_run_locked+0x154
> Oct 18 12:30:11 username kernel: #17 0x80b62918 at
> taskqueue_thread_loop+0x98
> Oct 18 12:30:11 username kernel: Uptime: 5m50s
> 
> One virtual machine is started with bhyve at startup, but even if I shut it
> down, the same crash happens. Disabling vmm does not help; it only extends the
> time until the crash during the SSD dump.
> 
> The current zfs setup is raidz on 3 (500GB) hdd drives with compress=on. Drive 
> ada0 is not part of the raidz and is not attached/mounted at all.
> 
> Any help on how to investigate it is appreciated.

The stack suggests a bug in the kmem_* KPI, but I'm having trouble
seeing the problem.  In particular, the fault address suggests that we
crashed while testing (m->flags & PG_ZERO) == 0, but it shouldn't be
possible for m to be NULL there.  My attempts to reproduce this on
12-CURRENT haven't yielded anything yet.  Would you (or anyone else
seeing the problem) be willing to share a kernel dump?  I'd need the
vmcore, the contents of /boot/kernel and /usr/lib/debug/boot/kernel.


Re: Cannot setup dumpdev on glabel disk

2018-08-31 Thread Mark Johnston
On Sat, Sep 01, 2018 at 02:09:12AM +0700, Eugene Grosbein wrote:
> 31.08.2018 23:08, Samuel Chow wrote:
> 
> > I am running 11-STABLE, and I am experiencing kernel panics when I am 
> > destroying a VIMAGE-based jail. Naturally, I flipped to the chapter about 
> > 'Kernel Debugging' to learn about 'Obtaining a Kernel Crash Dump'.
> > 
> > However, I am finding that my permanently glabel'ed disk partition cannot 
> > be used as dumpdev. Is that true, and why not? I mean, swap can use it just 
> > fine. I am unable to find this restriction in the documentation.
> > 
> > 
> > # grep swap /etc/fstab
> > /dev/label/boot01b  none swap   sw  0 0
> > # swapinfo
> Device          1K-blocks     Used    Avail Capacity
> /dev/label/boot01b   41943040        0 41943040     0%
> > # glabel status | grep boot
> >   label/boot01 N/A  ada4s1
> >   label/boot02 N/A  ada5s1
> > # dumpon /dev/label/boot01b
> > dumpon: ioctl(DIOCSKERNELDUMP): Operation not supported by device
> 
> That's not about label but underlying device that seems to be GEOM_PART_MBR
> and it allows kernel dumps only if slice (MBR partition) type is 0xa5 for 
> "freebsd"
> or 0x82 ("linux swap"). Please show output of the command "gpart show ada4".

Ah, right, please ignore my other reply.  When I actually test it
myself, dumpon /dev/label/foo seems to work; I assumed the lack of
handling for GEOM::kerneldump in the glabel code was a problem.  Sorry
for the noise.


Re: Cannot setup dumpdev on glabel disk

2018-08-31 Thread Mark Johnston
On Fri, Aug 31, 2018 at 10:08:20AM -0600, Samuel Chow wrote:
> I am running 11-STABLE, and I am experiencing kernel panics when I am 
> destroying a VIMAGE-based jail. Naturally, I flipped to the chapter 
> about 'Kernel Debugging' to learn about 'Obtaining a Kernel Crash Dump'.
> 
> However, I am finding that my permanently glabel'ed disk partition 
> cannot be used as dumpdev. Is that true, and why not? I mean, swap can 
> use it just fine. I am unable to find this restriction in the documentation.

I don't think that there is a real reason for this restriction; the
glabel GEOM class simply doesn't implement the required support.

Do you mind submitting a PR for this at
https://bugs.freebsd.org/bugzilla/ ? I will try and get that fixed soon
if no one else beats me to it.

> # grep swap /etc/fstab
> /dev/label/boot01b  none swap   sw  0 0
> # swapinfo
> Device  1K-blocks Used    Avail Capacity
> /dev/label/boot01b  41943040    0 41943040 0%
> # glabel status | grep boot
>    label/boot01 N/A  ada4s1
>    label/boot02 N/A  ada5s1
> # dumpon /dev/label/boot01b
> dumpon: ioctl(DIOCSKERNELDUMP): Operation not supported by device
> 


Re: All the memory eaten away by ZFS 'solaris' malloc - on 11.1-R amd64

2018-08-04 Thread Mark Johnston
On Sat, Aug 04, 2018 at 08:38:04PM +0200, Mark Martinec wrote:
> 2018-08-04 19:01, Mark Johnston wrote:
> > I think running "zpool list" is adding a lot of noise to the output.
> > Could you retry without doing that?
> 
> No, like I said previously, the "zpool list" (with one defunct
> zfs pool) *is* the sole culprit of the zfs memory leak.
> With each invocation of "zpool list" the "solaris" malloc
> jumps up by the same amount, and never ever drops. Without
> running it (like repeatedly under 'telegraf' monitoring
> of zfs), the machine runs normally and never runs out of
> memory, the "solaris" malloc count no longer grows steadily.

Sorry, I missed that message.  Given that information, it would be
useful to see the output of the following script instead:

# dtrace -c "zpool list -Hp" -x temporal=off -n '
 dtmalloc::solaris:malloc
   /pid == $target/{@allocs[stack(), args[3]] = count()}
 dtmalloc::solaris:free
   /pid == $target/{@frees[stack(), args[3]] = count();}'

This will record all allocations and frees from a single instance of
"zpool list".


Re: All the memory eaten away by ZFS 'solaris' malloc - on 11.1-R amd64

2018-08-04 Thread Mark Johnston
On Fri, Aug 03, 2018 at 09:11:42PM +0200, Mark Martinec wrote:
> More attempts at tracking this down. The suggested dtrace command does
> usually abort with:
> 
>Assertion failed: (buf->dtbd_timestamp >= first_timestamp),
>  file 
> /usr/src/cddl/contrib/opensolaris/lib/libdtrace/common/dt_consume.c,
>  line 3330.

Hrmm.  As a workaround you can add "-x temporal=off" to the dtrace(1)
invocation.

> but with some luck soon after each machine reboot I can leave the dtrace
> running for about 10 or 20 seconds (max) before terminating it with a 
> ^C,
> and succeed in collecting the report.  If I miss the opportunity to 
> leave
> dtrace running just long enough to collect useful info, but not long
> enough for it to hit the assertion check, then any further attempt
> to run the dtrace script hits the assertion fault immediately.
> 
> Btw, (just in case) I have recompiled kernel from source 
> (base/release/11.2.0)
> with debugging symbols, although the behaviour has not changed:
> 
>FreeBSD floki.ijs.si 11.2-RELEASE FreeBSD 11.2-RELEASE #0 r337238:
>  Fri Aug 3 17:29:42 CEST 2018 
> m...@xxx.ijs.si:/usr/obj/usr/src/sys/FLOKI amd64
> 
> 
> Anyway, after several attempts I was able to collect a useful dtrace
> output from the suggested dtrace stript:
> 
> # dtrace -n 'dtmalloc::solaris:malloc {@allocs[stack(), args[3]] =
>count()} dtmalloc::solaris:free {@frees[stack(), args[3]] = count()}'
> 
> while running "zpool list" repeatedly in another terminal screen:

I think running "zpool list" is adding a lot of noise to the output.
Could you retry without doing that? 


Re: All the memory eaten away by ZFS 'solaris' malloc - on 11.1-R amd64

2018-07-31 Thread Mark Johnston
On Tue, Jul 31, 2018 at 11:54:29PM +0200, Mark Martinec wrote:
> I have now upgraded this host from 11.1-RELEASE-p11 to 11.2-RELEASE
> and the situation has not improved. Also turned off all services.
> ZFS is still leaking memory about 30 MB per hour, until the host
> runs out of memory and swap space and crashes, unless I reboot it
> first every four days.
> 
> Any advise before I try to get rid of that faulted disk with a pool
> (or downgrade to 10.3, which was stable) ?

If you're able to use dtrace, it would be useful to try tracking
allocations with the solaris tag:

# dtrace -n 'dtmalloc::solaris:malloc {@allocs[stack(), args[3]] =
  count()} dtmalloc::solaris:free {@frees[stack(), args[3]] = count();}'

Try letting that run for one minute, then kill it and paste the output.
Ideally the host will be as close to idle as possible while still
demonstrating the leak.


Re: stable/11 debugging kernel unable to produce crashdump again

2017-07-24 Thread Mark Johnston
On Tue, Jul 25, 2017 at 12:03:05AM +0700, Eugene Grosbein wrote:
> Thanks, this helped:
> 
> $ addr2line -f -e kernel.debug 0x80919c00
> g_raid_shutdown_post_sync
> /home/src/sys/geom/raid/g_raid.c:2458
> 
> That is GEOM_RAID's g_raid_shutdown_post_sync() that hangs if called just 
> before
> crashdump generation but works just fine during normal system shutdown.

I think graid probably needs a treatment similar to r301173/r316032.
g_raid_shutdown_post_sync() appears to be quite similar to the
corresponding gmirror handler. In particular, it just attempts to mark
the individual components as clean and destroy the GEOM, which is not
really safe after a panic.

diff --git a/sys/geom/raid/g_raid.c b/sys/geom/raid/g_raid.c
index 7a1fd8c5ce2e..aa2529d5466a 100644
--- a/sys/geom/raid/g_raid.c
+++ b/sys/geom/raid/g_raid.c
@@ -2461,6 +2461,9 @@ g_raid_shutdown_post_sync(void *arg, int howto)
struct g_raid_softc *sc;
struct g_raid_volume *vol;
 
+   if (panicstr != NULL)
+   return;
+
mp = arg;
g_topology_lock();
g_raid_shutdown = 1;


Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching

2017-07-23 Thread Mark Johnston
On Thu, Jul 20, 2017 at 03:45:39PM +0200, Mark Martinec wrote:
> 2017-07-20 02:03, Mark Johnston wrote:
> > One thing to try at this point would be to disable EARLY_AP_STARTUP in
> > the kernel config. That is, take a configuration with which you're able
> > to reproduce the hang during boot, and remove "options
> > EARLY_AP_STARTUP".
> 
> Done. And it avoids the problem altogether! Thanks.
> Tried a reboot several times and it succeeds every time.

Thanks. Sorry for the delayed follow-up.

> 
> Here is all that I had in a config file for building a kernel,
> i.e. I took away the 'options DDB' which also seemingly avoided
> the problem:
>include GENERIC
>ident NELI
>nooptions EARLY_AP_STARTUP

Could you try re-enabling EARLY_AP_STARTUP, applying the patch at the
end of this email, and see if the message "sleeping before eventtimer
init" appears in the boot output? If it does, it'll be followed by a
backtrace that might be useful for tracking down the hang. It might
produce false positives, but we'll see.

> 
> > This feature has a fairly large impact on the bootup process and has
> > had a few problems that manifested as hangs during boot. There was at
> > least one other case where an innocuous change to the kernel
> > configuration "fixed" the problem by introducing some second-order
> > effect (causing kernel threads to be scheduled in a different
> > order, for instance).
> 
> > Regardless of whether the suggestion above makes a difference, it would
> > be helpful to see verbose dmesgs from both a clean boot and a boot that
> > hangs. If disabling EARLY_AP_STARTUP helps, then we can try adding some
> > assertions that will cause the system to panic when the hang occurs,
> > making it easier to see what's going on.
> 
> Hmmm.
> I have now saved a couple of versions of /var/run/dmesg.boot
> (in boot_verbose mode) when EARLY_AP_STARTUP is disabled and
> the boot is successful. However, I don't know how to capture
> such log when booting hangs, as I have no serial interface
> and the boot never completes. All I have is a screen photo
> of the last state when a hang occurs (showing ada disks
> successfully attached, followed immediately by the attempt
> to attach a da disk, which hangs).

Ok, let's not worry about this for now.

Index: sys/kern/kern_clock.c
===
--- sys/kern/kern_clock.c   (revision 321401)
+++ sys/kern/kern_clock.c   (working copy)
@@ -385,6 +385,8 @@
 static int devpoll_run = 0;
 #endif
 
+bool inited_clocks = false;
+
 /*
  * Initialize clock frequencies and start both clocks running.
  */
@@ -412,6 +414,8 @@
 #ifdef SW_WATCHDOG
EVENTHANDLER_REGISTER(watchdog_list, watchdog_config, NULL, 0);
 #endif
+
+   inited_clocks = true;
 }
 
 /*
Index: sys/kern/kern_synch.c
===
--- sys/kern/kern_synch.c   (revision 321401)
+++ sys/kern/kern_synch.c   (working copy)
@@ -298,6 +298,8 @@
return (rval);
 }
 
+extern bool inited_clocks;
+
 /*
  * pause() delays the calling thread by the given number of system ticks.
  * During cold bootup, pause() uses the DELAY() function instead of
@@ -330,6 +332,10 @@
DELAY(sbt);
return (0);
}
+   if (cold && !inited_clocks) {
+   printf("%s: sleeping before eventtimer init\n", curthread->td_name);
+   kdb_backtrace();
+   }
return (_sleep(_wchan[curcpu], NULL, 0, wmesg, sbt, pr, flags));
 }
 


Re: stable/11 debugging kernel unable to produce crashdump again

2017-07-23 Thread Mark Johnston
On Sun, Jul 23, 2017 at 04:26:45PM +0700, Eugene Grosbein wrote:
> On 14.01.2017 18:40, Eugene Grosbein wrote:
> > 
> >> I suspect that this is because we only stop the scheduler upon a panic
> >> if SMP is configured. Can you retest with the patch below applied?
> >>
> >> Index: sys/kern/kern_shutdown.c
> >> ===
> >> --- sys/kern/kern_shutdown.c   (revision 312082)
> >> +++ sys/kern/kern_shutdown.c   (working copy)
> >> @@ -713,6 +713,7 @@
> >>CPU_CLR(PCPU_GET(cpuid), _cpus);
> >>stop_cpus_hard(other_cpus);
> >>}
> >> +#endif
> >>  
> >>/*
> >> * Ensure that the scheduler is stopped while panicking, even if panic
> >> @@ -719,7 +720,6 @@
> >> * has been entered from kdb.
> >> */
> >>td->td_stopsched = 1;
> >> -#endif
> >>  
> >>bootopt = RB_AUTOBOOT;
> >>newpanic = 0;
> >>
> >>
> > 
> > Indeed, my router is uniprocessor system and your patch really solves the 
> > problem.
> > Now kernel generates crashdump just fine in case of panic. Please commit 
> > the fix, thanks!
> 
> Sadly, this time 11.1-STABLE r321371 SMP hangs instead of doing crashdump:

Is this amd64 GENERIC, or something else?

> 
> - "call doadump" from DDB prompt works just fine;
> - "shutdown -r now" reboots the system without problems;
> - "sysctl debug.kdb.panic=1" triggers a panic just fine but system hangs just 
> afer showing uptime
> instead of continuing with crashdump generation; same if "real" panic occurs.
> 
> Same for debug.minidump set to 1 or 0. How do I debug this?

I'm not able to reproduce the problem in bhyve using r321401. Looking
at the code, the culprits might be cngrab(), or one of the
shutdown_post_sync eventhandlers. Since you're apparently able to see
the console output at the time of the panic, I guess it's probably the
latter. Could you try your test with the patch below applied? It'll
print a bunch of "entering post_sync"/"leaving post_sync" messages with
addresses that can be resolved using kgdb. That'll help determine where
we're getting stuck.

Index: sys/sys/eventhandler.h
===
--- sys/sys/eventhandler.h  (revision 321401)
+++ sys/sys/eventhandler.h  (working copy)
@@ -85,7 +85,11 @@
_t = (struct eventhandler_entry_ ## name *)_ep; \
CTR1(KTR_EVH, "eventhandler_invoke: executing %p", \
(void *)_t->eh_func);   \
+   if (strcmp(__STRING(name), "shutdown_post_sync") == 0) \
+   printf("entering post_sync %p\n", (void *)_t->eh_func); \
_t->eh_func(_ep->ee_arg , ## __VA_ARGS__);  \
+   if (strcmp(__STRING(name), "shutdown_post_sync") == 0) \
+   printf("leaving post_sync %p\n", (void *)_t->eh_func); \
EHL_LOCK((list));   \
}   \
}   \


Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching

2017-07-19 Thread Mark Johnston
On Thu, Jul 20, 2017 at 01:46:33AM +0200, Mark Martinec wrote:
> More news on the matter. As reported yesterday the locally built
> kernel with options INVARIANTS and DDB works fine and somehow avoids
> the trouble at attaching the da (mps) disks on an LSI controller, so
> today I wanted to get back to a reproducible hang - and sure enough,
> reverting to the generic kernel as distributed brings back the hang.
> 
> So I tried rebuilding the kernel while experimenting with options
> like DDB and INVARIANTS.
> 
> A locally built GENERIC kernel behaves the same as the original
> kernel from the distribution (as installed by freebsd-upgrade),
> so no surprises there. It hangs trying to attach the first of the
> da disks (after first successfully attaching all the ada disks).
> The alt ctrl esc is unable to enter debugger when the hang occurs
> (possibly due to an unresponsive USB keyboard at that time),
> even though the debug.kdb.break_to_debugger was set to 1 at a
> loader prompt. It needs loader "Safe mode" to be able to boot.
> 
> Next, a locally built kernel with DDB and INVARIANTS works well
> (the remaining options come from an included GENERIC).
> 
> Now the funny part: a locally built kernel with just the DDB
> option (and the rest included from GENERIC) *also* works well.
> Somehow the DDB option makes a difference, even though kernel
> debugger is never activated.

One thing to try at this point would be to disable EARLY_AP_STARTUP in
the kernel config. That is, take a configuration with which you're able
to reproduce the hang during boot, and remove "options
EARLY_AP_STARTUP".

This feature has a fairly large impact on the bootup process and has
had a few problems that manifested as hangs during boot. There was at
least one other case where an innocuous change to the kernel
configuration "fixed" the problem by introducing some second-order
effect (causing kernel threads to be scheduled in a different
order, for instance).

Regardless of whether the suggestion above makes a difference, it would
be helpful to see verbose dmesgs from both a clean boot and a boot that
hangs. If disabling EARLY_AP_STARTUP helps, then we can try adding some
assertions that will cause the system to panic when the hang occurs,
making it easier to see what's going on.

> 
> To re-assert: at the time of a hang the CPU fan starts revving up,
> and the USB keyboard is unresponsive ( does not enter scroll
> mode, caps lock and num lock do not toggle their LED indicators,
> alt ctrl esc do not activate kernel debugger. Loader "Safe mode"
> avoids the problem (presumably by disabling SMP).
> 
> Meanwhile I have successfully upgraded two other similar
> hosts from 11.0 to 11.1-RC3, no surprises there (but they do not
> have the same disk controller).
> 
> Not sure what to try next.
> 
>Mark


Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching

2017-07-17 Thread Mark Johnston
On Tue, Jul 18, 2017 at 01:01:16AM +0200, Mark Martinec wrote:
> Upgrading 11.0-RELEASE-p11 to 11.1-RC3 using the usual freebsd-update 
> upgrade
> method I ended up with a system which gets stuck while trying to attach
> the second set of disks. This happened already after the first phase of
> the upgrade procedure (installing and re-booting with a new kernel).
> 
> The first set of disks (ada0 .. ada2) are attached successfully, also a
> cd0, but then when the first of the set of four (a regular spinning 
> disk)
> on an LSI controller is to be attached, the boot procedure just gets
> stuck there:
> 
>kernel: ada1: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes)
>kernel: ada1: Command Queueing enabled
>kernel: ada1: 305245MB (625142448 512 byte sectors)
>kernel: ada2 at ahcich6 bus 0 scbus8 target 0 lun 0
>kernel: ada2:  ATA8-ACS SATA 3.x device
>kernel: ada2: Serial Number OCZ-O1L6RF591R09Z5C8
>kernel: ada2: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes)
>kernel: ada2: Command Queueing enabled
>kernel: ada2: 114473MB (234441648 512 byte sectors)
>kernel: ada2: quirks=0x1<4K>
>kernel: da0 at mps0 bus 0 scbus0 target 2 lun 0
> 
> (stuck here, keyboard not responding, fans rising their pitch,
>   presumably CPU is spinning)

Are you able to break into the debugger at this point? Try setting
debug.kdb.break_to_debugger=1 and debug.kdb.alt_break_to_debugger=1 at
the loader prompt, and hit the break key, or the key sequence
 ~ ctrl-b once the hang occurs. At the debugger prompt, try
"bt" and "show allpcpu" to start.


Re: [SOLVED] Re: Panic with FreeBSD 11.0-RC2 VM-IMAGE when starting vboxservice

2017-07-13 Thread Mark Johnston
On Thu, Jul 13, 2017 at 01:06:30PM +, Glen Barber wrote:
> On Wed, Jul 12, 2017 at 11:29:05PM -0700, jungle boogie wrote:
> > On 07/10/2017 10:48 PM, Konstantin Belousov wrote:
> > > On Mon, Jul 10, 2017 at 01:00:05PM -0700, Mark Johnston wrote:
> > > > I suspect that this is a result of r320763. That change removed a field
> > > > from struct vm_map_entry, which is embedded in struct vm_map. Virtualbox
> > > > does not reference the fields of struct vm_map directly, but it does
> > > > call vm_map_pmap(), which is an inline accessor.
> > > 
> > > Thank you for noting.  I do not consider vm_map part of the guaranteed
> > > stable KBI, but in this case it is simpler to add padding than to follow
> > > principles.
> > > 
> > > Fixed in 320889, will ask re for MFS permissions in a day.
> > 
> > I think this is the commit:
> > https://svnweb.freebsd.org/base/stable/11/sys/vm/vm_map.h?view=log=320889
> > 
> > Do you think this will get picked up by/for RC3, which will begin on Friday?
> > https://www.freebsd.org/releases/11.1R/schedule.html
> > 
> 
> Yes, it will be in RC3.  It was committed to releng/11.1 as r320909.

It should be noted that anyone that recompiled virtualbox for RC2 will
need to do so again after updating to RC3. Anyone going straight from
RC1 or earlier to RC3 ought to be unaffected.


Re: [SOLVED] Re: Panic with FreeBSD 11.0-RC2 VM-IMAGE when starting vboxservice

2017-07-10 Thread Mark Johnston
On Mon, Jul 10, 2017 at 09:47:50PM +0200, José G. Juanino wrote:
> On Monday, 10 July 2017 at 21:24:41 CEST, José G. Juanino wrote:
> >On Sunday, 9 July 2017 at 23:48:29 CEST, David Boyd wrote:
> >>With latest VM-IMAGE (vmdk) for 11.1-RC2 system panics.  I haven't 
> >>been able to process the panic completely, but the backtrace looks 
> >>mysteriously similar to those provided with PR219146.
> >>
> >>
> >>Initially, the VM-IMAGE booted just fine.  The VM-IMAGE is then 
> >>configured as a VirtualBox client.  The panic occurs on subsequent 
> >>reboots when attempting to start vboxservice.
> >>
> >>
> >>Additionally, during the custom configuration process, lsof hangs 
> >>for more than 2 minutes each time that it is invoked.
> >>
> >>
> >>Please let me know exactly what information is needed to correct 
> >>this problem.  I will be able to provide that information no later 
> >>than Tuesday morning.
> >>
> >
> >Hi, I am also suffering this panic.
> >
> >I use everyday my Windows 7 host Virtualbox with a FreeBSD guest. I 
> >upgraded from 11.0 -> 11.1-BETA1 -> 11.1-BETA2 -> 11.1-RC1 with no 
> >issues.
> >
> >But after to upgrade to 11.1-RC2, the system panics just after to 
> >start vboxservice. I have disabled the service at boot in 
> >/etc/rc.conf, but it does not help: the system crashes just after 
> >"service vboxservice start" command.
> >
> >You can get a full crash dump here:
> >
> >https://pastebin.com/MkxS9GZn
> >
> >At this moment, I am going to rebuild the virtualbox-ose-additions 
> >port (running 11.1-RC2) and I will let you know if there is some 
> >difference.
> 
> I confirm that after rebuild virtualbox-ose-additions port, the panic 
> goes away. Regards.

I suspect that this is a result of r320763. That change removed a field
from struct vm_map_entry, which is embedded in struct vm_map. Virtualbox
does not reference the fields of struct vm_map directly, but it does
call vm_map_pmap(), which is an inline accessor.


Re: stable/11 debugging kernel unable to produce crashdump

2017-01-14 Thread Mark Johnston
On Sat, Jan 14, 2017 at 06:40:02PM +0700, Eugene Grosbein wrote:
> 
> > I suspect that this is because we only stop the scheduler upon a panic
> > if SMP is configured. Can you retest with the patch below applied?
> > 
> > Index: sys/kern/kern_shutdown.c
> > ===
> > --- sys/kern/kern_shutdown.c(revision 312082)
> > +++ sys/kern/kern_shutdown.c(working copy)
> > @@ -713,6 +713,7 @@
> > CPU_CLR(PCPU_GET(cpuid), _cpus);
> > stop_cpus_hard(other_cpus);
> > }
> > +#endif
> >  
> > /*
> >  * Ensure that the scheduler is stopped while panicking, even if panic
> > @@ -719,7 +720,6 @@
> >  * has been entered from kdb.
> >  */
> > td->td_stopsched = 1;
> > -#endif
> >  
> > bootopt = RB_AUTOBOOT;
> > newpanic = 0;
> > 
> > 
> 
> Indeed, my router is uniprocessor system and your patch really solves the 
> problem.
> Now kernel generates crashdump just fine in case of panic. Please commit the 
> fix, thanks!

Thanks, committed as r312199.


Re: stable/11 debugging kernel unable to produce crashdump

2017-01-13 Thread Mark Johnston
On Sat, Jan 14, 2017 at 02:21:23AM +0700, Eugene Grosbein wrote:
> Hi!
> 
> I'm struggling to debug a panic in 11.0-STABLE/i386 that successfully 
> produces crashdump
> but I want more information. So I've rebuilt my custom kernel to include
> options INVARIANTS, WITNESS and DEADLKRES. Now any panic results in quick 
> unclean reboot
> without crashdump generation. Serial console shows:
> 
> Script started on Sat Jan 14 02:03:16 2017
> Command: cu -l cuau0 -s 115200
> Connected
> 
> root@gw:~ # sysctl debug.kdb.panic=1
> debug.kdb.panic:panic: kdb_sysctl_panic
> KDB: stack backtrace:
> db_trace_self_wrapper(e8bc2ae8,0,e8bc2ae8,e8bc2a48,c06b4af4,...) at 
> 0xc04c457b = db_trace_self_wrapper+0x2b/frame 0xe8bc2a20
> vpanic(c0a4916e,e8bc2a54,e8bc2a54,e8bc2a5c,c06e881f,...) at 0xc06b4a7f = 
> vpanic+0x6f/frame 0xe8bc2a34
> panic(c0a4916e,1,c0ae7e48,e8bc2a88,c06bfdd3,...) at 0xc06b4af4 = 
> panic+0x14/frame 0xe8bc2a48
> kdb_sysctl_panic(c0ae7e48,0,0,0,e8bc2ae8) at 0xc06e881f = 
> kdb_sysctl_panic+0x4f/frame 0xe8bc2a5c
> sysctl_root_handler_locked(0,0,e8bc2ae8,e8bc2aa8) at 0xc06bfdd3 = 
> sysctl_root_handler_locked+0x83/frame 0xe8bc2a88
> sysctl_root(0,e8bc2ae8) at 0xc06bf744 = sysctl_root+0x144/frame 0xe8bc2ad8
> userland_sysctl(c759f9c0,e8bc2b60,3,0,0,0,bfbfdc5c,4,e8bc2bc0,0) at 
> 0xc06bfb9d = userland_sysctl+0x12d/frame 0xe8bc2b30
> sys___sysctl(c759f9c0,e8bc2c00) at 0xc06bfa32 = sys___sysctl+0x52/frame 
> 0xe8bc2bd0
> syscall(e8bc2ce8) at 0xc0980801 = syscall+0x2a1/frame 0xe8bc2cdc
> Xint0x80_syscall() at 0xc096e45e = Xint0x80_syscall+0x2e/frame 0xe8bc2cdc
> --- syscall (202, FreeBSD ELF32, sys___sysctl), eip = 0x2818541b, esp = 
> 0xbfbfdbc8, ebp = 0xbfbfdbf0 ---
> Uptime: 4m36s
> panic: malloc: called with spinlock or critical section held
> Uptime: 4m36s
> panic: _mtx_lock_sleep: recursed on non-recursive mutex CAM device lock @ 
> /home/src/sys/cam/ata/ata_da.c:3382

I suspect that this is because we only stop the scheduler upon a panic
if SMP is configured. Can you retest with the patch below applied?

Index: sys/kern/kern_shutdown.c
===
--- sys/kern/kern_shutdown.c(revision 312082)
+++ sys/kern/kern_shutdown.c(working copy)
@@ -713,6 +713,7 @@
CPU_CLR(PCPU_GET(cpuid), _cpus);
stop_cpus_hard(other_cpus);
}
+#endif
 
/*
 * Ensure that the scheduler is stopped while panicking, even if panic
@@ -719,7 +720,6 @@
 * has been entered from kdb.
 */
td->td_stopsched = 1;
-#endif
 
bootopt = RB_AUTOBOOT;
newpanic = 0;


Re: Strange dtrace warning on running svn, perl and other programs on stable/10 r310494

2016-12-29 Thread Mark Johnston
On Thu, Dec 29, 2016 at 12:04:56PM +0100, Trond Endrestøl wrote:
> I keep getting these warnings whenever I run svn, perl, and other 
> programs.
> 
> WARNING: number of probes fixed does not match the number of defined probes 
> (16 != 18, respectively)
> WARNING: some probes might not fire or your program might crash
> 
> They also clobber the building of security/vpnc. Some "clever" Perl 
> script is being used to produce a .c file on stdout and a .h file on 
> stderr while building vpnc.

This is emitted by some code that registers userland DTrace probes
during process init. The mechanism used in stable/10 doesn't work in
some cases. This has been fixed in 11, but the change cannot easily be
merged back.

> 
> Any chance of getting rid of the messages, or at least disabling them?

If you're not planning on using the DTrace probes, the message can be
suppressed by setting DTRACE_DOF_INIT_DISABLE in the environment.
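For example (illustrative only; the variable merely has to be present in the environment of the process that would print the warning, so any command works the same way):

```shell
# One-off: set the variable for a single command.
env DTRACE_DOF_INIT_DISABLE=1 sh -c 'printf "DOF init disabled: %s\n" "$DTRACE_DOF_INIT_DISABLE"'

# Session-wide, e.g. before building an affected port:
export DTRACE_DOF_INIT_DISABLE=1
```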

> 
> This is on stable/10, r310494.
> 
> /etc/make.conf contains:
> 
> STRIP=
> CFLAGS+=-fno-omit-frame-pointer
> WITH_CTF=1
> WITH_SSP_PORTS=yes
> 
> This particular system will soon be upgraded to stable/11, r310770. 
> Maybe my troubles will disappear once the transition is complete.

Indeed, this message is gone on stable/11.


Re: unionfs bugs, a partial patch and some comments [Was: Re: 1-BETA3 Panic: __lockmgr_args: downgrade a recursed lockmgr nfs @ /usr/local/share/deploy-tools/RELENG_11/src/sys/fs/unionfs/union_vnops.c

2016-08-09 Thread Mark Johnston
On Mon, Aug 08, 2016 at 10:02:00AM +0200, Harry Schmalzbauer wrote:
> Regarding Rick Macklem's message of 07.08.2016 23:34 (localtime):
> > Harry Schmalzbauer wrote:
> >> Hello,
> >>
> >> I had another crash which I'm quite sure was triggered by mount_unionfs:
> > Just in case you are not already aware, unionfs is always broken. Read
> > the BUGS
> > section at the end of "man mount_unionfs". If it were easy to fix,
> > someone would
> > have done so long ago. Yes, some use it successfully, but if not...
> > 
> > Sorry, but I suspect that is how it will remain, rick
> 
> Thanks for the hint, not happy to hear that, but I was not aware of that
> explicit warning in man 8 mount_unionfs :-(
> 
> This feature is utterly important for me (all my productive machines
> have "/" read-only mounted and "/etc" is an union to a writable, synch
> mounted separate fs), so back in 2012, after a lot of locking redesign
> has been done in 9-current, I got Attilio Raos attention and he gave out
> some test patches for 9.0.
> He was aware of missing locking adjustments, but patches addressing the
> majority of them didn't work.
> Since then I'm draging a minimal patch which prevents at least the
> kernel panics for me.
> Unfortunately I don't have the skills to continue Attilio Raos work.
> 
> Just for anybody else needing unionfs:
> https://people.freebsd.org/~attilio/unionfs_missing_insmntque_lock.patch
> 
> This patch still applies and I'm successfully using this (unmodified) up
> to FreeBSD-10.3 and never had any panic in all these years.

Having spent some time looking at unionfs, I'm a bit skeptical that this
patch will address the panic you reported earlier, though I'd be
interested to know if it does. Reading the code, I think it will just
address an INVARIANTS-only assertion in insmntque1().

Unfortunately, unionfs is quite difficult to fix within the current
constraints of FreeBSD's VFS. unionfs_readdir() is a particularly good
demonstration of this fact: some callers of VOP_READDIR expect the
cookies returned by the FS to be monotonically increasing, but unionfs
has no straightforward way to make this guarantee.
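As a toy illustration (plain Python, not unionfs code; names and cookie values are made up) of why the merged stream goes non-monotonic:

```python
# Each layer hands out directory cookies from its own namespace. A union
# filesystem must splice the two streams, so the combined cookie sequence
# a caller sees is not, in general, monotonically increasing.
upper = [("a", 10), ("c", 20)]   # (name, cookie) pairs from the upper layer
lower = [("b", 5), ("d", 15)]    # the lower layer numbers independently
seen = {name for name, _ in upper}
merged = upper + [e for e in lower if e[0] not in seen]  # upper shadows lower
cookies = [c for _, c in merged]
print(cookies)  # [10, 20, 5, 15] -- breaks the ordering some callers expect
```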

> 
> I will continue using it for FreeBSD-11 and I guess it will also prevent
> my last reported panics.
> But I wanted to take part in the BETA test without local modifications
> at first.
> 
> Another very importend usage scenario of unionfs for me is for my build
> host(s). I'm (nfs4-)sharing a svn-checked out read-only portstree. My
> inofficial "ports/inofficial" directory perfectly shows up by
> unionfs-mounting it below the unaltered portstree :-)
> 
> For me, unionfs is as important as ZFS (and nullfs) is in FreeBSD.
> 
> First thing to do for me, after I won in lottery, was to find someone
> who can be sponsored fixing unionfs ;-) And bringing MNAMELEN into 21st
> century state, matching ZFS needs:
> https://lists.freebsd.org/pipermail/freebsd-hackers/2015-November/048640.html
> This is another patch I'm carrying for a very long time which solves
> tremendous limitations for me. Without that, I couldn't use ZFS
> snapshots in real world, along with a human-friendly dataset naming :-)
> 
> -Harry
> 


Re: Nasty state after running out of swap

2016-06-08 Thread Mark Johnston
On Wed, Jun 08, 2016 at 05:02:09PM -0400, Mikhail T. wrote:
> In my absence my desktop managed to run out of memory and had to kill a 
> number of processes:
> 
> pid 47493 (firefox), uid 105, was killed: out of swap space
> pid 1665 (thunderbird), uid 105, was killed: out of swap space
> pid 975 (kdeinit4), uid 105, was killed: out of swap space
> pid 1344 (mysqld), uid 105, was killed: out of swap space
> pid 898 (Xorg), uid 0, was killed: out of swap space
> pid 1430 (pidgin), uid 105, was killed: out of swap space
> 
> While that's unfortunate in its own right, the current state of the 
> machine is just weird... After the massacre the swap-usage is down to 5% 
> and memory is plentiful. top(1) reports:
> 
> last pid: 85719;  load averages:  0.17,  0.15, 0.11 up
> 25+21:17:34  16:50:27
> 123 processes: 1 running, 102 sleeping, 11 stopped, 8 zombie, 1 waiting
> CPU:  0.1% user,  0.0% nice,  1.8% system,  0.3% interrupt, 97.7% idle
> Mem: 17M Active, 7276K Inact, 9876M Wired, 10M Cache, 1032M Buf, 28M
> Free

This looks like a memory leak in the kernel. What revision are you
running? Can you provide the output of "vmstat -m" and "vmstat -z"?

> ARC: 1114M Total, 264M MFU, 282M MRU, 69K Anon, 44M Header, 524M Other
> Swap: 12G Total, 616M Used, 11G Free, 5% Inuse
> 
> And yet, various commands hang for a while in either pfault or zombie 
> state upon completion. For example, top, when I tried to exit it, hung 
> for about a minute with Ctrl-T reporting:
> 
> load: 0.13  cmd: top 85718 [pfault] 19.04r 0.00u 0.01s 0% 2532k
> 
> Why would a machine with so much free memory continue to act this way? 

The output you pasted shows that there is very little free memory.

> Is it yet to recover from the "out of swap" situation? I'm sure, a 
> reboot will fix everything, but I expected FreeBSD to be better than 
> that... Running 10.3-stable from April 18 here. Thanks!

There was a memory leak in CAM at that point. It's fixed in r299531, but
the vmstat output is needed to verify that this is the problem you're
hitting.
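For reference, once you have two snapshots taken a few minutes apart, a growing zone is easy to spot by diffing the USED column. A throwaway sketch (the snapshot format and numbers below are made up and simplified; real `vmstat -z` output needs slightly different parsing):

```python
# Diff two simplified "vmstat -z"-style snapshots and report zones whose
# USED count grew -- a quick way to shortlist kernel memory leak candidates.
def parse(snapshot):
    used = {}
    for line in snapshot.strip().splitlines()[1:]:   # skip header line
        name, _size, _limit, in_use = line.split(",")[:4]
        used[name.strip()] = int(in_use)
    return used

before = """ITEM,SIZE,LIMIT,USED
CAM CCB,2048,0,100
mbuf,256,0,5000"""
after = """ITEM,SIZE,LIMIT,USED
CAM CCB,2048,0,90100
mbuf,256,0,5100"""

b, a = parse(before), parse(after)
growth = {z: a[z] - b.get(z, 0) for z in a if a[z] > b.get(z, 0)}
for zone, delta in sorted(growth.items(), key=lambda kv: -kv[1]):
    print(f"{zone}: +{delta}")
```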


Re: sparc64 on stable/10 panics with r299673

2016-05-13 Thread Mark Johnston
On Fri, May 13, 2016 at 04:14:34PM -0400, Kurt Lidl wrote:
> I updated one of my sparc64 machines (a V240, dual processor, 8GB
> of memory) and now it panics.
> 
> Before (working without issue):
> FreeBSD spork.pix.net 10.3-STABLE FreeBSD 10.3-STABLE #18 r299561: Thu 
> May 12 16:28:16 EDT 2016 
> l...@spork.pix.net:/usr/obj/usr/src/sys/GENERIC  sparc64

Hi,

Could you give r299695 a try? It fixes a bug that looks related to what
you're hitting.


Re: dtrace on RELENG9 possible ?

2015-03-19 Thread Mark Johnston
On Thu, Mar 19, 2015 at 03:35:36PM -0400, Mike Tancsa wrote:
 Anyone know how to get dtrace working on RELENG9 ? When I go to load the 
 klds, I get the error
 
 # kldload dtraceall
 kldload: an error occurred while loading the module. Please check 
 dmesg(8) for more details.
 
 # dmesg | tail -5
 linker_load_file: Unsupported file type
 KLD profile.ko: depends on cyclic - not available or version mismatch
 linker_load_file: Unsupported file type
 KLD dtraceall.ko: depends on profile - not available or version mismatch
 linker_load_file: Unsupported file type
 
 if I try and load profile, or cyclic, I get
 
 KLD profile.ko: depends on cyclic - not available or version mismatch
 linker_load_file: Unsupported file type
 link_elf_obj: symbol cyclic_clock_func undefined
 linker_load_file: Unsupported file type
 link_elf_obj: symbol cyclic_clock_func undefined
 linker_load_file: Unsupported file type
 
 Googling around this seems to be a known problem going back some time 
 and there are various patches posted, but I am not sure what the best 
 way to proceed is ?  This is only for a test box so I can try and better 
 understand why RELENG9 is so much faster than RELENG10 for my particular 
 applications.

Could you point me to one of these threads? There are several that refer
to cyclic_clock_func, but they have to do with build failures, which
isn't what you're seeing.

 This is releng9 from today after a fresh buildworld/kernel

I'm not quite sure what you mean by "releng9". Is it 9.0? 9.3? Does your
kernel configuration file contain "options KDTRACE_HOOKS"?

Thanks,
-Mark


Re: Suspected libkvm infinite loop

2015-03-12 Thread Mark Johnston
On Wed, Mar 11, 2015 at 9:05 PM, Nick Frampton nick.framp...@akips.com wrote:
 On 12/03/15 00:38, John Baldwin wrote:

 It sounds like this issue might be the one fixed in r272566: if the
  KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an
  sbuf error return value could bubble up and be treated as ERESTART,
  resulting in a loop.
  
  This can be confirmed with something like
  
  dtrace -n 'syscall:::entry/pid == $target/{@[probefunc] =
   count();} tick-3s {exit(0);}' -p <pid of looping proc>
  
  If the output consists solely of __sysctl, this bug is likely the
  culprit.

 
 Unfortunately, I accidentally killed fstat this morning before I could
  do any further debug.
 
 I ran truss -p on it yesterday and it was spinning solely on __sysctl.
 
 I'll try compiling with debug symbols in case it happens again. I
  haven't been able to reproduce the
 problem in a reasonable time frame so it could be days or weeks before
  we see it happen again.

 The truss output is consistent with Mark's suggestion, so I would try
 his suggested fix of 272566.


 I patched the 10.1 kernel with r272566 and it appears to have fixed the
 issue. Is this patch likely to be MFCed back to 10-stable?

A followup to the thread: the change has been merged to stable/10 in r279926.


Re: Suspected libkvm infinite loop

2015-03-11 Thread Mark Johnston
On Thu, Mar 12, 2015 at 02:05:32PM +1000, Nick Frampton wrote:
 On 12/03/15 00:38, John Baldwin wrote:
  It sounds like this issue might be the one fixed in r272566: if the
   KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an
   sbuf error return value could bubble up and be treated as ERESTART,
   resulting in a loop.
   
   This can be confirmed with something like
   
   dtrace -n 'syscall:::entry/pid == $target/{@[probefunc] = 
count();} tick-3s {exit(0);}' -p <pid of looping proc>
   
   If the output consists solely of __sysctl, this bug is likely the
   culprit.
  
  Unfortunately, I accidentally killed fstat this morning before I could do 
  any further debug.
  
  I ran truss -p on it yesterday and it was spinning solely on __sysctl.
  
  I'll try compiling with debug symbols in case it happens again. I haven't 
  been able to reproduce the
  problem in a reasonable time frame so it could be days or weeks before we 
  see it happen again.
  The truss output is consistent with Mark's suggestion, so I would try
  his suggested fix of 272566.
 
 I patched the 10.1 kernel with r272566 and it appears to have fixed the 
 issue. Is this patch likely 
 to be MFCed back to 10-stable?

I can't see any reason it shouldn't be, and there was an MFC reminder in
the commit log entry for that revision. I've cc'ed kib@, who might have a
reason.

 
 Our RC script forks off about 200 processes when starting our software, and I 
 wrote a small script 
 to repeatedly stop/start the software, which fairly reliably reproduces the 
 issue about 1 in 10 
 times. I've been running the script with the patched kernel for an hour now 
 and I haven't seen the 
 issue appear.


Re: Suspected libkvm infinite loop

2015-03-10 Thread Mark Johnston
On Tue, Mar 10, 2015 at 02:10:09PM -0400, John Baldwin wrote:
 On Tuesday, March 10, 2015 10:17:07 AM Nick Frampton wrote:
  Hi,
  
  For the past several months, we have had an intermittent problem where a
  process calling kvm_openfiles(3) or kvm_getprocs(3) (not sure which) gets
  stuck in an infinite loop and goes to 100% cpu. We have just observed
  fstat -m do the same thing and suspect it may be the same problem.
  
  Our environment is a 10.1-RELEASE-p6 amd64 guest running in VirtualBox, with
  ufs root and zfs /home.
  
  Has anyone else experienced this? Is there anything we can do to investigate
  the problem further?
 
 Often loops using libkvm are due to programs trying to read kernel data
 structures while they are changing.  However, if you use sysctls to fetch this
 data instead, you should be able to get a stable snapshot of the system state
 without getting stuck in a possible loop.  I believe for libkvm to use sysctl
 instead of /dev/kmem you have to pass a NULL for the kernel and /dev/null for
 the core image.  fstat -m should be doing that by default however, so if it is
 not that, can you ktrace fstat when it is spinning to see if it is spinning in
 userland or in the kernel?  If you see no activity via ktrace, then it is
 spinning in one of the two places without making any system calls, etc.  You
 can attach to it with gdb to pause it, then see where gdb thinks it is.  If
 gdb hangs attaching to it, then it is stuck in the kernel.

 If gdb attaches to it ok, then it is spinning in userland.  Unfortunately, for
 gdb to be useful, you really need debug symbols.  We don't currently provide
 those for release binaries or binaries provided via freebsd-update (though
 that is being worked on for 11.0).  If you build from source, then the
 simplest way to get this is to add 'WITH_DEBUG_FILES=yes' to /etc/src.conf and
 rebuild your world without NO_CLEAN.  If you are building from source and are
 able to reproduce with those binaries, then after attaching to the process
 with gdb, use 'bt' to see where it is hung and reply with that.

 If it is hanging in the kernel, then you will need to use the kernel debugger
 to see where it is hanging.  The simplest way to do this is probably to force
 a crash via the debug.kdb.panic sysctl (set it to a non-zero value).  You will
 then need to fire up kgdb on the crash dump after it reboots, switch to the
 fstat process via the 'proc pid' command and get a backtrace via 'bt'.

It sounds like this issue might be the one fixed in r272566: if the
KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an
sbuf error return value could bubble up and be treated as ERESTART,
resulting in a loop.
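Why that turns into an infinite spin can be seen in a toy model: if a
"buffer too small" condition is reported as "retry the same call" (the
misread ERESTART), the caller repeats the identical request forever,
whereas reporting ENOMEM lets it grow the buffer and terminate. This is
purely an illustrative Python sketch, not the kernel or libkvm code;
`fill_buffer` is a made-up stand-in for the KERN_PROC_ALL handler:

```python
import errno

def fill_buffer(buf_len, needed=100):
    """Toy stand-in for the KERN_PROC_ALL handler: reports ENOMEM
    (rather than a retryable error) when the caller's buffer is too
    small, and says how much space is actually needed."""
    if buf_len < needed:
        return -errno.ENOMEM, needed
    return 0, needed

def getprocs():
    """Well-behaved caller: grow the buffer on ENOMEM instead of
    re-issuing the same request, and bound the number of retries."""
    buf_len = 16
    for _ in range(32):
        err, needed = fill_buffer(buf_len)
        if err == 0:
            return buf_len
        if err == -errno.ENOMEM:
            buf_len = needed * 2  # headroom: processes may appear meanwhile
            continue
        raise OSError(-err, "sysctl failed")
    raise RuntimeError("process list kept growing; giving up")
```

With the bug, the error never reaches the caller at all, so even a
correct grow-and-retry loop like this spins inside the syscall.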

This can be confirmed with something like

  dtrace -n 'syscall:::entry /pid == $target/{@[probefunc] = count();} tick-3s 
{exit(0);}' -p <pid of looping proc>

If the output consists solely of __sysctl, this bug is likely the
culprit.

-Mark


syncer causing latency spikes

2013-07-17 Thread Mark Johnston
Hello,

I'm trying to investigate and solve some postgres latency spikes that
I'm seeing as a result of some behaviour in the syncer. This is with
FreeBSD 8.2 (with some local modifications and backports, r231160 in
particular). The system has an LSI 9261-8i RAID controller (backed by
mfi(4)) and the database and WALs are on separate volumes, a RAID 6 and
a RAID 1 respectively. It has about 96GB of RAM installed.

What's happening is that the syncer tries to fsync a large database file
and goes to sleep in getpbuf() with the corresponding vnode lock held
and the following stack:

#3  0x805fceb5 in _sleep (ident=0x80ca8e20, 
lock=0x80d6bc20, priority=-2134554464, wmesg=0x80a4fe43 
wswbuf0, timo=0) at /d2/usr/src/sys/kern/kern_synch.c:234
#4  0x808780d5 in getpbuf (pfreecnt=0x80ca8e20) at 
/d2/usr/src/sys/vm/vm_pager.c:339
#5  0x80677a00 in cluster_wbuild (vp=0xff02ea3d7ce8, size=16384, 
start_lbn=20869, len=2) at /d2/usr/src/sys/kern/vfs_cluster.c:801
#6  0x808477ed in ffs_syncvnode (vp=0xff02ea3d7ce8, 
waitfor=Variable waitfor is not available.) at 
/d2/usr/src/sys/ufs/ffs/ffs_vnops.c:306
#7  0x808488cf in ffs_fsync (ap=0xff9b0cd27b00) at 
/d2/usr/src/sys/ufs/ffs/ffs_vnops.c:190
#8  0x8096798a in VOP_FSYNC_APV (vop=0x80ca5300, 
a=0xff9b0cd27b00) at vnode_if.c:1267
#9  0x8068bade in sync_vnode (slp=0xff002ab8e758, 
bo=0xff9b0cd27bc0, td=0xff002ac89460) at vnode_if.h:549
#10 0x8068bdcd in sched_sync () at /d2/usr/src/sys/kern/vfs_subr.c:1841

(kgdb) frame 6
#6  0x808477ed in ffs_syncvnode (vp=0xff02ea3d7ce8, 
waitfor=Variable waitfor is not available.) at 
/d2/usr/src/sys/ufs/ffs/ffs_vnops.c:306
306 vfs_bio_awrite(bp);
(kgdb) vpath vp
0xff02ea3d7ce8: 18381
0xff02eab1dce8: 16384
0xff02eaaf0588: base
0xff01616d8b10: data
0xff01616d8ce8: pgsql
0xff002af9f588: mount point
0xff002af853b0: d3
0xff002abd6b10: /
(kgdb)

During such an fsync, DTrace shows me that syncer sleeps of 50-200ms are
happening up to 8 or 10 times a second. When this happens, a bunch of
postgres threads become blocked in vn_write() waiting for the vnode lock
to become free. It looks like the write-clustering code is limited to
using (nswbuf / 2) pbufs, and FreeBSD prevents one from setting nswbuf
to anything greater than 256.
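The stall pattern here - one thread sleeping for a scarce resource (a
pbuf) while still holding the vnode lock, so every writer queues behind
that sleep rather than behind real work - can be modelled with a toy
Python threading sketch. This is illustrative only, not kernel code;
the names and timings are made up:

```python
import threading
import time

pbufs = threading.Semaphore(0)   # exhausted pool: no pbufs available
vnode_lock = threading.Lock()
writer_delays = []

def syncer():
    # ffs_syncvnode() analogue: take the vnode lock, then sleep in
    # "getpbuf()" because the pbuf pool is empty.
    with vnode_lock:
        pbufs.acquire(timeout=0.5)   # times out; stands in for the 50-200ms sleeps

def writer(name):
    # vn_write() analogue: needs the vnode lock, does trivial work.
    start = time.monotonic()
    with vnode_lock:
        pass
    writer_delays.append((name, time.monotonic() - start))

s = threading.Thread(target=syncer)
s.start()
time.sleep(0.05)                     # let the syncer win the lock first
writers = [threading.Thread(target=writer, args=(f"pg{i}",)) for i in range(3)]
for w in writers:
    w.start()
for w in writers:
    w.join()
s.join()
# Every writer stalls for roughly the syncer's sleep, not its own work.
```

The point of the model: raising the pbuf count only changes how often
the semaphore is empty; as long as the sleep happens with the vnode
lock held, writers inherit the full latency of the sleep.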

Since the sleeps are happening in the write-clustering code, I tried
disabling write clustering on the mountpoint (with the noclusterw mount
option) and found that this replaces my problem with another one: the
syncer periodically generates large bursts of writes that create a
backlog in the mfi(4) bioq. Then postgres' reads take a few seconds to
return, causing more or less the same end result.

Does anyone have any suggestions on what I can do to help reduce the
impact of the syncer on my systems? I can't just move to a newer version
of FreeBSD, but I'm willing to backport changes if anyone can point me
to something that might help.

Thanks,
-Mark


Re: syncer causing latency spikes

2013-07-17 Thread Mark Johnston
On Wed, Jul 17, 2013 at 10:18:52PM +0300, Konstantin Belousov wrote:
 On Wed, Jul 17, 2013 at 02:07:55PM -0400, Mark Johnston wrote:
  Hello,
  
  I'm trying to investigate and solve some postgres latency spikes that
  I'm seeing as a result of some behaviour in the syncer. This is with
  FreeBSD 8.2 (with some local modifications and backports, r231160 in
  particular). The system has an LSI 9261-8i RAID controller (backed by
  mfi(4)) and the database and WALs are on separate volumes, a RAID 6 and
  a RAID 1 respectively. It has about 96GB of RAM installed.
  
  What's happening is that the syncer tries to fsync a large database file
  and goes to sleep in getpbuf() with the corresponding vnode lock held
  and the following stack:
  
  #3  0x805fceb5 in _sleep (ident=0x80ca8e20, 
  lock=0x80d6bc20, priority=-2134554464, wmesg=0x80a4fe43 
  wswbuf0, timo=0) at /d2/usr/src/sys/kern/kern_synch.c:234
  #4  0x808780d5 in getpbuf (pfreecnt=0x80ca8e20) at 
  /d2/usr/src/sys/vm/vm_pager.c:339
  #5  0x80677a00 in cluster_wbuild (vp=0xff02ea3d7ce8, 
  size=16384, start_lbn=20869, len=2) at 
  /d2/usr/src/sys/kern/vfs_cluster.c:801
  #6  0x808477ed in ffs_syncvnode (vp=0xff02ea3d7ce8, 
  waitfor=Variable waitfor is not available.) at 
  /d2/usr/src/sys/ufs/ffs/ffs_vnops.c:306
  #7  0x808488cf in ffs_fsync (ap=0xff9b0cd27b00) at 
  /d2/usr/src/sys/ufs/ffs/ffs_vnops.c:190
  #8  0x8096798a in VOP_FSYNC_APV (vop=0x80ca5300, 
  a=0xff9b0cd27b00) at vnode_if.c:1267
  #9  0x8068bade in sync_vnode (slp=0xff002ab8e758, 
  bo=0xff9b0cd27bc0, td=0xff002ac89460) at vnode_if.h:549
  #10 0x8068bdcd in sched_sync () at 
  /d2/usr/src/sys/kern/vfs_subr.c:1841
  
  (kgdb) frame 6
  #6  0x808477ed in ffs_syncvnode (vp=0xff02ea3d7ce8, 
  waitfor=Variable waitfor is not available.) at 
  /d2/usr/src/sys/ufs/ffs/ffs_vnops.c:306
  306 vfs_bio_awrite(bp);
  (kgdb) vpath vp
  0xff02ea3d7ce8: 18381
  0xff02eab1dce8: 16384
  0xff02eaaf0588: base
  0xff01616d8b10: data
  0xff01616d8ce8: pgsql
  0xff002af9f588: mount point
  0xff002af853b0: d3
  0xff002abd6b10: /
  (kgdb)
  
  During such an fsync, DTrace shows me that syncer sleeps of 50-200ms are
  happening up to 8 or 10 times a second. When this happens, a bunch of
  postgres threads become blocked in vn_write() waiting for the vnode lock
  to become free. It looks like the write-clustering code is limited to
  using (nswbuf / 2) pbufs, and FreeBSD prevents one from setting nswbuf
  to anything greater than 256.
 Syncer is probably just a victim of profiling.  Had postgres called
 fsync(2), you would then blame the fsync code for the pauses.

True, and postgres does frequently fsync(2) its own database files as
part of its checkpointing operations. However, every time I take a look at
what's happening during an I/O spike, it's the syncer that's holding the
lock. I'm digging into the postgres internals right now, but I think that
it manages to avoid having a backend writing to a database file at the
same time that the bg writer is fsync(2)ing it. postgres takes care to
checkpoint its files when it wants to ensure that data has been written
out, so it seems to me that the syncer's behaviour is at least somewhat
redundant here.

 
 Just add a tunable to allow the user to manually-tune the nswbuf,
 regardless of the buffer cache sizing.  And yes, nswbuf default max
 probably should be bumped to something like 1024, at least on 64bit
 architectures which do not starve for kernel memory.

I ended trying that right after I wrote the first email. I goosed up
nswbuf to 8192 and found that it just exacerbated the problem - the
syncer still ends up going to sleep waiting for pbufs, but it generates
a lot more I/O than before and mfi(4)'s bioq becomes huge; I saw it go
up to 2 regularly, which takes a while to drain.

 
  
  Since the sleeps are happening in the write-clustering code, I tried
  disabling write clustering on the mountpoint (with the noclusterw mount
  option) and found that this replaces my problem with another one: the
  syncer periodically generates large bursts of writes that create a
  backlog in the mfi(4) bioq. Then postgres' reads take a few seconds to
  return, causing more or less the same end result.
  Well, this is exactly what the clustering code is for: merging a lot of
  short i/o requests into a bigger i/o?

Right - it's what I expected. I just wanted to preemptively answer the
question of "what happens if you disable clustering?"

 
  
  Does anyone have any suggestions on what I can do to help reduce the
  impact of the syncer on my systems? I can't just move to a newer version
  of FreeBSD, but I'm willing to backport changes if anyone can point me
  to something that might help.
 
 As a side note, syncer code for UFS was redone in HEAD and 9 to only
 iterate over the active

Re: syncer causing latency spikes

2013-07-17 Thread Mark Johnston
On Wed, Jul 17, 2013 at 04:15:35PM -0400, John Baldwin wrote:
 On Wednesday, July 17, 2013 3:18:52 pm Konstantin Belousov wrote:
  On Wed, Jul 17, 2013 at 02:07:55PM -0400, Mark Johnston wrote:
   During such an fsync, DTrace shows me that syncer sleeps of 50-200ms are
   happening up to 8 or 10 times a second. When this happens, a bunch of
   postgres threads become blocked in vn_write() waiting for the vnode lock
   to become free. It looks like the write-clustering code is limited to
   using (nswbuf / 2) pbufs, and FreeBSD prevents one from setting nswbuf
   to anything greater than 256.
  Syncer is probably just a victim of profiling.  Had postgres called
  fsync(2), you would then blame the fsync code for the pauses.
  
  Just add a tunable to allow the user to manually-tune the nswbuf,
  regardless of the buffer cache sizing.  And yes, nswbuf default max
  probably should be bumped to something like 1024, at least on 64bit
  architectures which do not starve for kernel memory.
 
 Also, if you are seeing I/O stalls with mfi(4), then you might need a
 firmware update for your mfi(4) controller.  cc'ing smh@ who knows more about 
 that particular issue (IIRC).

I tried upgrading the firmware to the latest available image (I believe
it was from March), but that didn't help. I wouldn't call my problem a
stall in the sense of commands timing out (which I've seen before), it's
just that we manage to generate a large enough backlog that the
driver/controller take at least several seconds to clear it, during
which all I/O is stalled in the kernel.


Re: AHCI Patsburg SATA controller and slow transfer speed

2013-06-27 Thread Mark Johnston
On Thu, Jun 27, 2013 at 5:21 PM, Dave Hayes d...@jetcafe.org wrote:
 Greetings all. I'm on FreeBSD 9.1-STABLE #0 r251391M. I'm noticing two of my
 SATA disks are at half speed. Is this normal or is there some configuration
 I'm forgetting?

In my experience it's fairly common to have a mix of 6Gb/s and 3Gb/s ports.

In particular, see: http://en.wikipedia.org/wiki/Intel_X79#Features


Re: Unexpected reboot/crash on 8.2-RELEASE.

2013-05-18 Thread Mark Johnston
On Sat, May 18, 2013 at 09:45:21PM -0400, kpn...@pobox.com wrote:
 I had an unexpected reboot of my Dell R610 today around 2:05-06pm today.
 I do not know if it crashed or if it was power cycled.
 
 This machine is running:
 FreeBSD gunsight1.neutralgood.org 8.2-RELEASE FreeBSD 8.2-RELEASE #1: Thu Dec 
  8 21:58:59 UTC 2011 root@:/usr/obj/usr/src/sys/GENERIC  amd64
 
 It's a stock 8.2-RELEASE kernel except I had to tweak it near the top of
 vfs_mountroot() to delay before attempting to mount the root filesystem.
 (Without my tweak it attempts to mount root before the USB drive is finished
 getting attached.)
 
 The dmesg shows this at the reboot:
 mfi0: 24272 (422106527s/0x0020/info) - Patrol Read complete
 mfi0: 24273 (422172000s/0x0020/info) - Patrol Read started 
 mfi0: 24318 (422192750s/0x0020/info) - Patrol Read complete
 mfi0: 24319 (boot + 3s/0x0020/info) - Firmware initialization started (PCI ID 
 0060/1000/1f0c/1028)
 mfi0: 24320 (boot + 3s/0x0020/info) - Firmware version 1.22.12-0952
 mfi0: 24321 (boot + 3s/0x0020/info) - Firmware initialization started (PCI ID 
 0060/1000/1f0c/1028)
 mfi0: 24322 (boot + 3s/0x0020/info) - Firmware version 1.22.12-0952
 
 Does this mean the machine did not lose power? I ask because my datacenter
 had some sort of power incident and I'm not sure if the server lost power
 or not. But if the kernel message buffer from before the incident is still
 present then the machine never lost power, correct? The datacenter's power
 incident I'm told happened somewhere around the time of the reboot so I
 have to ask.

The LSI controllers I've used will keep internal event logs which are
persistent across power cycles (so long as the BBU isn't dead,
presumably). It looks like mfi(4) has been set up to dump the entire
event log during boot. Log entries created after the last reboot are
displayed with a timestamp of boot + Ns.

 
 It looks like I didn't have dumps enabled. That's ... not helpful.
 
 The machine has been stable for:
  2:05PM  up 472 days, 21 mins, 7 users, load averages: 0.01, 0.02, 0.00

That's a bit confusing... did you mean "had been"? This is the exact
uptime that's in status.txt below.

 
 http://www.neutralgood.org/~kpn/dmesg.boot
 
 Here's various stats I usually keep displayed. This is the last from
 before the reboot:
 http://www.neutralgood.org/~kpn/status.txt
 
 I've got all the power savings features turned off in the BIOS and, like
 I said, the machine has been stable for all this time. However, one thing
 to note from a couple of days ago:
 

This is probably unrelated? As an aside, it'd be nice if mfi(4) dumped
info about the dcmd/io cmd at least once if it times out. At the moment,
it only does that if MFI_DEBUG is enabled... does anyone have an
objection to changing this from a compile-time option to a sysctl?

Thanks,
-Mark

 May 14 00:49:13 gunsight1 -- MARK --
 May 14 01:00:45 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 35 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 65 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 95 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 125 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 155 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 185 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 215 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 245 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 275 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 305 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 335 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 365 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 395 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 425 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 455 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 485 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 515 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 545 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 575 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 605 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 635 SECONDS
 May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
 AFTER 665 SECONDS
 May 14 

RE: Freebsd 4.7.2 DHCP Spamming

2003-01-16 Thread Mark Johnston
[EMAIL PROTECTED] wrote:
 Has anyone heard of an issue where a freebsd box can rack up 
 multiple ips over the course
 of ~2 days?  There should only be 1 ip address allocated to my box.
 
 For some reason on Dec 2nd, Dec 30th, and Jan 14th my box 
 decided to keep requesting IPs, thus
 racking up ~100 before they shut me off each time.  Why would 
 they keep permitting ip requests
 above the 2 allowed ips?  

DOCSIS modems (at least the older ones that I'm familiar with) can be
configured to limit the total number of MAC addresses or of IPs.
Perhaps they're limiting MACs and you're getting a bunch of leases
assigned to the same MAC ID.  Another possibility is that their IP cap
may only limit the number of IPs you can use, not the number you can
request.  If you are using xDSL, I'm not familiar with the modems
involved, but the filters are probably similar.

 I'm running a GENERIC kernel, all source updated and 
 installed from cvsup3.freebsd.org.  Only ssh
 listening.
 
 They say that, either I'm doing it on purpose, I'm exploited, 
 or there's a problem with the dhclient.

You could also be having a packet filtering problem.  When dhclient
tries to get an IP and has none, it uses a broadcast request from
0.0.0.0 (a DHCPDISCOVER).  The server will respond with a broadcast
(a DHCPOFFER) to offer you the IP, then you will request it (with a
DHCPREQUEST) and the server will acknowledge you (by sending a
DHCPACK.)  All of this is carried out in broadcast packets.  When it
comes time to renew, you will send a unicast request to the server and
it will respond in kind.  If this unicast can't make it through (due
to packet filtering), you will only be able to get an IP when your
lease has expired, not renew an existing one.  Strange of their server
to give you a different one each time though.
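For reference, the DISCOVER half of that exchange is easy to recognize
in a capture. A minimal DHCPDISCOVER payload per RFC 2131 looks like
the following sketch - it only builds the bytes, it doesn't send
anything, and the MAC/xid values are made up:

```python
import struct

def build_dhcpdiscover(mac: bytes, xid: int = 0x1234) -> bytes:
    """Build a minimal DHCPDISCOVER (RFC 2131) payload, as a client
    broadcasts from 0.0.0.0 when it holds no lease."""
    assert len(mac) == 6
    msg = struct.pack(
        "!BBBBIHH4s4s4s4s",
        1,             # op: BOOTREQUEST
        1,             # htype: Ethernet
        6,             # hlen: MAC length
        0,             # hops
        xid,           # transaction id
        0,             # secs
        0x8000,        # flags: broadcast bit set
        b"\x00" * 4,   # ciaddr: 0.0.0.0 (no address yet)
        b"\x00" * 4,   # yiaddr
        b"\x00" * 4,   # siaddr
        b"\x00" * 4,   # giaddr
    )
    msg += mac + b"\x00" * 10      # chaddr, padded to 16 bytes
    msg += b"\x00" * 64            # sname
    msg += b"\x00" * 128           # file
    msg += b"\x63\x82\x53\x63"     # DHCP magic cookie
    msg += b"\x35\x01\x01"         # option 53: message type = DISCOVER
    msg += b"\xff"                 # end option
    return msg
```

Comparing captured packets against this layout makes it obvious whether
the client is renewing an existing lease (unicast REQUEST with ciaddr
filled in) or starting from scratch (broadcast DISCOVER with ciaddr
zero), which is exactly the distinction that matters here.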

Here's a remote possibility: Are you using any kind of automatic ipfw
or ipf tie-in IDS?  Sometimes ISPs will do foolish things, like
performing diagnostic work from an important server.  If that sets off
an alarm and you block it, so much for DHCP renewals.  If someone who
thinks they're funny decides to spoof you a packet purporting to be
from the DHCP server, and it upsets your IDS, you'll be in the same
boat.

 I was monitoring the box using tcpdump + dhcpdump to watch 
 the requests.  Unfortunately I rebooted after about
 5 days (Jan 7th ish).  I thought the problem was resolved.  I 
 asked them for logs but they can't provide any.

Having tcpdump output to a file with a filter like "udp port 67 or
udp port 68" would provide the most detailed logs from your end,
although checking what dhclient has logged to syslog would help too.

 Could they changed something near the end of November, or the 
 start of December as this problem has
 not happened *ever* in 6 years before this. 
 
 *** Somehow I'm supposed to solve this problem without logs.  
 Hopefully someone has run into this
 problem in the past and knows a solution.  It's never to happen again or
 they will cancel my account.

At this point, you are better safe than sorry.  Buy a cheap Linksys
broadband router, put it in between the modem and your PC, and
troubleshoot your original issue at your leisure.  It will protect you
from your ISP's wrath until you have found the cause of the problem.

Mark

note - I am stuck with Outlook at work.  Apologies if it destroys the
formatting of this message.


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-stable in the body of the message