md deadlocks on wdrain. Was: [Re: quota and snapshots in 6.1-RELEASE]

2006-06-30 Thread Kostik Belousov
On Thu, Jun 29, 2006 at 10:48:06PM -0400, Mike Jakubik wrote:
 Well, I finally got around to setting up a serial console on this box.
 The following is the output from the debugger after the system stopped
 responding. Let me know if you need any more or different information; I
 also made the kernel changes you recommended.
 

Re: md deadlocks on wdrain. Was: [Re: quota and snapshots in 6.1-RELEASE]

2006-06-30 Thread Mike Jakubik

Kostik Belousov wrote:

First, I set the followup to the right mailing list.

Second, I am really curious about what you are doing. My understanding is
as follows: you have set up a vnode-backed md device (md0a) on a sparse
file, created UFS2 on it, mounted it with quotas, and run a background fsck
on that fs. At the same time, you did rm on the snapshot file created by
fsck. Right?
  


This is the procedure I followed. While I have quotas enabled, they were not 
set on the test filesystem.


1) dd if=/dev/zero of=/usr/bigfile bs=1024 seek=209715200 count=0
2) mdconfig -a -t vnode -f /usr/bigfile
3) bsdlabel -w md0 auto
4) newfs -U md0a
5) fsck -v /dev/md0a # interrupt with ^C after a second or so; this leaves the FS dirty
6) mount /dev/md0a /mnt
7) fsck -v -B /dev/md0a

in another window:
8) while true; do ls -al /mnt/.snap;sleep 1;done
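
Step 1 creates a sparse backing file: with count=0, dd copies no blocks and only seeks, so the file length ends up as bs times seek. The following Python sketch works out that size and creates a much smaller file the same way. The helper names and the temporary path are made up for illustration, and whether the file is actually stored sparsely depends on the filesystem.

```python
import os
import tempfile


def dd_sparse_size(bs: int, seek: int) -> int:
    """Length in bytes of the file left by `dd bs=<bs> seek=<seek> count=0`."""
    return bs * seek


def make_sparse_file(path: str, size: int) -> None:
    """Create a file of the given length without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size)


# Size of /usr/bigfile from step 1.
size = dd_sparse_size(bs=1024, seek=209715200)
print(size, "bytes =", size // 2**30, "GiB")

# Same technique on a 1 MiB scale, so the sketch is safe to run anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "bigfile")
    make_sparse_file(demo, dd_sparse_size(bs=1024, seek=1024))
    demo_size = os.path.getsize(demo)
    print(demo_size, "bytes")
```

The backing file in the procedure is therefore 200 GiB in length, far more than the amount of data actually written to it.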



Anyway, the problem seems to be related to neither snapshots nor
quotas. In your trace, process 35 (syncer) tries to sync the vnode
0xc363c414, which is inode 1515 on aacd0s1f and is used for md0. That
vnode is already locked by process 515 (the md0 kthread). Process 515 is
stuck in the wdrain state, waiting for buffers to be flushed. It seems
that there is a huge amount of dirty buffers going to be written to md0,
caused by snapshotting the fs. As a result, the system deadlocks: md0
hangs waiting for running buffer space, which is occupied by the pending
write requests to md0.

Do -fs@ readers agree with this analysis?

I propose to set the TDP_NORUNNINGBUF thread flag for both swap- and
file-backed md threads to prevent such deadlocks. That I/O is already
accounted for in the upper layer. Moreover, those already-accounted
requests do not really differ from the requests (re)issued by md.

Please comment.
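
The accounting argument can be illustrated with a toy model in Python. This is only a sketch of the reasoning in this message, not kernel code: the function, its parameters and the sizes are invented for illustration, the md_exempt switch stands in for the proposed TDP_NORUNNINGBUF flag, and all other I/O in the system is ignored.

```python
def md_writes_completed(queued_writes, hi_running_space, md_exempt):
    """Toy model: count how many writes queued to md0 get completed.

    queued_writes     sizes of writes already issued to md0 (arbitrary units)
    hi_running_space  limit on in-flight write space before writers sleep
    md_exempt         True models the md thread skipping the wait
                      (the behaviour proposed via TDP_NORUNNINGBUF)
    """
    # The upper layer charged every queued request when it was issued.
    running = sum(queued_writes)
    completed = 0
    for size in queued_writes:
        # The md thread reissues the request to its backing file. Unless it
        # is exempt, it must then wait until the in-flight space drops to
        # the limit. Only the md thread can complete the queued requests,
        # so if the backlog alone exceeds the limit it sleeps forever
        # (the "wdrain" state in the trace).
        if not md_exempt and running > hi_running_space:
            break
        running -= size  # request done, its charge is released
        completed += 1
    return completed


backlog = [64] * 40  # 2560 units queued against a limit of 1024 (made-up numbers)
stuck = md_writes_completed(backlog, 1024, md_exempt=False)
fixed = md_writes_completed(backlog, 1024, md_exempt=True)
print(stuck, fixed)
```

In this model the non-exempt md thread completes nothing once the backlog exceeds the limit, while the exempt one drains the whole queue.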
  


FYI, -CURRENT passes this test without locking up, so the fix is already 
there somewhere.


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: quota and snapshots in 6.1-RELEASE

2006-06-29 Thread Mike Jakubik

Well, I finally got around to setting up a serial console on this box. 
The following is the output from the debugger after the system stopped 
responding. Let me know if you need any more or different information; I 
also made the kernel changes you recommended.


FreeBSD 6.1-STABLE #1: Thu Jun 10 00:22:29 EDT 2006

---
KDB: enter: Line break on console
[thread pid 12 tid 14 ]
Stopped at  kdb_enter+0x30: leave  
db> ps
  pid   proc     uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
  552 c3622830    2   550   549 0004000 [SLPQ flswai 0xc0707c24][SLP] rm
  550 c3570830    2   549   549 0004000 [SLPQ wait 0xc3570830][SLP] sh
  549 c342ec48    2   548   549 0004000 [SLPQ wait 0xc342ec48][SLP] sh
  548 c3622624    0   422   422 0000000 [SLPQ piperd 0xc36027f8][SLP] cron
  547 c361f830    0   524   547 0004002 [SLPQ ufs 0xc3777c94][SLP] ls
  546 c36bc418    0   544   544 0004002 [SLPQ wdrain 0xc0707be4][SLP] fsck_4.2bsd
  544 c36bcc48    0   511   544 0004002 [SLPQ wait 0xc36bcc48][SLP] fsck
  524 c35e020c    0   522   524 0004002 [SLPQ wait 0xc35e020c][SLP] bash
  522 c3570c48    0   406   522 0004100 [SLPQ flswai 0xc0707c24][SLP] sshd
  515 c36bc20c    0     0     0 0000204 [SLPQ wdrain 0xc0707be4][SLP] md0
  511 c36bb624    0   500   511 0004002 [SLPQ wait 0xc36bb624][SLP] bash
  509 c3570418   65     1   509 0000100 [SLPQ select 0xc0707644][SLP] dhclient
  500 c361fa3c    0   406   500 0004100 [SLPQ flswai 0xc0707c24][SLP] sshd
  480 c342ea3c    0     1   256 0000000 [SLPQ select 0xc0707644][SLP] dhclient
  465 c361f624    0     1   465 0004002 [SLPQ ttyin 0xc342b010][SLP] getty
  464 c35e0c48    0     1   464 0004002 [SLPQ ttyin 0xc3429410][SLP] getty
  463 c356fa3c    0     1   463 0004002 [SLPQ ttyin 0xc3429810][SLP] getty
  462 c356f418    0     1   462 0004002 [SLPQ ttyin 0xc343f010][SLP] getty
  422 c342e624    0     1   422 0000000 [SLPQ nanslp 0xc06ba32c][SLP] cron
  416 c356f000   25     1   416 0000100 [SLPQ pause 0xc356f034][SLP] sendmail
  412 c356f624    0     1   412 0000100 [SLPQ select 0xc0707644][SLP] sendmail
  406 c35e0000    0     1   406 0000100 [SLPQ select 0xc0707644][SLP] sshd
  290 c361f20c    0     1   290 0000000 [SLPQ flswai 0xc0707c24][SLP] syslogd
  256 c3622418    0     1   256 0000000 [SLPQ select 0xc0707644][SLP] devd
  145 c356f830    0     1   145 0000000 [SLPQ pause 0xc356f864][SLP] adjkerntz
   38 c3378c48    0     0     0 0000204 [SLPQ - 0xd56f5cf8][SLP] schedcpu
   37 c342d000    0     0     0 0000204 [SLPQ sdflush 0xc070a3b4][SLP] softdepflush
   36 c342d20c    0     0     0 0000204 [SLPQ vlruwt 0xc342d20c][SLP] vnlru
   35 c342d418    0     0     0 0000204 [SLPQ ufs 0xc363c46c][SLP] syncer
   34 c342d624    0     0     0 0000204 [SLPQ wdrain 0xc0707be4][SLP] bufdaemon
   33 c342d830    0     0     0 000020c [SLPQ pgzero 0xc070b324][SLP] pagezero
   32 c342da3c    0
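
One way to read a listing like this is to group the sleeping processes by wait message, which makes it easy to see who shares a wait channel. The following Python sketch does that for a few rows transcribed from the listing, with the proc and uid columns separated. The regular expression and function name are illustrative only; the parser looks just at the pid, the [SLPQ wmesg wchan] field and the command name, so it does not depend on how the middle columns are spaced.

```python
import re
from collections import defaultdict

# pid at the start, "[SLPQ <wmesg> <wchan>][SLP]" near the end, then the command.
ROW = re.compile(
    r"^\s*(\d+)\s.*\[SLPQ\s+(\S+)\s+(0x[0-9a-f]+)\]\[SLP\]\s+(\S+)\s*$"
)

SAMPLE = """\
  552 c3622830    2   550   549 0004000 [SLPQ flswai 0xc0707c24][SLP] rm
  547 c361f830    0   524   547 0004002 [SLPQ ufs 0xc3777c94][SLP] ls
  546 c36bc418    0   544   544 0004002 [SLPQ wdrain 0xc0707be4][SLP] fsck_4.2bsd
  515 c36bc20c    0     0     0 0000204 [SLPQ wdrain 0xc0707be4][SLP] md0
   35 c342d418    0     0     0 0000204 [SLPQ ufs 0xc363c46c][SLP] syncer
   34 c342d624    0     0     0 0000204 [SLPQ wdrain 0xc0707be4][SLP] bufdaemon
"""


def sleepers_by_wmesg(ps_text):
    """Map each wait message to a list of (pid, command, wait channel)."""
    groups = defaultdict(list)
    for line in ps_text.splitlines():
        match = ROW.match(line)
        if match:
            pid, wmesg, wchan, cmd = match.groups()
            groups[wmesg].append((int(pid), cmd, wchan))
    return dict(groups)


groups = sleepers_by_wmesg(SAMPLE)
for wmesg, procs in groups.items():
    print(wmesg, [f"{cmd}({pid})" for pid, cmd, _ in procs])
```

On these rows, fsck_4.2bsd, md0 and bufdaemon all end up under wdrain on the same wait channel.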

Re: quota and snapshots in 6.1-RELEASE

2006-06-29 Thread Kostik Belousov
On Thu, Jun 29, 2006 at 10:48:06PM -0400, Mike Jakubik wrote:
 Well, I finally got around to setting up a serial console on this box.
 The following is the output from the debugger after the system stopped
 responding. Let me know if you need any more or different information; I
 also made the kernel changes you recommended.
 

Re: quota and snapshots in 6.1-RELEASE

2006-06-29 Thread Kostik Belousov
On Fri, Jun 30, 2006 at 07:05:36AM +0300, Kostik Belousov wrote:
 OK, please also provide your fstab, information on the md config,
 dmesg, and the kernel config. Also, it would be good to see the
 output of alltrace in ddb. It seems that your kernel does not
 contain the quota option?
Oh, I see, you _do_ have quotas. Please show the quota
configuration. Meanwhile, the output of alltrace is very much needed.




Re: quota and snapshots in 6.1-RELEASE

2006-06-09 Thread Mike Jakubik

Konstantin Belousov wrote:

The hangs are mostly related to snapshots. It would be better to
update to the latest RELENG_6.

Hangs on RELENG_6_1 are not so interesting. For a
hung RELENG_6 system, please do what is described below and post
the log of the ddb session.

I'm not sure whether kbdmux was MFCed into RELENG_6 (AFAIR, yes).
If you have it in your kernel, add the line
hint.kbdmux.0.disabled=1
to /boot/device.hints to make ddb usable.

After that, on the hang, enter ddb, and
do ps and tr pid for all suspected processes.
Better yet, add the following options to your kernel:
  


Since I don't have a serial console available and I'm not that 
knowledgeable about debuggers, would providing a fast download link to a 
core dump (I'm sure the dump would compress very well) be useful to 
anyone? Would someone be willing to spend the time to try to debug this?




Re: quota and snapshots in 6.1-RELEASE

2006-06-09 Thread Konstantin Belousov
On Fri, Jun 09, 2006 at 07:34:39PM -0400, Mike Jakubik wrote:
 Since I don't have a serial console available and I'm not that 
 knowledgeable about debuggers, would providing a fast download link to a 
 core dump (I'm sure the dump would compress very well) be useful to 
 anyone? Would someone be willing to spend the time to try to debug this?

IMHO, core dumps are much harder to deal with in your situation.
I prefer the way of gathering data I described before.




Re: quota and snapshots in 6.1-RELEASE

2006-06-09 Thread Mike Jakubik

Konstantin Belousov wrote:

IMHO, core dumps are much harder to deal with in your situation.
I prefer the way of gathering data I described before.
  



OK, I'll try to set up a serial console tomorrow then; I'll have to borrow 
the cable from another system.




Re: quota and snapshots in 6.1-RELEASE

2006-06-06 Thread Mike Jakubik

Scott Long wrote:

RELENG_6.  However, the changes will likely make their way into 
RELENG_6_1 in a few weeks as part of an errata update.


Scott


I have just done tests on 6.1-R and RELENG_6 as of yesterday evening. 
Unfortunately both still lock up hard: no crash, just a frozen system. I 
can enter KDB (ddb) via the console, but it's unusable, as it won't 
let me type in anything. There must be some other change in -CURRENT 
that fixes this, as -CURRENT did not freeze during my previous tests.

Just to confirm, here is the ID of ufs_quota.c on my RELENG_6 system:

/usr/src/sys/ufs/ufs/ufs_quota.c:
$FreeBSD: src/sys/ufs/ufs/ufs_quota.c,v 1.74.2.4 2006/05/14 00:23:27 tegge Exp $




Re: quota and snapshots in 6.1-RELEASE

2006-06-06 Thread Konstantin Belousov
On Tue, Jun 06, 2006 at 01:49:04PM -0400, Mike Jakubik wrote:
 I have just done tests on 6.1-R and RELENG_6 as of yesterday evening. 
 Unfortunately both still lock up hard: no crash, just a frozen system. I 
 can enter KDB (ddb) via the console, but it's unusable, as it won't 
 let me type in anything. There must be some other change in -CURRENT 
 that fixes this, as -CURRENT did not freeze during my previous tests.
 
 Just to confirm, here is the ID of ufs_quota.c on my RELENG_6 system:
 
 /usr/src/sys/ufs/ufs/ufs_quota.c:
 $FreeBSD: src/sys/ufs/ufs/ufs_quota.c,v 1.74.2.4 2006/05/14 00:23:27 tegge Exp $

The hangs are mostly related to snapshots. It would be better to
update to the latest RELENG_6.

Hangs on RELENG_6_1 are not so interesting. For a
hung RELENG_6 system, please do what is described below and post
the log of the ddb session.

I'm not sure whether kbdmux was MFCed into RELENG_6 (AFAIR, yes).
If you have it in your kernel, add the line
hint.kbdmux.0.disabled=1
to /boot/device.hints to make ddb usable.

After that, on the hang, enter ddb, and
do ps and tr pid for all suspected processes.
Better yet, add the following options to your kernel:

options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options DEBUG_LOCKS
options DEBUG_VFS_LOCKS
options DIAGNOSTIC

and, after the hang, run in ddb:

show allpcpu
show alllocks
show lockedvnods
ps

For each process mentioned in the show output, do where pid
(for threaded processes, do thread thread-id; where).
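
The ddb part of this procedure can be written down as a checklist. The following Python sketch only prints the commands in the order given in this message. The helper name is made up, and the pids and thread id passed in are placeholders for whatever the show output and ps listing report on the hung machine.

```python
def ddb_commands(pids=(), threads=()):
    """Return the ddb commands from this message as an ordered list.

    pids     processes named in the show output (use "where <pid>")
    threads  thread ids of threaded processes (use "thread <tid>", then "where")
    """
    cmds = ["show allpcpu", "show alllocks", "show lockedvnods", "ps"]
    cmds += [f"where {pid}" for pid in pids]
    for tid in threads:
        cmds += [f"thread {tid}", "where"]
    return cmds


# Placeholder values: two pids and one hypothetical thread id.
cmds = ddb_commands(pids=[515, 35], threads=[100042])
for cmd in cmds:
    print(cmd)
```

Capturing the session over a serial console makes it possible to post the resulting log in full.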

BTW, it would be great to add these instructions to the FAQ.




Re: quota and snapshots in 6.1-RELEASE

2006-06-06 Thread Konstantin Belousov
On Tue, Jun 06, 2006 at 09:22:34PM +0300, Konstantin Belousov wrote:
 The hangs are mostly related to snapshots. It would be better to
 update to the latest RELENG_6.
I need to clarify this not-so-wise statement, made at the end of the
day.

It is believed that Tor Egge fixed all known cases of deadlocks
related to snapshots. Of all those situations (around ten),
only one required the presence of both active quotas and a snapshot.

All fixes are MFCed to RELENG_6. So, interesting reports
of deadlocks should be made against RELENG_6, not RELENG_6_1
(until the errata for snapshots is issued).

Guys, sorry for the embarrassing statement.




Re: quota and snapshots in 6.1-RELEASE

2006-05-24 Thread Scott Long

Dmitriy Kirhlarov wrote:


Hi!

On Tue, May 23, 2006 at 04:35:21PM -0400, Kris Kennaway wrote:



6.1-STABLE after 6.1-RELEASE was released. So I think you may want


If you use snapshots with your quotas, update to 6.1-STABLE.  If you



Sorry, guys. Do you mean RELENG_6_1 or RELENG_6?

WBR


RELENG_6.  However, the changes will likely make their way into 
RELENG_6_1 in a few weeks as part of an errata update.


Scott



Re: quota and snapshots in 6.1-RELEASE

2006-05-24 Thread Massimo Lusetti
On Tue, 2006-05-23 at 16:35 -0400, Kris Kennaway wrote:

 If you use snapshots with your quotas, update to 6.1-STABLE.  If you
 don't use snapshots, 6.1-R should be fine.  This was discussed in
 excruciating depth a few weeks back, so please read the archives for
 more.

I probably have to stress the box a little more, but here it seems to work
correctly. The box is going into production really soon, so we will see.

Thanks to all guys involved for the very good work!

-- 
Massimo
There are more way to do things, one is the bsd-way the others are wrong




Re: quota and snapshots in 6.1-RELEASE

2006-05-23 Thread Rong-en Fan

On 5/23/06, Dmitriy Kirhlarov [EMAIL PROTECTED] wrote:

Hi, list.

Some time ago quota and, AFAIR, snapshots in 6.1-RELEASE had deadlock
problems. What is the current situation with this? I'm ready to test
patches, if needed.

WBR


IIRC, there are some quota and snapshot changes merged into
6.1-STABLE after 6.1-RELEASE was released. So I think you may want
to try that.

Regards,
Rong-En Fan


Re: quota and snapshots in 6.1-RELEASE

2006-05-23 Thread Mike Jakubik

Rong-en Fan wrote:

IIRC, there are some quota and snapshot changes merged into
6.1-STABLE after 6.1-RELEASE was released. So I think you may want
to try that.


That's correct. I have been meaning to test these, but have not had the time 
to do so yet. If you can, update to -STABLE and give it a test.




Re: quota and snapshots in 6.1-RELEASE

2006-05-23 Thread Chris Dillon

Quoting Mike Jakubik [EMAIL PROTECTED]:


IIRC, there are some quota and snapshot changes merged into
6.1-STABLE after 6.1-RELEASE was released. So I think you may want
to try that.


That's correct. I have been meaning to test these, but have not had the time
to do so yet. If you can, update to -STABLE and give it a test.


I have been running the fixes without problems since this weekend, but  
I was only bitten by the previous bugs maybe once or twice a week,  
even though I used snapshots and quotas extensively.  If my system  
manages to stay up at least two weeks it is most likely fixed. :-)





Re: quota and snapshots in 6.1-RELEASE

2006-05-23 Thread Mark Kane

Mike Jakubik wrote:

That's correct. I have been meaning to test these, but have not had the time 
to do so yet. If you can, update to -STABLE and give it a test.


I had a motherboard die in a server on Sunday. It is running FreeBSD 
5.4-RELEASE and the only hardware they had to replace it with was an 
Athlon64 CPU and a motherboard with an ATI chipset. With that board and 
5.4, it will only boot in Safe Mode, but then the hard drives are 
running at very slow speeds. It completely locked up earlier today and 
they had to do a hard reboot (with no errors on the screen or in 
/var/log/messages). My plan was to have them fully upgrade the server 
tonight to a new box (with the same Athlon64 and mobo with ATI chipset 
since that's all they have), and do a fresh install of 6.1-RELEASE 
because a very similar board with the same chipset was reported to work 
in 6.0.


Every machine I have or work on runs FreeBSD, but this is the only one 
that needs quotas because it runs cPanel for customers. I am not sure 
of all the details of the problems with quotas, so will running 
6.1-RELEASE with quotas cause problems for sure? If so, any suggestions 
on what to do given my situation? Would 6.0 be any better? I'd like to 
have the latest version because doing updates properly remotely is 
difficult, but if it is not going to work then I may have to use 6.0, if 
that will work, or figure something else out hardware-wise.


Thanks

-Mark

--
GnuPG Public Key:
http://www.mkproductions.org/mk_pubkey.asc

Internet Radio:
Party107 (Trance/Electronic) - http://www.party107.com
Rock 101.9 The Edge (Rock) - http://www.rock1019.net

IRC:
MIXXnet IRC Network - irc.mixxnet.net (Nick: MIXX941)


Re: quota and snapshots in 6.1-RELEASE

2006-05-23 Thread Kris Kennaway
On Tue, May 23, 2006 at 03:18:25PM -0500, Mark Kane wrote:
 Every machine I have or work on runs FreeBSD, but this is the only one 
 that needs quotas because it runs cPanel for customers. I am not sure 
 of all the details of the problems with quotas, so will running 
 6.1-RELEASE with quotas cause problems for sure? If so, any suggestions 
 on what to do given my situation? Would 6.0 be any better? I'd like to 
 have the latest version because doing updates properly remotely is 
 difficult, but if it is not going to work then I may have to use 6.0, if 
 that will work, or figure something else out hardware-wise.

If you use snapshots with your quotas, update to 6.1-STABLE.  If you
don't use snapshots, 6.1-R should be fine.  This was discussed in
excruciating depth a few weeks back, so please read the archives for
more.

Kris


