Re: [9fans] That deadlock, again
On Thu, Nov 18, 2010 at 12:53:52AM -0500, erik quanstrom wrote:
> you must be in process context to qlock, because only processes can sleep.

There's obviously at least one exception, because otherwise I would not have got a panic at startup. Or, for that matter, there would not be active code ahead of

/sys/src/9/port/qlock.c:35,36
	if(up == 0)
		panic("qlock");

in qlock(). Or maybe that's where things are going wrong, but I doubt that the code is mistaken; I know my understanding is inadequate :-)

++L
Re: [9fans] That deadlock, again
was 0xf01e739e really the code that accesses up->qpctry?

-- cinap
Re: [9fans] That deadlock, again
hm... thinking about it... does the kernel assume (maybe in early initialization) that calling qlock() without a proc is ok as long as it can make sure it will not be held by another proc?

-- cinap
Re: [9fans] That deadlock, again
On Thu, Nov 18, 2010 at 10:20:33AM +0100, cinap_len...@gmx.de wrote:
> hm... thinking about it... does the kernel assume (maybe in early initialization) that calling qlock() without a proc is ok as long as it can make sure it will not be held by another proc?

That's a question for Bell Labs, I suppose, but that's precisely what I believe. There is no other explanation for the panic. Moving the up == 0 test earlier will invalidate this assumption and cause the panic we have already seen.

The issue here is whether there is a situation where qlock() is intentionally invoked where up == 0 (suggested by the positioning of the up == 0 test _after_ setting the locked condition). This is improbable, though, and needs sorting out: whereas setting the lock can be done with up == 0 (and we can also clear the lock), we cannot _fail_ to set the lock, because then the absence of up will trigger a panic.

Now, we know that qlock() is called with up == 0; we have seen a panic being generated by such a call. Will it suffice to locate the invocation and somehow deal with it, or should we make qlock() more robust and have it reject a request from a context where up == 0?

Definitely, if qlock() no longer allows invocations with up == 0 there will be simplifications in its implementation. For example, the line

	if(up != nil && up->nlocks.ref)
		print("qlock: %#p: nlocks %lud\n", getcallerpc(&q), up->nlocks.ref);

will no longer need the up != nil test. But I'm convinced there's more here than meets the eye. Unfortunately, while I have a Plan 9 distribution at my fingertips, I'm not going to try to fix this problem in a 9vx environment; I'll wait until I get home to deal with the native stuff. But one can speculate...

++L
[9fans] [plan9mod] Plan9 under VirtualBox
Good day! I'm trying to install the system on VirtualBox, but the process is very slow. Is this normal?
Re: [9fans] [plan9mod] Plan9 under VirtualBox
On Thu, 18 Nov 2010 11:14:51 GMT Artem Novikov noviko...@gmail.com wrote:
> Good day! I'm trying to install the system on VirtualBox, but the process is very slow. Is this normal?

When I tried it, installing Plan 9 in VBox was horribly slow; installing it directly on hardware was somewhat slow, but not nearly as bad as in VBox. This is clearly a VBox bug, as there were reports in the VBox forums and/or bug tracker that booting OpenSolaris was horribly slow as well.

Robert Ransom
Re: [9fans] That deadlock, again
> 	if(up != nil && up->nlocks.ref)
> 		print("qlock: %#p: nlocks %lud\n", getcallerpc(&q), up->nlocks.ref);
>
> will no longer need the up != nil test.

that's just wrong. if the kernel is qlocking without a up, there's a bug. stack dumps are your friend.

but i have a feeling that there is a mistake in your modification to qlock. you didn't have this panic before you modified qlock.

- erik
Re: [9fans] That deadlock, again
On Thu Nov 18 10:23:20 EST 2010, quans...@quanstro.net wrote:
>> 	if(up != nil && up->nlocks.ref)
>> 		print("qlock: %#p: nlocks %lud\n", getcallerpc(&q), up->nlocks.ref);
>>
>> will no longer need the up != nil test.
>
> that's just wrong. if the kernel is qlocking without a up, there's a bug. stack dumps are your friend.
>
> but i have a feeling that there is a mistake in your modification to qlock. you didn't have this panic before you modified qlock.

and i'm just wrong. intentionally or not, devsd does qlock things with no up from sdreset(). ether82598 does too (my fault).

- erik
Re: [9fans] That deadlock, again
> and i'm just wrong. intentionally or not, devsd does qlock things with no up from sdreset(). ether82598 does too (my fault).

I suggest you fix ether82598: it is OK to call qlock() and qunlock() without up, but only if sure that the qlock() will succeed. If it has to wait, it will panic. Given that, why do the locking at all?

++L
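To make the rule concrete, here is a minimal user-space model of the behaviour described above, assuming the fast-path/slow-path shape of port/qlock.c discussed in this thread. ToyQLock, toy_qlock(), toy_qunlock() and the result codes are hypothetical names; the real kernel code also guards the structure with a spin lock, actually sleeps, and readies the woken process, none of which is modelled here. The point it shows is that a call with no process only goes wrong when the lock is already held.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of the qlock() paths discussed in the
 * thread; not the Plan 9 source (no spin lock, no sleep, no ready). */
typedef struct Proc Proc;
struct Proc {
	int	pid;
	Proc	*qnext;	/* next waiter on the same lock */
};

typedef struct ToyQLock ToyQLock;
struct ToyQLock {
	int	locked;
	Proc	*head;	/* processes waiting for the lock */
	Proc	*tail;
};

enum {
	Acquired,	/* lock was free; caller now holds it */
	Queued,		/* caller would sleep until handed the lock */
	WouldPanic,	/* lock held and no process to sleep: panic("qlock") */
};

int
toy_qlock(ToyQLock *q, Proc *up)
{
	if(!q->locked){
		/* uncontended: fine even when up == NULL */
		q->locked = 1;
		return Acquired;
	}
	if(up == NULL)
		return WouldPanic;
	/* contended: append up to the wait queue */
	up->qnext = NULL;
	if(q->tail != NULL)
		q->tail->qnext = up;
	else
		q->head = up;
	q->tail = up;
	return Queued;
}

void
toy_qunlock(ToyQLock *q)
{
	Proc *p;

	assert(q->locked);	/* unlocking a free lock is a bug */
	p = q->head;
	if(p != NULL){
		/* hand the lock straight to the first waiter: it stays locked */
		q->head = p->qnext;
		if(q->head == NULL)
			q->tail = NULL;
		return;
	}
	q->locked = 0;
}
```

In this model a reset-time caller that is the only user of its lock always takes the Acquired path, which is why the pattern happens to work; the WouldPanic path is the case being warned about.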
Re: [9fans] That deadlock, again
> but i have a feeling that there is a mistake in your modification to qlock. you didn't have this panic before you modified qlock.

qlock() is broken, or at the very least ambivalent. Someone ought to put it out of its misery: is it legal or not to call qlock() in an up == 0 context?

++L
Re: [9fans] That deadlock, again
> If it has to wait, it will panic. Given that, why do the locking at all?

i assume the intention is along these lines: it's to allow the use during reset of a given driver's standard functions that normally must qlock, to avoid requiring two copies of them, with and without the qlock.

after reset, it's illegal to call qlock without a process (notably in an interrupt function), as it previously was.
Re: [9fans] That deadlock, again
> it's to allow the use during reset of a given driver's standard functions that normally must qlock, to avoid requiring two copies of them, with and without the qlock.
>
> after reset, it's illegal to call qlock without a process (notably in an interrupt function), as it previously was.

I'm willing to credit the validity of this, but then I believe it ought to be more explicit. It seems to me that a situation where a panic can ensue if a lock is already taken is too risky. Is it possible to count the instances of such qlock() invocations in the present kernel code and find out how common the problem really is? Or should one simply treat such invocations as innocuous and, if the lock is taken, just omit connecting a user process to the queue when no user process is specified? That sounds positively explosive!

++L
Re: [9fans] That deadlock, again
> after reset, it's illegal to call qlock without a process (notably in an interrupt function), as it previously was.

That suggests that the (hopefully) few qlock() invocations that may occur in this space should be burdened with checking the value of up and skipping the call altogether if it's nil. Mind you, this is no longer the problem we set out to fix, although no doubt there is a relationship, however tenuous.

++L
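The caller-side check suggested here could look roughly like this in a driver. This is a hypothetical sketch: Ctlr, ctlrsetmode(), the nlocked counter and the trivial qlock()/qunlock() stand-ins are invented for illustration, and up is modelled as a plain global pointer (in the kernel it is the current process). The lock decision is taken once, so the unlock always matches the lock.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Proc Proc;
struct Proc { int pid; };

typedef struct QLock QLock;
struct QLock { int locked; };

/* Hypothetical stand-in: NULL during reset, non-NULL once processes run. */
Proc *up = NULL;

/* Trivial stand-ins for the kernel primitives; no contention is modelled. */
static void qlock(QLock *q) { q->locked = 1; }
static void qunlock(QLock *q) { q->locked = 0; }

typedef struct Ctlr Ctlr;
struct Ctlr {
	QLock	alock;
	int	mode;
	int	nlocked;	/* times the lock was actually taken */
};

/* Shared by reset and normal paths: lock only when there is a process. */
void
ctlrsetmode(Ctlr *c, int mode)
{
	int locked;

	locked = up != NULL;	/* decide once so unlock matches lock */
	if(locked){
		qlock(&c->alock);
		c->nlocked++;
	}
	c->mode = mode;
	if(locked)
		qunlock(&c->alock);
}
```

The cost is the one already noted: every such call site has to carry the check, instead of qlock() enforcing the rule in one place.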
Re: [9fans] That deadlock, again
> I suggest you fix ether82598: it is OK to call qlock() and qunlock() without up, but only if sure that the qlock() will succeed. If it has to wait, it will panic.

yes. that's it.

> If it has to wait, it will panic. Given that, why do the locking at all?
>
> i assume the intention is along these lines: it's to allow the use during reset of a given driver's standard functions that normally must qlock, to avoid requiring two copies of them, with and without the qlock. after reset, it's illegal to call qlock without a process (notably in an interrupt function), as it previously was.

i think lucio has stated the current restriction more exactly, but this is closer to the assumed intent. perhaps we should make this what qlock actually does with

diff -c /n/dump/2010/1118/sys/src/9/pc/main.c pc/main.c
/n/dump/2010/1118/sys/src/9/pc/main.c:201,206 - pc/main.c:201,207
  		poperror();
  	}
  	kproc("alarm", alarmkproc, 0);
+ 	conf.postdawn = 1;
  	touser(sp);
  }
diff -c /n/dump/2010/1118/sys/src/9/pc/dat.h pc/dat.h
/n/dump/2010/1118/sys/src/9/pc/dat.h:107,112 - pc/dat.h:107,113
  	ulong	ialloc;		/* max interrupt time allocation in bytes */
  	ulong	pipeqsize;	/* size in bytes of pipe queues */
  	int	nuart;		/* number of uart devices */
+ 	int	postdawn;	/* multiprogramming on */
  };
  
  /*
diff -c /n/dump/2010/1118/sys/src/9/port/qlock.c port/qlock.c
/n/dump/2010/1118/sys/src/9/port/qlock.c:18,23 - port/qlock.c:18,25
  {
  	Proc *p;
  
+ 	if(up == nil && conf.postdawn)
+ 		panic("qlock: %#p: postdawn up nil\n", getcallerpc(&q));
  	if(m->ilockdepth != 0)
  		print("qlock: %#p: ilockdepth %d\n", getcallerpc(&q), m->ilockdepth);
  	if(up != nil && up->nlocks.ref)

a test kernel i've got does boot with this test.

- erik
Re: [9fans] That deadlock, again
on second thought, conf.postdawn should be set in schedinit().

- erik
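The check in the diff, with the flag set in schedinit() as proposed in this follow-up, can be modelled as below. This is a sketch assuming a reduced conf holding only the postdawn field and up as a plain global; postdawnviolation() and schedinitmodel() are hypothetical names, and the real check calls panic() rather than returning a value.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Proc Proc;
struct Proc { int pid; };

/* Reduced, hypothetical conf: only the field added by the diff. */
struct {
	int postdawn;	/* multiprogramming on */
} conf;

Proc *up;	/* hypothetical stand-in for the current process */

/* Nonzero when the proposed test at the top of qlock() would panic. */
int
postdawnviolation(void)
{
	return up == NULL && conf.postdawn;
}

/* Models setting the flag in schedinit(), as suggested in the follow-up,
 * rather than just before touser() as in the original diff. */
void
schedinitmodel(void)
{
	conf.postdawn = 1;
}
```

Before the flag is set, a process-less qlock() is tolerated (subject to the no-contention rule); afterwards the same call is reported immediately instead of only when the lock happens to be contended.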
Re: [9fans] That deadlock, again
> /n/dump/2010/1118/sys/src/9/port/qlock.c:18,23 - port/qlock.c:18,25
>   {
>   	Proc *p;
>   
> + 	if(up == nil && conf.postdawn)
> + 		panic("qlock: %#p: postdawn up nil\n", getcallerpc(&q));
>   	if(m->ilockdepth != 0)
>   		print("qlock: %#p: ilockdepth %d\n", getcallerpc(&q), m->ilockdepth);
>   	if(up != nil && up->nlocks.ref)

Yes, this is the type of explicitness I was thinking of. Note that you can now drop further tests for up == 0 later in the qlock() text.

++L
Re: [9fans] That deadlock, again
> Yes, this is the type of explicitness I was thinking of. Note that you can now drop further tests for up == 0 later in the qlock() text.

Hm, spoke too quickly. The tests on up have to remain, sadly. Sorry about the misleading noise.

++L
Re: [9fans] Anyone using p9p or Plan 9 venti as a more generic backup system?
On Wed, 17 Nov 2010 09:44:27 PST David Leimbach leim...@gmail.com wrote:
> On Wed, Nov 17, 2010 at 9:23 AM, dexen deVries dexen.devr...@gmail.com wrote:
>> On Wednesday 17 November 2010 18:14:35 Venkatesh Srinivas wrote:
>>> (...) I'd be very careful with vac -m and -a on Unix; both have been at the root of considerable data-loss on a unix venti for me. I'd recommend vac-ing tarballs, rather than using vac's on unix trees directly. But your mileage may vary...
>>
>> could you please elaborate a bit about that data loss? traversing symlinks breaks? some files not getting read by vac at all?
>>
>> (I'm interested in using p9p vac+venti in similar manner, but on Linux w/ GNU stuff)
>
> I could imagine vac/unvac not dealing with resource forks or POSIX extended attributes and such properly, as well as potentially having difficulty with symlinks, but having dealt with stuff like that in xar, I don't think it's too difficult to address.
>
> I may need to read up on venti and see what sorts of data types it supports. Might be time to add some extensions?

venti doesn't care, but vac/unvac do deal with symlinks, fifos and special devices. The problem with -a is that a /mmdd/ prefix gets prepended to all paths and these dirs are read-only (555). unvac coredumps in trying to extract anything under /. The real problem is that unvac needs to handle non-empty 555 dirs specially (like tar does). Try this on unix:

	mkdir -p a/b
	chmod 555 a
	tar cf - a | (cd /tmp; tar -xvf -)
	vac a | (cd /tmp; unvac -v)

The basic problem is that venti & friends need some grunt work to make them bullet/idiot proof.
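The tar-like handling mentioned above presumably amounts to deferring a directory's final permissions until its contents have been written. Here is a minimal POSIX sketch, assuming the extractor records each directory's mode as it goes; Pending, mkdirdeferred() and applymodes() are hypothetical names and not part of unvac.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* A directory whose final permissions are applied only after
 * everything inside it has been extracted. */
typedef struct Pending Pending;
struct Pending {
	char	path[4096];
	mode_t	mode;
};

/* Create the directory owner-writable for the duration of the
 * extraction and remember the mode it should end up with (e.g. 0555). */
int
mkdirdeferred(const char *path, mode_t finalmode, Pending *p)
{
	if(mkdir(path, 0700) < 0)
		return -1;
	snprintf(p->path, sizeof p->path, "%s", path);
	p->mode = finalmode;
	return 0;
}

/* Once all entries are written, apply the recorded modes,
 * most recently created directories first. */
int
applymodes(const Pending *p, int n)
{
	int i;

	for(i = n - 1; i >= 0; i--)
		if(chmod(p[i].path, p[i].mode) < 0)
			return -1;
	return 0;
}
```

For the a/b example, an extractor would call mkdirdeferred() for a (final mode 0555) and for a/b, write any files, and only then call applymodes(); for an ordinary user, creating a/b succeeds because a is still 0700 at that point.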
Re: [9fans] Anyone using p9p or Plan 9 venti as a more generic backup system?
On Thursday 18 November 2010 20:40:13 Bakul Shah wrote:
> <snip>

thanks ;)

--
dexen deVries
``One can't proceed from the informal to the formal by formal means.''
Re: [9fans] Plan9 development
isn't this redundant with cpp(1)'s __FUNCTION__? if __FUNCTION__ isn't standard, then we should change it to __func__ in cpp and that's it

On Thu, Nov 18, 2010 at 2:30 AM, Joel C. Salomon joelcsalo...@gmail.com wrote:
> On 11/14/2010 04:44 PM, Charles Forsyth wrote:
>> the list of unimplemented items in /sys/src/cmd/cc/c99* is:
>> <snip>
>> i can think of something else that's not been noticed, but what other things have you found?
>
> Why is __func__ listed as “unwanted”? I’ve found it useful for some logging functions.
>
> --Joel

--
Federico G. Benavento
Re: [9fans] Plan9 development
On 11/18/2010 05:50 PM, Federico G. Benavento wrote:
> On Thu, Nov 18, 2010 at 2:30 AM, Joel C. Salomon joelcsalo...@gmail.com wrote:
>> Why is __func__ listed as “unwanted”? I’ve found it useful for some logging functions.
>
> isn't this redundant with cpp(1)'s __FUNCTION__? if __FUNCTION__ isn't standard, then we should change it to __func__ in cpp and that's it

Um, how can the preprocessor know what function it’s in the middle of? (That’s why, unlike the preprocessor symbols __FILE__ & __LINE__, C99’s __func__ is an identifier.)

--Joel
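A small C99 illustration of that point, assuming a C99 compiler (the feature under discussion is listed as unimplemented in Plan 9's cc): __FILE__ and __LINE__ are substituted by the preprocessor, while __func__ is an identifier the compiler defines inside each function body, so the same macro reports the name of whichever function it is used in. WHERE() and probe() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* __FILE__ and __LINE__ are replaced by the preprocessor at the call site;
 * __func__ is an identifier the compiler declares in every function body
 * (as if by: static const char __func__[] = "name";). */
#define WHERE(buf, n, msg) \
	snprintf((buf), (n), "%s:%d: %s: %s", __FILE__, __LINE__, __func__, (msg))

/* Hypothetical function used only to show what __func__ expands to. */
void
probe(char *buf, size_t n)
{
	WHERE(buf, n, "entered");
}
```

Because the macro text itself only names __func__, the preprocessor never needs to know which function it is in; the compiler resolves the identifier when it compiles probe().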
Re: [9fans] Plan9 development
my bad, I thought cpp(1) implemented __FUNCTION__...

On Thu, Nov 18, 2010 at 11:06 PM, Joel C. Salomon joelcsalo...@gmail.com wrote:
> <snip>

--
Federico G. Benavento