On Fri, 22 Jan 2010, Mikolaj Golub wrote:
We have nonempty nm_bufq, nm_bufqiods = 1, but actually there is no nfsiod
thread run for this mount, which is wrong -- nm_bufq will not be emptied until
some other process starts writing to the nfsmount and starts nfsiod thread for
this mount.
Reviewing the code how it could happen I see the following path. Could someone
confirm or disprove me?
in nfs_bio.c:nfs_asyncio() we have:
1363 mtx_lock(&nfs_iod_mtx);
...
1374 /*
1375 * Find a free iod to process this request.
1376 */
1377 for (iod = 0; iod < nfs_numasync; iod++)
1378 if (nfs_iodwant[iod]) {
1379 gotiod = TRUE;
1380 break;
1381 }
1382
1383 /*
1384 * Try to create one if none are free.
1385 */
1386 if (!gotiod) {
1387 iod = nfs_nfsiodnew();
1388 if (iod != -1)
1389 gotiod = TRUE;
1390 }
Let's consider situation when new nfsiod is created.
nfs_nfsiod.c:nfs_nfsiodnew() before creating nfssvc_iod thread unlocks
nfs_iod_mtx:
179 mtx_unlock(&nfs_iod_mtx);
180 error = kthread_create(nfssvc_iod, nfs_asyncdaemon + i, NULL,
RFHIGHPID,
181 0, "nfsiod %d", newiod);
182 mtx_lock(&nfs_iod_mtx);
And nfs_nfsiod.c:nfssvc_iod() do the followin:
226 mtx_lock(&nfs_iod_mtx);
...
238 nfs_iodwant[myiod] = curthread->td_proc;
239 nfs_iodmount[myiod] = NULL;
...
244 error = msleep(&nfs_iodwant[myiod], &nfs_iod_mtx, PWAIT
| PCATCH,
245 "-", timo);
Let's at this moment another nfs_asyncio() request for another nfsmount has
happened and this thread has locked nfs_iod_mtx. Then this thread will found
nfs_iodwant[iod] in "for" loop and will use it. When the first thread actually
has returned from nfs_nfsiodnew() it will insert buffer to nmp->nm_bufq but
nfsiod will process other nmp.
Ok, good catch, I think you've found the problem (or at least a race
that might have caused it).
It looks like the fix for this situation would be to check nfs_iodwant[iod]
after nfs_nfsiodnew():
--- nfs_bio.c.orig 2010-01-22 15:38:02.000000000 +0000
+++ nfs_bio.c 2010-01-22 15:39:58.000000000 +0000
@@ -1385,7 +1385,7 @@ again:
*/
if (!gotiod) {
iod = nfs_nfsiodnew();
- if (iod != -1)
+ if ((iod != -1) && (nfs_iodwant[iod] == NULL))
gotiod = TRUE;
}
Unfortunately, I don't think the above fixes the problem.
If another thread that called nfs_asyncio() has "stolen" the this "iod",
it will have set nfs_iodwant[iod] == NULL (set non-NULL at #238)
and it will remain NULL until the other thread is done with it.
If you instead make it:
if (iod != -1 && nfs_iodwant[iod] != NULL)
gotiod = TRUE;
then I think it fixes your scenario above, but will break for the
case where the mtx_lock(&nfs_iod_mtx) call in nfs_nfsnewiod() (#182) wins
out over the one near the beginning of nfssvc_iod() (#226), since in that
case, nfs_iodwant[iod] will still be NULL because it hasn't yet been
set by nfssvc_iod() (#238).
There should probably be some sort of 3 way handshake between
the code in nfs_asyncio() after calling nfs_nfsnewiod() and the
code near the beginning of nfssvc_iod(), but I think the following
somewhat cheesy fix might do the trick:
if (!gotiod) {
iod = nfs_nfsiodnew();
if (iod != -1) {
if (nfs_iodwant[iod] == NULL) {
/*
* Either another thread has acquired this
* iod or I acquired the nfs_iod_mtx mutex
* before the new iod thread did in
* nfssvc_iod(). To be safe, go back and
* try again after allowing another thread
* to acquire the nfs_iod_mtx mutex.
*/
mtx_unlock(&nfs_iod_mtx);
/*
* So long as mtx_lock() implements some
* sort of fairness, nfssvc_iod() should
* get nfs_iod_mtx here and set
* nfs_iodwant[iod] != NULL for the case
* where the iod has not been "stolen" by
* another thread for a different mount
* point.
*/
mtx_lock(&nfs_iod_mtx);
goto again;
}
gotiod = TRUE;
}
}
Does anyone else have a better solution?
(Mikolaj, could you by any chance test this? You can test yours, but I
think it breaks.)
rick
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[email protected]"