Hi,
Please consider following scenario (with the patch/fix from Larry):
-> engine 0: calls iput() and lock inode_lock. iput removes the inode
from the i_list and unlocks inode_lock
---> engine 1: grab inode_lock and calls __sync_one()
-> engine 0: calls clear_inode(), get past the call to "wait_on_inode()"
which looks if I_LOCK is set.
/* From this point onwards clear_inode() and the remainder of iput()
does not care about I_LOCK or inode_lock. */
Now with new changes, it will wait for inode_lock before setting the
state to I_CLEAR. So now we are waiting for inode_lock on engine 0.
---> engine 1: Sets I_LOCK and release the inode_lock
-> engine 0: We now get the lock and set the state to I_CLEAR, release
the lock and free the inode (though on engine 1 we have set I_LOCK and
are thinking that no one will destroy this inode).
Though numerous kind of corruption is possible now, I am sighting one
example here: Under low memory condition it is possible that the inode
from inode_cachep (freed inode cache), will be returned to system memory
(subject to all the objects in that particular slab is freed). And that
memory chuck (which we were just now using for inode) is allocated to
some other process. Suppose, this new process which just got this newly
allocated chunk, goes and clears the field, which was earlier i_state,
to NULL, or some other value (other than value which suggests I_FREEING
or I_CLEAR is set)
---> engine 1: We get the spin lock, clears I_LOCK (even though we don't
own this memory chunk anymore), see (As per above mentioned example
scenario) that I_FREEING or I_CLEAR is not set and insert this freed
inode into the list!! This way we will still end up in corrupted list
(as per above example scenario).
Please correct me if I am wrong somewhere.
Thanks,
Rahul
-----Original Message-----
From: Marcelo Tosatti [mailto:[EMAIL PROTECTED]
Sent: Monday, August 08, 2005 9:55 PM
To: Srivastava, Rahul
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED];
[email protected]
Subject: Re: FW: oops in 2.4.25 prune_icache() called from kswapd
On Mon, Aug 08, 2005 at 11:45:28AM -0500, Srivastava, Rahul wrote:
> Hi All,
>
> I was just wondering if any of you guys had a chance to validate the
> hypothesis and the proposed fix.
>
> Thanks,
> Rahul
>
>
> -----Original Message-----
> From: Srivastava, Rahul
> Sent: Tuesday, August 02, 2005 8:32 AM
> To: 'Marcelo Tosatti'; 'Ernie Petrides'; 'Larry Woodman'
> Subject: RE: oops in 2.4.25 prune_icache() called from kswapd
>
>
> Hi,
>
> Thanks for reviewing the mail. I was thinking whether below changes in
> clear_inode() will close the race window:
>
> in clear_inode(), change line:
>
> inode->i_state = I_CLEAR;
>
> with below piece of code:
>
> *****
> spin_lock(&inode_lock);
> while (inode->i_state & I_LOCK) {
> spin_unlock(&inode_lock);
> __wait_on_inode(inode);
> spin_lock(&inode_lock);
> }
> inode->i_state = I_CLEAR;
> spin_unlock(&inode_lock);
> *****
>
> I feel the race is between "__sync_one()" and "iput()/clear_inode()"
> (also suggested by Albert) which is as follows:
>
> **************** race
> condition*******************************************
>
>
> engine 0:
> |
> calls iput() and lock inode_lock. iput removes the inode from the
i_list
> and unlocks |
> inode_lock
> |
>
> |
>
> | engine 1:
>
>
> | grab inode_lock and calls __sync_one()
>
> |
> engine 0:
> |
> calls clear_inode(), get past the call to "wait_on_inode()" which
looks
> if I_LOCK is set. |
> /* From this point onwards clear_inode() and the remainder of iput()
> does not care about |
> I_LOCK or inode_lock. */
> |
>
> |
>
> | engine 1:
>
>
> | Sets I_LOCK.
>
> |
> engine 0:
> |
> sets i_state = I_CLEAR
> |
> iput() calls destroy_inode()
> |
> kmem_cache_free() returns the inode to free list of inode cache.
> |
>
> |
>
> | engine 1:
>
> | Goes ahead and inserts the freed inode into one of the three
> | possible
> lists.
As stated in private, Larry's fix should catch that in __refile_inode()
and ignore the I_CLEAR inode.
> And we endup in having a corrupted inode on the inode list.
>
> Your thoughts please.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html