If the filesystem returns an error, we need to clean up and avoid a deadlock.
This can happen if there is disk corruption, or if one has a stale ino (they
can get repurposed on restart).

It's easy to reproduce this bug by placing a 'poison' in diskfs_user_read_node
for a particular ino and then trying to access the corresponding file from a
live system.

Before this fix:
 - deadlock
With this fix:
 - 'stat: cannot statx '<filename>': Input/output error'

---
 libdiskfs/node-cache.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/libdiskfs/node-cache.c b/libdiskfs/node-cache.c
index 1ff19ade..d11e5866 100644
--- a/libdiskfs/node-cache.c
+++ b/libdiskfs/node-cache.c
@@ -112,7 +112,22 @@ diskfs_cached_lookup_context (ino_t inum, struct node **npp,
   /* Get the contents of NP off disk.  */
   err = diskfs_user_read_node (np, ctx);
   if (err)
+   {
+    pthread_rwlock_wrlock (&nodecache_lock);
+    hurd_ihash_remove (&nodecache, (hurd_ihash_key_t) &np->cache_id);
+    pthread_rwlock_unlock (&nodecache_lock);
+
+    /* Don't delete from disk. */
+    np->dn_stat.st_nlink = 1;
+    np->allocsize = 0;
+    np->dn_set_ctime = 0;
+    np->dn_set_atime = 0;
+    np->dn_set_mtime = 0;
+    diskfs_nput (np);
+    *npp = NULL;
+
     return err;
+   }
   else
     {
       *npp = np;
-- 
2.51.0

