On Sat, May 1, 2021 at 6:38 PM Samuel Thibault <samuel.thiba...@gnu.org> wrote:
> Actually I'd say the pager should replace the cache. The pager is
> already a cache by itself, we should not need to keep both the pager and
> the cache, particularly since it means having to keep both coherent.

Well, yes, I've considered that; but I've tried to keep the changes less
invasive than that. I'll take a stab at it if you think this is the way
to go.

This actually brings me to a question: why is tarfs using netfs over
diskfs? I see that "a translator which shows a .tar archive in a
unpacked way" is mentioned in the wiki [0] as a motivating example of
what netfs should be used for, so there must be a good reason. Yet from
what I see, libnetfs is suited for filesystems that are either served
from a remote location (the net in netfs) or just synthesized on the
fly; and on the other hand, the tar format, with its 512-byte blocks,
sounds very much like a filesystem image to me. isofs uses diskfs, why
doesn't tarfs?

[0]: https://www.gnu.org/software/hurd/hurd/libnetfs.html

> > Is there a better way to deal with a held mutex that unlocking it
> > before calling a function that re-locks it?
>
> Yes, avoiding the situation. If you unlock a mutex, anything can then
> happen, and you can easily get to an incoherent state.

I understand; if some code drops a mutex and then re-acquires it, it has
to be prepared that everything might have changed in the meantime. Which
is admittedly something I have not done for this prototype.

> > (Making the mutex recursive, perhaps?
>
> That's usually frowned upon because it's often a sign that you don't
> know at various places whether you hold the lock or not.
>
> > Or extracting the inner part into its own function?)
>
> Yes, in some cases that makes sense.

See, I cannot easily do either (make the mutex recursive or extract the
logic). The flow goes something like this:

1. S_io_write ()
2. lock the node, validate size/offset, grow it if needed
3. pager_memcpy (), fault
4. another thread (!) gets to pager_{read,write}_page ()

This is why the mutex cannot be made recursive: it's being grabbed from
the other thread.

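To make the point about recursion concrete, here is a minimal sketch
using plain POSIX threads. It is not Hurd code: node_lock, pager_thread
and probe_recursive_lock are hypothetical names that only stand in for
the node lock, the thread that ends up in pager_{read,write}_page (),
and the S_io_write () path. It shows that a recursive mutex can only be
re-acquired by the thread that already owns it.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the node lock; not the real Hurd lock.  */
static pthread_mutex_t node_lock;

/* Plays the role of the thread serving the page fault: it tries to
   take the node lock without blocking and reports the result.  */
static void *
pager_thread (void *arg)
{
  int *result = arg;

  *result = pthread_mutex_trylock (&node_lock);
  if (*result == 0)
    pthread_mutex_unlock (&node_lock);
  return NULL;
}

/* Plays the role of the writer.  Stores the trylock result seen by
   the owning thread in *same_thread and the one seen by a second
   thread in *other_thread.  Returns 0 on success, -1 on error.  */
int
probe_recursive_lock (int *same_thread, int *other_thread)
{
  pthread_mutexattr_t attr;
  pthread_t tid;
  int err = 0;

  if (pthread_mutexattr_init (&attr) != 0)
    return -1;
  pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_RECURSIVE);
  if (pthread_mutex_init (&node_lock, &attr) != 0)
    err = -1;
  pthread_mutexattr_destroy (&attr);
  if (err)
    return err;

  /* The writer locks the node.  */
  pthread_mutex_lock (&node_lock);

  /* Re-locking from the owning thread works: that is all that
     "recursive" buys us.  */
  *same_thread = pthread_mutex_trylock (&node_lock);
  if (*same_thread == 0)
    pthread_mutex_unlock (&node_lock);

  /* A different thread is still locked out while we hold the lock.  */
  if (pthread_create (&tid, NULL, pager_thread, other_thread) != 0)
    err = -1;
  else
    pthread_join (tid, NULL);

  pthread_mutex_unlock (&node_lock);
  pthread_mutex_destroy (&node_lock);
  return err;
}
```

With a blocking pthread_mutex_lock () in place of the trylock, the
second thread would simply wait for the lock to be released, whatever
the mutex type.
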
And this is also why I cannot just extract the logic: I'm not calling
the logic inside pager.c directly, I'm calling pager_memcpy (), which
faults and *that* causes pager_{read,write}_page () to be called the
same way it'd be called for any other task faulting on the mapping. And
pager_{read,write}_page () naturally has to lock, to validate the size,
access the cache, and whatnot.

Perhaps you can think of a solution? How does diskfs cope with this? (I
see that _diskfs_rdwr_internal () still has the node locked while
calling pager_memcpy (), how come that doesn't deadlock?)

-- Sergey