John -

I'm hitting a deadlock on a slightly modified fork of version 1.3.8, and
from what I can tell, the latest version of FastBit has the same problem.
Looking into it further, there appear to be several code paths where a
thread deadlocks itself, and subsequently blocks all other threads, when
the cache size approaches `maxBytes`.

In the case that I ran into, the `storage` constructor was trying to store
a file in memory. When it found that there was not enough room in the
cache, it first tried to lock the `fileManager`'s mutex. The problem is
that the same thread had already locked that mutex in `getFile`.
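
The self-deadlock pattern described above can be reproduced safely in a small sketch, assuming the mutex is a pthreads mutex. `ToyManager` and `makeRoom` are made-up names for illustration, not FastBit's actual code. The sketch uses an error-checking mutex so that the second lock attempt reports `EDEADLK` instead of hanging, as a default mutex may do:

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Hypothetical stand-in for a file manager; not FastBit's real class.
class ToyManager {
public:
    ToyManager() {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        // ERRORCHECK makes a relock by the owning thread return EDEADLK.
        int rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
        assert(rc == 0);
        rc = pthread_mutex_init(&mutex_, &attr);
        assert(rc == 0);
        (void)rc;
        pthread_mutexattr_destroy(&attr);
    }
    ~ToyManager() { pthread_mutex_destroy(&mutex_); }

    // Same shape as the reported path: take the lock, then call code
    // that tries to take the same lock again to free up cache space.
    int getFile() {
        pthread_mutex_lock(&mutex_);
        int rc = makeRoom();  // called with the lock already held
        pthread_mutex_unlock(&mutex_);
        return rc;
    }

private:
    // Returns 0 if the lock was acquired, or the pthreads error code.
    int makeRoom() {
        int rc = pthread_mutex_lock(&mutex_);
        if (rc == 0) pthread_mutex_unlock(&mutex_);
        return rc;
    }

    pthread_mutex_t mutex_;
};
```

Calling `getFile()` on a `ToyManager` returns `EDEADLK`, which marks the point where a default (non-error-checking) mutex could block forever.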

A simple fix is to convert `fileManager`'s mutex to a recursive one, so a
lock attempt from the thread that already owns the lock succeeds. This
does add some overhead - I haven't measured how much. It also runs the risk
of breaking other sections of the code that I'm not familiar with, which
might unlock that mutex once and expect it to be released. A recursive
mutex that has been locked twice must be unlocked twice before any other
thread can acquire it.
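
As a rough sketch of what a recursive mutex setup looks like with pthreads, and of the lock/unlock balance mentioned above (the helper names `initRecursive` and `unlocksNeeded` are hypothetical, chosen for this example only):

```cpp
#include <cassert>
#include <pthread.h>

// Initialize `m` as a recursive mutex. Returns 0 on success,
// otherwise the pthreads error code.
int initRecursive(pthread_mutex_t* m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

// Lock a recursive mutex `depth` times from this thread, then count
// how many unlock calls succeed before the mutex is fully released.
int unlocksNeeded(int depth) {
    pthread_mutex_t m;
    int rc = initRecursive(&m);
    assert(rc == 0);
    (void)rc;
    for (int i = 0; i < depth; ++i) {
        pthread_mutex_lock(&m);  // succeeds each time for the owner
    }
    int unlocks = 0;
    // Unlocking a recursive mutex the caller no longer owns fails
    // with an error, which ends the loop.
    while (pthread_mutex_unlock(&m) == 0) {
        ++unlocks;
    }
    pthread_mutex_destroy(&m);
    return unlocks;
}
```

Here `unlocksNeeded(2)` returns 2: code that unlocks once after a nested lock still holds the mutex, which is the risk for callers that expect a single unlock to release it.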

I also considered keeping the standard mutex type and passing down a
boolean that says whether the current thread has already locked it. Once I
saw how far that reached, I opted for the easy recursive mutex.
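
For comparison, the boolean-flag alternative might look roughly like the sketch below. `ToyCache`, `store`, and `reserve` are invented for illustration and do not correspond to FastBit functions. It shows why the flag spreads: every function that can be reached both with and without the lock held needs the extra parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

// Hypothetical cache with a plain (non-recursive) mutex.
class ToyCache {
public:
    explicit ToyCache(std::size_t maxBytes) : max_(maxBytes), used_(0) {
        int rc = pthread_mutex_init(&mutex_, nullptr);
        assert(rc == 0);
        (void)rc;
    }
    ~ToyCache() { pthread_mutex_destroy(&mutex_); }

    // Entry point that takes the lock, then tells the callee so.
    bool store(std::size_t bytes) {
        pthread_mutex_lock(&mutex_);
        bool ok = reserve(bytes, /*alreadyLocked=*/true);
        pthread_mutex_unlock(&mutex_);
        return ok;
    }

    // Can be called with or without the lock held; the caller must
    // pass the flag truthfully or the thread deadlocks or races.
    bool reserve(std::size_t bytes, bool alreadyLocked) {
        if (!alreadyLocked) pthread_mutex_lock(&mutex_);
        bool ok = used_ + bytes <= max_;
        if (ok) used_ += bytes;
        if (!alreadyLocked) pthread_mutex_unlock(&mutex_);
        return ok;
    }

    // Unsynchronized read, for single-threaded inspection only.
    std::size_t used() const { return used_; }

private:
    pthread_mutex_t mutex_;
    std::size_t max_;
    std::size_t used_;
};
```

The trade-off is that correctness depends on every caller passing the right value, whereas a recursive mutex tracks ownership itself.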

I show the code path that deadlocked for me, along with the recursive mutex
initialization, here:

https://gist.github.com/wblakecaldwell/89f29ddb1e98fedda245

I'd love to hear your thoughts - Thanks!

- Blake Caldwell
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
