On Tue, Nov 30, 2010 at 12:33:38PM +0000, Philip Martin wrote: > "Hyrum K. Wright" <hyrum_wri...@mail.utexas.edu> writes: > > > Stefan's patch to make a recursive proplist much more performant > > highlights the great benefit that our sqlite-backed storage can have. > > However, he reverted it due to concerns about the potential for > > database contention. The theory was that the callback might try and > > call additional wc functions to get more information, and such nested > > statements weren't healthy for sqlite. We talked about it for a bit > > in IRC this morning, and the picture raised by this issue was quite > > dire. > > It's not the nested statements that are the problem, it's extending the > duration of the select. The callback is to the application so it could > be updating a GUI, interacting with the user, etc. and so could take a > much greater length of time than other selects. An SQLite read blocks > SQlite writes by other threads/processes, so the long lived select may > cause writes to fail that would otherwise succeed. (We have a busy > timeout of 10 seconds so that's the sort of time that becomes a > problem.) > > It's related to the problem of querying the working copy while it is > locked. In 1.6 it was possible to run status, info, propget, etc. while > an update was in progress. In 1.7 that's no longer possible, all the > read operations fail. If we decide that this change in behaviour is > acceptable then the long-lived select isn't really a problem, it simply > stops the operations like update from starting. If we want the 1.6 > behaviour then the long-lived select becomes more of a problem, as it > makes update much more likely to fail part way through. > > The callback is not necessarily wrong, it depends what sort of behaviour > we are trying to provide.
I think we should review our requirements for callbacks again. <recap> In 1.7 we have two layers of locking. One is the sqlite locking, which guarantees meta data consistency in wc.db. The other is working copy write locks (svn_wc__db_wclock_obtain()) which are obtained at the start of long-running operations which modify the working copy (e.g. commit, update, merge, patch, ...). The problem we're facing is that we want to allow read-only operations (e.g. status, proplist, ...) to run concurrently with N other read-only operations and at most one write operation. For instance, the user might have TortoiseSVN and some IDE SVN plugin running while working with the CLI client on the same working copy. The CLI client should be able to perform write operations while the other clients periodically poll the WC status to update their icons. The WC write lock will not pose a problem here since it is only taken by writers of which there must at most be one at a time (for a given subtree of the WC). Readers won't attempt to acquire this lock. The sqlite locks on the other hand creates a problem during wc.db queries. A writer blocks all readers, and any active reader will block the writer (see http://sqlite.org/atomiccommit.html). Our solution to this problem is to keep the queries short, so that sqlite locking will not cause long-running read or write operations to stall or fail. </recap> The problem with callbacks is that they can in theory take a long amount of time to complete. So when running callbacks while an sqlite query is running, the sqlite query could take a long time to complete. We can solve this problem by never calling back to the client while an sqlite transaction is in progress. However this has an impact on performance. Even if retrieving the desired information is possible with a single query, we have to run multiple queries to retrieve the information in chunks (e.g. per directory) so we can pass it to the callback. The other option is to place restrictions on what a client callback is allowed to do, depending on which "locking context" the callback runs in. A locking context implies restrictions on what the callback is allowed to do. E.g. we could say that the proplist callback runs within a "wc.db locking context". This would be a highly restrictive context, because an sqlite transaction is in progress. The only action allowed is to consume the data passed in (e.g. print it or store it to a temporary file or a buffer for later processing). The callback should complete its task as quickly as possible to avoid disturbing other SVN clients. No calls to any function in libsvn_client or libsvn_wc would be allowed from this context. If the callback doesn't comply with this restriction, there is a risk of undefined behaviour of the entire system. This means that a badly behaving client can disturb user experience. But it also allows us to be maximally efficient. If backwards compatibility is a concern, we could allow callbacks to be called in this context only for API calls that exist in 1.7 and newer. Backwards-compat code would use a different implementation with a less restrictive locking context for these callbacks. This would allow existing callbacks written against 1.6.x and earlier APIs to continue to work unchanged. Then we also have callbacks that run in "wc write locking context". E.g. the callback that gets a commit log message from the caller might run within this context if the commit operation involves a working copy. These callbacks can wait for user input, and must assume that they run with a WC write lock held, but not with an sqlite lock. Calling any function that will start a new write operation on the working copy is prohibited (because it will fail when it attempts to acquire another WC lock), but read-only operations on the working copy are allowed. There is also the "lock-free context". For instance, the blame line receiver (svn_client_blame_receiver3_t) would run in this context. This context doesn't impose any restrictions. I've mentioned the latter two contexts for completeness. The first thing to do would be to document which callbacks are run within the "wc.db locking context". Once that is done we could also introduce the other contexts (for completeness) and assign existing callbacks to them. Does this approach sound sane? If so, I would like to revive the "fast" proplist implementation which was reverted in r1040135, and apply a similar approach to other subcommands where possible. I would also try to provide separate implementations for the backwards-compat code if required. Stefan