I’m really glad to read this thread! MRMW operation could lead to some great performance gains for the users.
Andy mentioned that I’ve been looking at some overlapping issues, in the context of a “writer-per-named-graph” dataset design. I’ll comment in-line.

---
A. Soroka
The University of Virginia Library

> On Jan 3, 2016, at 8:45 AM, Claude Warren <[email protected]> wrote:
>
>> As I understand it, the application is now involved to describe the part of
>> the graph to lock. Is that right?
>
> That was my thought, yes. I was only looking at how to lock for write. I had
> not considered read locking, though I expected it would work like the current
> lock does in the sense that read locks block write locks. Though I suppose
> it might be possible in some situations to specify what the thread was going
> to read.

In TxnMem I was able to use persistent data structures to allow readers to continue while a writer was operating (with isolation). That may not be a characteristic we can always support, but it might be useful to think about the cases where we can support it. (E.g. when the readers and writers know in advance what is to be covered, or when a reader is doing something that the application knows is not going to be affected by some particular writers, even though they may be working over the same regions.)

>> Does failure mean an exception or waiting?
>
> My thought was failure/exception. As you point out, waiting leads to deadlock
> issues. However, the decision to wait or fail could be implemented outside
> of the engine that determines if the lock can be granted. So it could be a
> pluggable decision. For the first cut I would simply implement failure.
> But looking at the enterCriticalSection() code I see that it does not allow
> failure, but that there is a mention of an error if a read lock is promoted
> to a write lock. I am unsure how to handle this case. I think that a lock
> failure exception may be needed.

I think failing by default is going to be confusing to many users of this machinery. Indirecting the decision seems like a good way to provide flexibility, and I suspect that a default of blocking on the lock will be more “ergonomic” for users.

> A possible solution is to delay and retry with a retry time out/limit before
> throwing a timeout type exception. This area needs work.

Blocking-with-policy is nicely flexible. Andy showed a nice idiom that could be extended to this kind of use:

https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/sparql/core/mem/DatasetGraphInMemory.java#L210

Perhaps something like:

{noformat}
LockAcquisitionPolicy policyDefault = ...;
LockAcquisitionPolicy specialPolicy = ...;

LockRegions locks = new LockRegions(policyDefault);

Triple lockregion1 = ...;
locks.addRegion(lockregion1, ReadWrite.WRITE); // do not acquire lock, just add region to scope
Triple lockregion2 = ...;
locks.addRegion(lockregion2, ReadWrite.READ, specialPolicy); // do not acquire lock, just add region to scope

Consumer<LockRegions> task = regions -> {
    doSomeStuff();
    regions.unlock(lockregion1);
    doSomeMoreStuff();
    regions.unlock(lockregion2);
    finishUp();
};

try {
    withLocks(locks, task); // acquire locks and execute task with them
} catch (LockException e) {
    log.error("Urg!");
} finally {
    locks.unlock();
}
{noformat}

The LockRegions object could then be reused (possibly with regions added or removed) after .unlock(), so the work of determining what triples map into an entity doesn’t have to be entirely redone.
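To make the pluggable-policy idea a bit more concrete, here is a very rough sketch. LockAcquisitionPolicy and LockException are just names I’m making up for this sketch, not existing Jena types, and it assumes each region is ultimately guarded by a java.util.concurrent.locks.Lock:

{noformat}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

// Hypothetical sketch: a policy decides how to acquire a region's underlying lock.
interface LockAcquisitionPolicy {

    void acquire(Lock lock) throws LockException;

    // Fail fast: throw immediately if the lock is not available.
    LockAcquisitionPolicy FAIL_FAST = lock -> {
        if (!lock.tryLock())
            throw new LockException("lock not available");
    };

    // Block, but give up after a timeout instead of risking an unbounded wait.
    static LockAcquisitionPolicy blockWithTimeout(long timeout, TimeUnit unit) {
        return lock -> {
            try {
                if (!lock.tryLock(timeout, unit))
                    throw new LockException("timed out waiting for lock");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new LockException("interrupted while waiting for lock");
            }
        };
    }
}

// Hypothetical unchecked exception used only for this sketch.
class LockException extends RuntimeException {
    LockException(String message) { super(message); }
}
{noformat}

A withLocks(locks, task) call could then consult each region’s policy at acquisition time, which keeps the wait-or-fail decision outside the engine that grants locks, along the lines Claude suggests.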
>> How far back are you going to wind back to? In a transaction system either
>> the transaction is aborted (so all the way back) or the system retries (i.e.
>> reruns application code).
>
> I had assumed that the application would lock all the triples it needed at
> the start of the lock.

This seems non-ideal to me from a user standpoint. It would be nice to be able to add locks on the fly (possibly blocking), because in a schemaless database the application may not know, at the beginning of the locked section, what all of the triples involved in an entity are, and there may be no way to discover that short of read-locking some triples, developing more queries based on those triples, and iterating.

> So let's assume code execution like:
>
> {noformat}
> {
>   try {
>     Lock( <foo ANY ANY> )
>     // some processing here
>     try {
>       Lock( <bar ANY ANY> )
>       // some processing here
>     } catch (LockFailedException e) {
>       // handle "bar" lock failure
>     } finally {
>       Unlock( <bar ANY ANY> )
>     }
>   } catch (LockFailedException e) {
>     // handle "foo" lock failure
>   } finally {
>     Unlock( <foo ANY ANY> )
>   }
>   // for grins ;)
>   Unlock(); // release all held locks.
> }
> {noformat}

It would be really nice if the locking API produced some kind of object that can be carried away and shared. In other conversations we’ve begun to touch on the possibility of transactions shared between threads, and it would be nice to plan for that as early as possible.

> If we consider the lock request to be a set of patterns, then the lock
> itself holds a set of patterns. For example Lock( <foo ANY ANY> <bar ANY ANY> )
> would create a lock holding the fooAny and barAny triples. Subsequently
> calling Lock( <baz ANY ANY> ) would result in the set:
> <foo ANY ANY> <bar ANY ANY> <baz ANY ANY>. Calling Unlock( <bar ANY ANY> )
> would result in the set: <foo ANY ANY> <baz ANY ANY>.

This gets a little trickier when the patterns overlap, doesn’t it? Or is the contract here that Unlock can only be used with a pattern that exactly matches a pattern that has been used with Lock?

> I think that the starting place may be to extend the Lock interface to
> include:
>
>   enterCriticalSection( triple ... ); // establish a lock for the triples
>   Graph getLockPattern();             // get the lock pattern
>   void lock( triple ... );            // add to lock pattern
>   void unlock( triple ... );          // remove from lock pattern
>   void unlock();                      // remove all locks
>
> It might make more sense to put the last 4 into a separate interface and have
> a method to return an instance of that interface. Something like:
>
>   LockHoldings getLocks();
>
> It seems to me the complex part is going to be determining if any thread
> already has a lock on the object from the patterns. I can dig into that
> to see if I can get an early prototype of lock tracking.

This is something like a generalization of where I ended up for my draft of “per-graph” locking. I have a LockSet that is pretty simple because I only have to deal with a single pattern <g ANY ANY ANY>. In my case, I was able to determine who is locking what fairly easily with set intersection. It might be worth trying to think about how to build up simpler cases (e.g. patterns that partition the dataset between them) into the more general case of arbitrary patterns.
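For what it’s worth, here is a stripped-down illustration of that per-graph case (hypothetical names, not my actual draft code): when every region has the shape <g ANY ANY ANY>, a lock request reduces to a set of graph nodes and conflict detection is plain set intersection. Per-thread ownership and reader counting are left out so the intersection test stays visible:

{noformat}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

import org.apache.jena.graph.Node;

// Sketch only: detects conflicts between per-graph lock requests by set intersection.
// A real implementation would also track which thread holds what, and count readers.
class GraphLockSet {
    private final Set<Node> readLocked  = new HashSet<>();
    private final Set<Node> writeLocked = new HashSet<>();

    // A write request conflicts with any existing read or write over the same graphs.
    synchronized boolean tryWriteLock(Set<Node> graphs) {
        if (!Collections.disjoint(graphs, writeLocked)) return false;
        if (!Collections.disjoint(graphs, readLocked))  return false;
        writeLocked.addAll(graphs);
        return true;
    }

    // A read request conflicts only with existing writes over the same graphs.
    synchronized boolean tryReadLock(Set<Node> graphs) {
        if (!Collections.disjoint(graphs, writeLocked)) return false;
        readLocked.addAll(graphs);
        return true;
    }

    synchronized void unlock(Set<Node> graphs) {
        readLocked.removeAll(graphs);
        writeLocked.removeAll(graphs);
    }
}
{noformat}

The hard part, as Claude says, is generalizing that test from “same graph node” to arbitrary patterns that can overlap or subsume one another.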
