[ https://issues.apache.org/jira/browse/JENA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009433#comment-13009433 ]
Stephen Allen commented on JENA-41:
-----------------------------------
I think your idea about the DatasetGraph being the interface for transactions
makes sense. Transactional DatasetGraphs could also provide fallback behavior
for legacy code by running each call in its own autocommit transaction when
the user calls methods on a dataset outside a transactionBegin() call.
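A minimal sketch of that autocommit fallback, in Java since the attached prototypes are Java. Everything here (AutocommitStore, its transactionBegin/commit/abort methods, and strings standing in for triples) is hypothetical and is not the TDB or DatasetGraph API: a call made outside an explicit transaction is wrapped in its own begin/commit, while calls made inside one simply join it.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical store: explicit transactions plus an autocommit fallback for
// legacy callers. Strings stand in for triples; this is not the Jena API.
class AutocommitStore {
    private Set<String> committed = new HashSet<>();
    private Set<String> pending = null;   // private working copy while a transaction is open
    int transactionsStarted = 0;          // counts explicit and implicit transactions

    void transactionBegin() {
        if (pending != null) throw new IllegalStateException("transaction already active");
        pending = new HashSet<>(committed);
        transactionsStarted++;
    }

    void commit() {
        if (pending == null) throw new IllegalStateException("no active transaction");
        committed = pending;              // publish the working copy
        pending = null;
    }

    void abort() {
        pending = null;                   // discard uncommitted changes
    }

    boolean inTransaction() {
        return pending != null;
    }

    // Legacy-style write: runs in its own autocommit transaction if none is active.
    void add(String triple) {
        boolean implicit = !inTransaction();
        if (implicit) transactionBegin();
        try {
            pending.add(triple);
            if (implicit) commit();
        } catch (RuntimeException e) {
            if (implicit) abort();
            throw e;
        }
    }

    // Inside a transaction, reads see that transaction's own changes.
    boolean contains(String triple) {
        return inTransaction() ? pending.contains(triple) : committed.contains(triple);
    }
}
```

Legacy code that just calls add() gets one short transaction per call, while new code can group several calls between transactionBegin() and commit().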
With regard to isolation levels, I believe some of the lower levels can make
sense for particular applications or queries. For example, say you want to
know the sizes of a few graphs:
    BEGIN READ_ONLY;
    SELECT (COUNT(*) AS ?n) WHERE { GRAPH <http://example/g1> { ?s ?p ?o } } ;
    SELECT (COUNT(*) AS ?n) WHERE { GRAPH <http://example/g2> { ?s ?p ?o } } ;
    COMMIT;
Assuming a traditional pessimistic locking scheme, running this transaction at
SERIALIZABLE would keep the read locks acquired by the first query held through
the second query until commit, reducing concurrency. (Using two separate
transactions instead might not be a good idea, as there is usually some
overhead associated with creating and committing transactions.)
If you were OK with the possibility that the two query results are not truly
serializable with respect to each other, you could improve concurrency by
using the READ_COMMITTED isolation level instead, which would give
serializable results for each individual query (but not for the whole
transaction). And if you only needed a rough estimate of size,
READ_UNCOMMITTED might avoid locking altogether.
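To make the trade-off concrete, here is a sketch of how a lock-based engine might vary the lifetime of read locks by isolation level, assuming one ReentrantReadWriteLock per named graph as a coarse stand-in for predicate locks. ReadTxn and Isolation are hypothetical names, and REPEATABLE_READ is simply treated like SERIALIZABLE here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// The four ANSI isolation levels (hypothetical enum for this sketch).
enum Isolation { READ_UNCOMMITTED, READ_COMMITTED, REPEATABLE_READ, SERIALIZABLE }

// Hypothetical read-only transaction over per-graph read/write locks.
class ReadTxn {
    private final Map<String, ReentrantReadWriteLock> graphLocks;
    private final Isolation level;
    private final Deque<Lock> heldUntilCommit = new ArrayDeque<>();

    ReadTxn(Map<String, ReentrantReadWriteLock> graphLocks, Isolation level) {
        this.graphLocks = graphLocks;
        this.level = level;
    }

    // Runs one query against a graph, locking according to the isolation level.
    <T> T query(String graph, Supplier<T> body) {
        if (level == Isolation.READ_UNCOMMITTED) {
            return body.get();                 // no read lock at all
        }
        Lock read = graphLocks.get(graph).readLock();
        read.lock();
        if (level == Isolation.READ_COMMITTED) {
            try {
                return body.get();
            } finally {
                read.unlock();                 // released as soon as this query ends
            }
        }
        heldUntilCommit.push(read);            // REPEATABLE_READ / SERIALIZABLE: kept until commit()
        return body.get();
    }

    // Releases every lock still held by this transaction.
    void commit() {
        while (!heldUntilCommit.isEmpty()) {
            heldUntilCommit.pop().unlock();
        }
    }
}
```

At SERIALIZABLE the lock on g1 is still held while the g2 count runs, so a writer to g1 waits for the whole transaction; at READ_COMMITTED it is released as soon as the first count returns.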
An additional motivating factor is that MVCC implementations may implement
snapshot isolation, which probably maps better to READ_COMMITTED than to
SERIALIZABLE (especially if such an implementation could use predicate locking
for truly serializable behavior but fall back to cheaper snapshot isolation
when that is all that is needed). The Postgres documentation describes this
well [1].
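For comparison, a sketch of snapshot isolation in the MVCC style, assuming a copy-on-write set of strings as the whole store and a single writer at a time. SnapshotStore is hypothetical and not how TDB works: each reader keeps the version that was current when it started, so readers take no locks and a long write never blocks them.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical copy-on-write store illustrating snapshot isolation.
class SnapshotStore {
    // The latest committed version; each version is immutable once published.
    private final AtomicReference<Set<String>> current =
            new AtomicReference<>(Collections.<String>emptySet());
    private final ReentrantLock writerLock = new ReentrantLock(); // single writer

    // A reader takes no lock: it just keeps the version that is current now.
    Set<String> beginRead() {
        return current.get();
    }

    WriteTxn beginWrite() {
        writerLock.lock();                 // waits only for other writers
        return new WriteTxn();
    }

    final class WriteTxn {
        // Private working copy, invisible to readers until commit.
        private final Set<String> working = new HashSet<>(current.get());

        void add(String triple) {
            working.add(triple);
        }

        void commit() {
            current.set(Collections.unmodifiableSet(new HashSet<>(working))); // atomic publish
            writerLock.unlock();
        }

        void abort() {
            writerLock.unlock();           // working copy is simply dropped
        }
    }
}
```

A reader that starts during a write keeps seeing the old version even after the writer commits; only readers that start afterwards see the new one.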
I would find it useful to have multiple isolation levels available (even if
internally I map them all to SERIALIZABLE at first). The four ANSI isolation
levels seem appropriate; remember that implementations are allowed to map
unavailable lower levels to higher ones as desired.
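Mapping unavailable levels upward could be as simple as picking the weakest supported level that is at least as strong as the one requested. IsolationMapper is a hypothetical helper; PostgreSQL behaves this way, for example, treating READ UNCOMMITTED as READ COMMITTED.

```java
import java.util.EnumSet;
import java.util.Set;

// The four ANSI isolation levels, declared weakest to strongest.
enum IsolationLevel { READ_UNCOMMITTED, READ_COMMITTED, REPEATABLE_READ, SERIALIZABLE }

// Hypothetical helper: upgrades a requested level to the weakest supported
// level that is at least as strong.
final class IsolationMapper {
    private final EnumSet<IsolationLevel> supported;

    IsolationMapper(Set<IsolationLevel> supported) {
        if (!supported.contains(IsolationLevel.SERIALIZABLE)) {
            throw new IllegalArgumentException("SERIALIZABLE must be supported");
        }
        this.supported = EnumSet.copyOf(supported);
    }

    IsolationLevel effective(IsolationLevel requested) {
        for (IsolationLevel level : supported) {   // EnumSet iterates in declaration order
            if (level.compareTo(requested) >= 0) {
                return level;
            }
        }
        throw new AssertionError("unreachable: SERIALIZABLE is always supported");
    }
}
```

An engine that starts out with only SERIALIZABLE can still accept all four levels, and later add cheaper ones without changing callers.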
[1] http://developer.postgresql.org/pgdocs/postgres/transaction-iso.html
> Different policy for concurrency access in TDB supporting a single writer and
> multiple readers
> ----------------------------------------------------------------------------------------------
>
> Key: JENA-41
> URL: https://issues.apache.org/jira/browse/JENA-41
> Project: Jena
> Issue Type: New Feature
> Components: Fuseki, TDB
> Reporter: Paolo Castagna
> Attachments: Transaction.java, TransactionHandle.java,
> TransactionHandler.java, TransactionManager.java,
> TransactionManagerBase.java, TransactionalDatasetGraph.java
>
>
> As a follow up to a discussion about "Concurrent updates in TDB" [1] on the
> jena-users mailing list, I am creating this as a new feature request.
> Currently TDB requires developers to use a Multiple Reader or Single Writer
> (MRSW) locking policy for concurrent access [2]. Not doing this could cause
> data corruption.
> The MRSW is indeed a MR xor SW (i.e. while a writer has a lock, no readers
> are allowed and, similarly, if a reader has a lock, no writes are possible).
> This works fine in most situations, but there might be problems in the
> presence of long writes or long reads.
> It has been suggested that a "journaled file access" could be used to solve
> the issue regarding a long write blocking reads.
> [1] http://markmail.org/message/jnqm6pn32df4wgte
> [2] http://openjena.org/wiki/TDB/JavaAPI#Concurrency
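The MR xor SW policy in the quoted description can be illustrated with the JDK's ReentrantReadWriteLock. This is only a sketch of the policy, not TDB's locking code; MrswDemo is a hypothetical name, and each check probes the lock from a second thread with a non-blocking tryLock().

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustration of "multiple readers XOR single writer" with the JDK lock.
class MrswDemo {
    // Attempts a non-blocking acquire of `lock` from a second thread.
    private static boolean tryFromOtherThread(Lock lock) {
        AtomicBoolean acquired = new AtomicBoolean(false);
        Thread other = new Thread(() -> {
            if (lock.tryLock()) {
                acquired.set(true);
                lock.unlock();
            }
        });
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return acquired.get();
    }

    // While a writer holds the lock, can a reader get in?
    static boolean readerAdmittedDuringWrite() {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        rw.writeLock().lock();
        try {
            return tryFromOtherThread(rw.readLock());
        } finally {
            rw.writeLock().unlock();
        }
    }

    // While a reader holds the lock, can a writer get in?
    static boolean writerAdmittedDuringRead() {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        rw.readLock().lock();
        try {
            return tryFromOtherThread(rw.writeLock());
        } finally {
            rw.readLock().unlock();
        }
    }

    // Can two readers hold the lock at the same time?
    static boolean readersShareLock() {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        rw.readLock().lock();
        try {
            return tryFromOtherThread(rw.readLock());
        } finally {
            rw.readLock().unlock();
        }
    }
}
```

The first two checks return false and the third returns true: readers share the lock, but a long write shuts out every reader and a long read shuts out the writer, which is the problem the issue describes.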