[ https://issues.apache.org/jira/browse/JENA-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279370#comment-14279370 ]
Stephen Allen commented on JENA-848:
------------------------------------
So using the NRT IndexReader as I originally did would actually have resulted
in read-uncommitted isolation. Instead, the IndexReader should indeed be opened
from the Directory, which gives serializable isolation (changes won't appear
until IndexWriter.commit() is called).
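To make the isolation difference concrete, here is a toy Java model of the two ways of opening a reader. `ToyIndex` and `IsolationDemo` are hypothetical classes, not Lucene's API: in Lucene the committed-only view corresponds to `DirectoryReader.open(Directory)`, while an NRT reader is opened from the `IndexWriter` itself. Everything else below is an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical toy model (not Lucene's API): an NRT-style reader sees the
 * writer's buffered, uncommitted documents, while a directory-style reader
 * sees only what the last commit() published.
 */
final class ToyIndex {
    private final List<String> committed = new ArrayList<>(); // what is "on disk"
    private final List<String> pending = new ArrayList<>();   // buffered in the writer

    synchronized void addDocument(String doc) { pending.add(doc); }

    synchronized void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    synchronized void rollback() { pending.clear(); }

    /** Like an NRT reader opened from the writer: includes uncommitted changes. */
    synchronized List<String> openNrtReader() {
        List<String> view = new ArrayList<>(committed);
        view.addAll(pending);
        return Collections.unmodifiableList(view);
    }

    /** Like a reader opened from the Directory: only committed changes are visible. */
    synchronized List<String> openDirectoryReader() {
        return Collections.unmodifiableList(new ArrayList<>(committed));
    }
}

class IsolationDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("doc-1");
        System.out.println(index.openNrtReader());       // sees the uncommitted write
        System.out.println(index.openDirectoryReader()); // does not see it yet
        index.commit();
        System.out.println(index.openDirectoryReader()); // visible after commit
    }
}
```

Running `IsolationDemo` prints `[doc-1]`, then `[]`, then `[doc-1]`: the NRT-style reader exposes the write before `commit()`, which is the read-uncommitted behavior described above, while the directory-style reader only sees it afterwards.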
While investigating this, I realized that the entire transaction subsystem for
DatasetGraphText was incorrect, and I've gone through and fixed it. However, the
only way to fix it without drastically changing the architecture was to
introduce a ThreadLocal to track per-thread transaction state, just as
DatasetGraphWithLock and TDB's DatasetGraphTransaction do. On the plus side,
the Lucene index isolation is now serializable, and it uses Lucene's two-phase
commit support to make commits as atomic as possible.
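A minimal sketch of these two ideas, assuming a simplified design: `TxnCoordinator`, `TwoPhase`, `TxnMode`, and `RecordingParticipant` are hypothetical names, not the actual DatasetGraphText classes. Per-thread transaction state lives in a `ThreadLocal`, and a write commit first calls `prepareCommit()` on every participant (for example the graph store and the Lucene index) before calling `commit()` on any of them. Lucene's `IndexWriter` does expose `prepareCommit()`, `commit()`, and `rollback()`, but the wiring shown here is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a participant's two-phase commit methods. */
interface TwoPhase {
    void prepareCommit(); // phase 1: get ready to commit (durable, not yet visible)
    void commit();        // phase 2: publish the prepared changes
    void rollback();      // discard pending or prepared changes
}

enum TxnMode { READ, WRITE }

/** Hypothetical sketch, not the actual DatasetGraphText code. */
final class TxnCoordinator {
    // Per-thread state: each thread sees only its own transaction (null = none).
    private final ThreadLocal<TxnMode> current = new ThreadLocal<>();
    private final List<TwoPhase> participants;

    TxnCoordinator(List<TwoPhase> participants) {
        this.participants = new ArrayList<>(participants);
    }

    void begin(TxnMode mode) {
        if (current.get() != null) throw new IllegalStateException("already in a transaction");
        current.set(mode);
    }

    boolean isInTransaction() {
        return current.get() != null;
    }

    void commit() {
        TxnMode mode = current.get();
        if (mode == null) throw new IllegalStateException("not in a transaction");
        try {
            if (mode == TxnMode.WRITE) {
                try {
                    // Phase 1: every participant must prepare successfully...
                    for (TwoPhase p : participants) p.prepareCommit();
                } catch (RuntimeException e) {
                    // ...otherwise all of them roll back and nothing is published.
                    for (TwoPhase p : participants) p.rollback();
                    throw e;
                }
                // Phase 2: reached only after all participants have prepared.
                for (TwoPhase p : participants) p.commit();
            }
        } finally {
            current.remove(); // ends the transaction for this thread only
        }
    }

    void abort() {
        if (current.get() == TxnMode.WRITE) {
            for (TwoPhase p : participants) p.rollback();
        }
        current.remove();
    }
}

/** Test double that records calls and can be told to fail in prepareCommit(). */
final class RecordingParticipant implements TwoPhase {
    private final String name;
    private final List<String> log;
    private final boolean failPrepare;

    RecordingParticipant(String name, List<String> log, boolean failPrepare) {
        this.name = name;
        this.log = log;
        this.failPrepare = failPrepare;
    }

    public void prepareCommit() {
        log.add(name + ":prepare");
        if (failPrepare) throw new IllegalStateException(name + " failed to prepare");
    }

    public void commit() { log.add(name + ":commit"); }

    public void rollback() { log.add(name + ":rollback"); }
}

class TxnDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        TxnCoordinator txn = new TxnCoordinator(Arrays.<TwoPhase>asList(
                new RecordingParticipant("graph", log, false),
                new RecordingParticipant("text", log, true)));
        txn.begin(TxnMode.WRITE);
        try {
            txn.commit();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(log);
    }
}
```

`TxnDemo` prints `text failed to prepare` followed by `[graph:prepare, text:prepare, graph:rollback, text:rollback]`: because the text index failed to prepare, neither participant commits. A failure during the second phase could still leave the participants out of step, which is why two-phase commit makes the operation "as atomic as possible" rather than fully atomic.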
The same changes will also have to be applied to jena-spatial. In fact, there
is a lot of similar code there; we should probably extract it into a shared
module.
> jena-text Lucene concurrency issues
> -----------------------------------
>
> Key: JENA-848
> URL: https://issues.apache.org/jira/browse/JENA-848
> Project: Apache Jena
> Issue Type: Bug
> Components: Text
> Reporter: Stephen Allen
> Assignee: Stephen Allen
>
> When using jena-text with an in-process Lucene index, there are concurrency
> issues when multiple requests access the Dataset using transactions.
> It appears the problem is that a new Lucene IndexWriter is created at the start
> of every transaction with no concurrency control. Instead, the solution should
> be to create a single IndexWriter when the DatasetGraphText is created and
> use it for all requests. This works because the Lucene IndexWriter is
> thread-safe and designed for concurrent access.
> This should also improve performance by not continually opening and closing
> the IndexWriter. We can also use Near Real-Time (NRT) IndexReaders, which
> don't have to wait until changes are committed to disk.
> If concurrent access is not controlled, you can end up with IndexWriter
> objects being closed while they are still in use by other threads:
> {code}
> org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
> at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:645)
> at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2974)
> at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2954)
> at org.apache.jena.query.text.TextIndexLucene.finishIndexing(TextIndexLucene.java:122)
> at org.apache.jena.query.text.TextDocProducerTriples.finish(TextDocProducerTriples.java:46)
> at org.apache.jena.query.text.DatasetGraphText.commit(DatasetGraphText.java:122)
> at org.apache.jena.query.text.TestLuceneWithMultipleThreads$2.run(TestLuceneWithMultipleThreads.java:156)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:744)
> {code}
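The failure in the stack trace and the proposed fix can be reproduced in miniature. The sketch below uses a hypothetical thread-safe `ToyWriter` in place of Lucene's IndexWriter (its `IllegalStateException` plays the role of `AlreadyClosedException`) and replays one problematic interleaving of two transactions on a single thread so the failure is deterministic. The class names and the exact sequence are assumptions, not the original jena-text code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical thread-safe writer standing in for Lucene's IndexWriter. */
final class ToyWriter {
    private final List<String> committed = new ArrayList<>();
    private final List<String> buffered = new ArrayList<>();
    private boolean closed;

    synchronized void addDocument(String doc) { ensureOpen(); buffered.add(doc); }
    synchronized void commit() { ensureOpen(); committed.addAll(buffered); buffered.clear(); }
    synchronized void close() { closed = true; }
    synchronized int committedCount() { return committed.size(); }

    private void ensureOpen() {
        // Plays the role of Lucene's AlreadyClosedException in this toy model.
        if (closed) throw new IllegalStateException("this writer is closed");
    }
}

/** Anti-pattern: each transaction opens a new writer and closes it when done. */
final class WriterPerTransaction {
    private ToyWriter writer; // shared field, replaced with no concurrency control

    void begin() { writer = new ToyWriter(); }
    void index(String doc) { writer.addDocument(doc); }
    void commit() { writer.commit(); writer.close(); }
}

/** Fix: one writer created up front, shared by all transactions, closed once. */
final class SharedWriter {
    private final ToyWriter writer = new ToyWriter();

    void index(String doc) { writer.addDocument(doc); }
    void commit() { writer.commit(); } // never closes the writer per transaction
    void close() { writer.close(); }   // only when the dataset itself is closed
    int committedCount() { return writer.committedCount(); }
}

class WriterDemo {
    public static void main(String[] args) throws Exception {
        // Interleaving of transactions A and B, replayed on one thread for determinism.
        WriterPerTransaction bad = new WriterPerTransaction();
        bad.begin();      // A opens writer w1
        bad.begin();      // B opens w2, replacing A's writer
        bad.index("a");   // A now writes into w2
        bad.commit();     // A commits and closes w2
        try {
            bad.commit(); // B commits w2, which A already closed
        } catch (IllegalStateException e) {
            System.out.println("per-transaction writer: " + e.getMessage());
        }

        SharedWriter good = new SharedWriter();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<?>> results = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            final String doc = "doc-" + i;
            results.add(pool.submit(() -> { good.index(doc); good.commit(); }));
        }
        for (Future<?> f : results) f.get(); // rethrows any worker exception
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("shared writer committed " + good.committedCount());
        good.close();
    }
}
```

`WriterDemo` prints `per-transaction writer: this writer is closed` followed by `shared writer committed 100`. Note that, as with a real IndexWriter, a commit on the shared writer publishes every buffered change, not only the calling thread's.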
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)