[
https://issues.apache.org/jira/browse/LUCENE-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877850#comment-13877850
]
Luis Filipe Nassif commented on LUCENE-5407:
--------------------------------------------
Thank you for taking a look. Yes, Lucene indexes concurrently really well if I
feed the docs to parallel threads that index with the same IndexWriter. Even
if one indexing thread becomes blocked while reading a doc (e.g. slow
network, infinite loop while extracting text from a corrupted doc), the other
indexing threads continue doing the work. So I thought this kind of thread
chaining should also work fine... Why is Thread-1 blocked here, when in a
parallel setup the indexing threads do not block if one of them blocks while
reading a doc?
The use case for this design is that we have terabytes of data to process from
slow 7200 rpm disks, and we don't want to pre-process, process and post-process
the docs in separate passes. It is better to read each doc only once and do all
the processing on it, taking advantage of the disk and system caches, for example.
> Deadlock? while indexing reader fields in cascaded threads
> ----------------------------------------------------------
>
> Key: LUCENE-5407
> URL: https://issues.apache.org/jira/browse/LUCENE-5407
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 4.6
> Environment: Windows 7 64 bits, JRE 1.7.0_25 64 bits
> Reporter: Luis Filipe Nassif
> Attachments: Test.java, thread_dump.txt
>
>
> I appear to have found a deadlock in IndexWriter when using Reader fields in
> a cascaded thread design to add documents (I am working on an application
> integrating Tika, which can add embedded documents to the index as
> independent documents as they are found). The attached code illustrates the
> problem. Sometimes it stops processing and at least one of the threads
> remains in the WAITING state. In my environment, it takes no more than 5 runs
> to trigger the problem.