[ 
https://issues.apache.org/jira/browse/LUCENE-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877850#comment-13877850
 ] 

Luis Filipe Nassif commented on LUCENE-5407:
--------------------------------------------

Thank you for taking a look. Yes, Lucene indexes concurrently very well when
I feed the docs to parallel threads that index with the same IndexWriter.
Even if one indexing thread becomes blocked while reading a doc (e.g. slow
network, infinite loop while extracting text from a corrupted doc), the other
indexing threads continue working. So I thought this kind of thread chaining
should also work fine... Why is Thread-1 blocked here, when in a parallel
setup the indexing threads do not block if one of them blocks while reading
a doc?
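
As a rough illustration of the parallel setup described above (this is not
the attached Test.java), here is a minimal sketch using only the JDK. The
SharedSink class is a hypothetical stand-in for a shared, thread-safe
IndexWriter, and the blocked read is simulated with a latch. It only shows
that one blocked worker does not stall the others in a plain parallel feed.
It says nothing about the cascaded case reported in this issue.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelFeedSketch {

    /** Hypothetical stand-in for a shared, thread-safe IndexWriter. */
    static final class SharedSink {
        private final List<String> added = new CopyOnWriteArrayList<>();

        void addDocument(String doc) {
            added.add(doc);
        }

        int count() {
            return added.size();
        }
    }

    /**
     * Feeds docCount docs to a pool of worker threads sharing one sink.
     * Doc 0 simulates a read that blocks (slow network, stuck parser).
     * Returns the number of docs added while doc 0 was still blocked,
     * or -1 if the other docs did not finish within the timeout.
     */
    static int run(int threads, int docCount) throws InterruptedException {
        SharedSink sink = new SharedSink();
        CountDownLatch stuckRead = new CountDownLatch(1);
        CountDownLatch others = new CountDownLatch(docCount - 1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int i = 0; i < docCount; i++) {
            final int id = i;
            pool.execute(() -> {
                try {
                    if (id == 0) {
                        stuckRead.await(); // simulated blocked read
                    }
                    sink.addDocument("doc-" + id);
                    if (id != 0) {
                        others.countDown();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Wait for every doc except the blocked one, then take the count.
        boolean finished = others.await(10, TimeUnit.SECONDS);
        int addedWhileBlocked = finished ? sink.count() : -1;

        // Release the blocked worker so the pool can shut down cleanly.
        stuckRead.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return addedWhileBlocked;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 20));
    }
}
```

With 4 threads and 20 docs, run returns 19: every doc except the blocked one
is added while one worker is still waiting on its read.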

The use case for this design is that we have terabytes of data to process
from slow 7200 rpm disks, and we don't want to pre-process, process and
post-process the docs in separate passes. It is better to read each doc only
once and do all the processing on it, taking advantage of the disk and system
caches, for example.

> Deadlock? while indexing reader fields in cascaded threads
> ----------------------------------------------------------
>
>                 Key: LUCENE-5407
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5407
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.6
>         Environment: Windows 7 64 bits, JRE 1.7.0_25 64 bits
>            Reporter: Luis Filipe Nassif
>         Attachments: Test.java, thread_dump.txt
>
>
> Apparently I found a deadlock problem with IndexWriter when using Reader
> fields in a cascaded thread design to add documents (I am working on an
> application integrating Tika, which can add embedded documents to the index
> as independent documents as they are found). The attached code illustrates
> the problem. Sometimes it stops processing and at least one of the threads
> remains in WAITING state. In my environment, running it at most 5 times is
> enough to trigger the problem.



