Could you also post your mods to DocumentsWriter? Eg "doGetThreadState" and "finishDocWithThreadState"?

Or, better yet, post a full patch with a unit test showing the hang?

I think it should be OK to have one thread get a ThreadState and another thread finish indexing the doc with that thread state -- that alone shouldn't cause this hang.

Are you sure you're calling DocumentsWriter.finishDocument, which frees the ThreadState?
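
For anyone following along, here is a toy model of the bookkeeping in question, as a minimal sketch only. It is not Lucene's DocumentsWriter: ToyDocWriter, acquire, release, pauseAll and resumeAll are hypothetical names, and pauseAll takes a timeout purely so the demo terminates. Under those assumptions it shows that releasing a state from a different thread is harmless in itself, while a state that is never released keeps the pause waiting.

```java
import java.util.concurrent.TimeUnit;

/** Toy model only: not Lucene code. Every name here is hypothetical. */
class ToyDocWriter {
    private int busyStates = 0;      // states handed out and not yet released
    private boolean paused = false;  // true while a flush-like pause is active

    /** Hands out a state; blocks while the writer is paused. */
    synchronized Object acquire() throws InterruptedException {
        while (paused) {
            wait();
        }
        busyStates++;
        return new Object();
    }

    /** Frees a state; any thread may call this, not only the acquirer. */
    synchronized void release(Object state) {
        busyStates--;
        notifyAll();
    }

    /** Blocks new acquires, then waits for outstanding states; false on timeout. */
    synchronized boolean pauseAll(long timeoutMillis) throws InterruptedException {
        paused = true;
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (busyStates > 0) {
            long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remaining <= 0) {
                return false;
            }
            wait(remaining);
        }
        return true;
    }

    synchronized void resumeAll() {
        paused = false;
        notifyAll();
    }

    /** Scenario: acquire here, release on a second thread, then pause. */
    static boolean handOffThenPause() {
        try {
            ToyDocWriter w = new ToyDocWriter();
            Object state = w.acquire();
            Thread finisher = new Thread(() -> w.release(state));
            finisher.start();
            finisher.join();
            return w.pauseAll(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Scenario: acquire a state and never release it, then pause. */
    static boolean leakThenPause() {
        try {
            ToyDocWriter w = new ToyDocWriter();
            w.acquire();
            return w.pauseAll(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("hand-off, pause completed: " + handOffThenPause());
        System.out.println("leaked state, pause completed: " + leakThenPause());
    }
}
```

In this toy, the first scenario completes and the second times out. Without the timeout the second would wait forever, which is the shape of the hang in the pauseAllThreads trace.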

Mike

Jagadesh Nomula wrote:


Thanks again for the comments. I am trying to split IndexWriter.addDocument into two methods, getThreadState() and finishDoc(); the idea is to make them multithreaded for ParallelWriter.

I am running into a situation in which DocumentsWriter.getThreadState and DocumentsWriter.pauseAllThreads wait indefinitely. One possible explanation I came up with is: "DocumentsWriter requires that getThreadState and finishDoc be called from the same thread. By executing them from different threads I am corrupting internal state in DocumentsWriter, causing getThreadState and pauseAllThreads to wait indefinitely". Is that the case? The changes are as follows:


  /**
   * Adds a document to this index, using the provided analyzer instead of the
   * value of {@link #getAnalyzer()}.  If the document contains more than
   * {@link #setMaxFieldLength(int)} terms for a given field, the remainder are
   * discarded.
   *
   * <p>See {@link #addDocument(Document)} for details on
   * index and IndexWriter state after an Exception, and
   * flushing/merging temporary free space requirements.</p>
   *
   * @throws CorruptIndexException if the index is corrupt
   * @throws IOException if there is a low-level IO error
   */
  public void addDocument(Document doc, Analyzer analyzer) throws CorruptIndexException, IOException {
    finishDoc(getThreadState(doc, analyzer), analyzer);
  }


  /**
   * Expert: Get the thread state used to process a doc. The thread state
   * initializes the state needed to process a given document for a thread.
   * The document is then processed with this thread state by DocumentsWriter.
   */
  DocumentsWriter.ThreadState getThreadState(Document doc, Analyzer analyzer) throws IOException {
    DocumentsWriter.ThreadState state = null;
    ensureOpen();
    try {
      docWriter.setTermVectorTokenSelector(termVectorTokenSelector);
      docWriter.setPositionsTokenSelector(positionsTokenSelector);
      state = docWriter.doGetThreadState(doc, analyzer, null);
    } finally {
      // Note: returning from finally discards any exception thrown in the
      // try block, so the caller may receive a null state.
      return state;
    }
  }

  /**
   * Expert: Invert a document and then flush the document. Inversion is
   * thread safe and can happen for multiple threads. Flushing is
   * synchronized and done serially.
   */
  void finishDoc(DocumentsWriter.ThreadState state, Analyzer analyzer) throws IOException {
    boolean doFlush = false;
    boolean success = false;
    try{
      doFlush = docWriter.finishDocWithThreadState(state, analyzer);
      success = true;
    } finally{
       if (!success ) {

        if (infoStream != null) {
          message("hit exception adding document");
        }

        synchronized (this) {
          // If docWriter has some aborted files that were
          // never incref'd, then we clean them up here
          if (docWriter != null) {
            final List files = docWriter.abortedFiles();
            if (files != null) {
              deleter.deleteNewFiles(files);
            }
          }
        }
      }
    }

    try {
      if (doFlush)
        flush(true, false);
    } catch (OutOfMemoryError oom) {
      hitOOM = true;
      throw oom;
    }
  }




-----Original Message-----
From: Michael McCandless [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 30, 2008 2:45 AM
To: java-dev@lucene.apache.org
Subject: Re: Deadlock when multi-threading DocumentsWriter


The lock acquire order for all call stacks that lock on these two
classes should be IndexWriter then DocumentsWriter, as is the case
with IndexWriter.doFlush calling DocumentsWriter.pauseAllThreads.  So
you shouldn't hit a thread deadlock.

Also, doFlush is called when it's time to write a new segment (not per
document).  So when it's called, many documents are being flushed to
the Directory.
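
To make the lock-order point concrete, here is a minimal sketch with two plain monitors. It is an illustration under stated assumptions, not Lucene code: NestedMonitorDemo, flushLike and finishLike are hypothetical names, "outer" stands in for the IndexWriter monitor and "inner" for the DocumentsWriter monitor. It relies only on the documented Java behavior that Object.wait() releases the monitor it is called on and no other.

```java
/** Toy illustration only: not Lucene code. Every name here is hypothetical. */
class NestedMonitorDemo {
    private final Object outer = new Object(); // stands in for the IndexWriter monitor
    private final Object inner = new Object(); // stands in for the DocumentsWriter monitor
    private boolean released = false;          // guarded by inner

    /** Flush-like path: takes outer, then inner, then waits on inner. */
    void flushLike() throws InterruptedException {
        synchronized (outer) {
            synchronized (inner) {
                while (!released) {
                    inner.wait(); // gives up inner only; outer stays held
                }
            }
        }
    }

    /** Finish-like path: needs inner only, so it can run while flushLike waits. */
    void finishLike() {
        synchronized (inner) {
            released = true;
            inner.notifyAll();
        }
    }

    /** Runs both paths on separate threads; true if the flush-like thread ends. */
    static boolean run() {
        NestedMonitorDemo demo = new NestedMonitorDemo();
        Thread flusher = new Thread(() -> {
            try {
                demo.flushLike();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        flusher.start();
        demo.finishLike();
        try {
            flusher.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !flusher.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("flush-like thread finished: " + run());
    }
}
```

The waiter gets unblocked here because the finish-like path never needs the outer monitor. If that path also had to take the outer monitor first, it could block behind the waiting thread, and the wait would never end.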

Mike

Jagadesh Nomula wrote:

> Looks like I can never run into that situation. Another doc-id
> would not even be assigned before flushing out the current doc.
>
> From: Jagadesh Nomula [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 29, 2008 2:43 PM
> To: java-dev@lucene.apache.org
> Subject: RE: Deadlock when multi-threading DocumentsWriter
>
> Hi Mike,
>
> Thanks for the comments.
>
> Looking at the stack trace, the following frames might run into
> a nested lock.
>
> org.apache.lucene.index.DocumentsWriter.pauseAllThreads(DocumentsWriter.java:507)
> org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2670)
>
> The doFlush method is synchronized on IndexWriter, and the
> DocumentsWriter method is synchronized on DocumentsWriter. This
> thread is giving away the lock on DocumentsWriter but still holds a
> lock on IndexWriter.
>
> If documents reach the flush state out of order (e.g. doc3 getting
> flushed before doc2), then don't we run into a deadlock? This is not
> happening, so I must be missing something. Any comments?
>
> Thanks,
>
> Jagdish
>
> -----Original Message-----
> From: Michael McCandless [mailto:[EMAIL PROTECTED]
> Sent: Monday, July 28, 2008 4:33 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Deadlock when multi-threading DocumentsWriter
>
>
> Can you post a patch with your full changes to DocumentsWriter and
> IndexWriter?
>
> That first thread is trying to flush, but is waiting for all threads
> to leave DocumentsWriter (finish adding docs).  The 2nd thread looks
> like it's waiting for the flush to finish before proceeding.  Are
> there any other threads?
>
> Are you calling DocumentsWriter.finishDocument? That method frees the
> thread state, which is what that first thread is waiting on...
>
> Mike
>
> Jagadesh Nomula wrote:
>
> > Does anyone have any insight into deadlock issues when
> > running DocumentsWriter.java from multiple threads? I am trying to
> > port the ParallelWriter.java code to the new codebase of
> > DocumentsWriter.java and IndexWriter. I am doing this by splitting
> > the DocumentsWriter.addDocument call into two unsynchronized
> > methods, doGetThreadState and finishDocWithThreadState.
> > doGetThreadState just calls the synchronized getThreadState method
> > and returns a thread state to be used by finishDocWithThreadState,
> > which inverts the document and flushes it.  The code is
> > semantically equivalent to the addDocument method in DocumentsWriter,
> > the only variation being that the call to doGetThreadState is
> > executed from a synchronized block in ParallelWriter to keep the
> > doc-ids consistent in ParallelWriter.
> >
> > You would expect this code to work without any issues, but
> > it runs into a deadlock. An excerpt of the suspicious calls:
> >
> > == Thread ConnectionThreadGroup-26491.pool-8-thread-1 ===>
> > java.lang.Object.wait(Native Method)
> >         java.lang.Object.wait(Object.java:485)
> >         org.apache.lucene.index.DocumentsWriter.pauseAllThreads(DocumentsWriter.java:507)
> >         org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2670)
> >         org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2660)
> >         org.apache.lucene.index.IndexWriter.finishDoc(IndexWriter.java:1601)
> >         org.apache.lucene.index.ParallelWriter$ProcessWorker.run(ParallelWriter.java:464)
> >         java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
> >         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
> >         java.lang.Thread.run(Thread.java:619)
> >
> >
> > =======================================
> > == Thread ConnectionThreadGroup-26491.pool-3-thread-6 ===>
> > java.lang.Object.wait(Native Method)
> >         java.lang.Object.wait(Object.java:485)
> >         org.apache.lucene.index.DocumentsWriter.getThreadState(DocumentsWriter.java:2420)
> >         org.apache.lucene.index.DocumentsWriter.doGetThreadState(DocumentsWriter.java:2532)
> >         org.apache.lucene.index.IndexWriter.getThreadState(IndexWriter.java:1564)
> >         org.apache.lucene.index.ParallelWriter$ThreadStateWorker.call(ParallelWriter.java:425)
> >         org.apache.lucene.index.ParallelWriter$ThreadStateWorker.call(ParallelWriter.java:405)
> >         java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >         java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >         java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
> >         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
> >         java.lang.Thread.run(Thread.java:619)
> >
> > Any information I might be overlooking, or any comments, would be
> > of great help to me in resolving this. Thanks in advance for your help.
> >
> > Jagdish
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>

