jpountz commented on issue #12916:
URL: https://github.com/apache/lucene/issues/12916#issuecomment-1866215329

   > Should I take your PR and check with beasting on Policeman's Ryzen to see 
if I can reproduce with the above command?
   
   That would be great.
   
   I have a Ryzen too, though a AMD Ryzen 9 3900X. I'm not sure what factors 
influence the reproducibility of this bug. The good news is that the new test 
on #12959 fails 100% of the time for me without the fix.
   
   > Is there an explanation why it works better when making the method 
synchronized as suggested by the OpenJ9 people?
   
   My understanding is that the way that the bug manifests looks like this:
    - 2 threads T1 and T2 doing indexing,
    - a `ConcurrentyApproximatePriorityQueue` with concurrency = 2, ie. 2 sub 
queues, initially empty
   
   And then the following sequence of events:
    1 T1 tries to pull a DWPT from the queue, it's empty so 
`DocumentsWriterPerThreadPool` creates a new DWPT. There is now 1 DWPT.
    2 T2 tries to pull a DWPT from the queue, it's empty (because the only one 
created so far is taken by T1) so `DocumentsWriterPerThreadPool` creates a new 
DWPT. There are now 2 DWPTs.
    3 T2 indexes a document
    4 T1 indexes a document
    5 T2 puts the DWPT back into the sub queue at index 1
    6 T1 puts the DWPT back into the sub queue at index 0
    7 T1 starts pulling a DWPT from the queue
    8 T2 pulls the DWPT from the sub queue at index 0
    9 T1 checks the sub queue at index 0, it's empty (since T2 just took it) so 
it keeps going
    10 T2 indexes a document
    11 T2 puts the DWPT back into the sub queue at index 0
    12 T2 pulls the DWPT from the sub queue at index 1
    13 T1 checks the sub queue at index 1, it's empty (since T2 just took it) 
so it keeps going
    14 T1 checked all sub queues and could not get a DWPT, so 
`DocumentsWriterPerThreadPool` creates a one. There are now 3 DWPTs even though 
there are only 2 indexing threads.
   
   Making the method synchronized helps by forcing T2 to wait until T1 is done 
pulling a DWPT from the queue before starting doing the same. The PR tries to 
go a bit more granular to limit contention.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to