[ 
https://issues.apache.org/jira/browse/SOLR-18252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18082166#comment-18082166
 ] 

Chris M. Hostetter commented on SOLR-18252:
-------------------------------------------

Both types of failures seem to stem from race conditions involving: {{private 
List<ConsumerBatch> consumerBatches; // ... = new ArrayList(...)}}
 * it appears a kafka consumer is appending to this list (presumably as it 
polls?)
 * the "main" test thread attempts to iterate over this list, and then makes 
assertions about the number of documents found

 # The first type of failure happens when the "main" thread "finishes" looping 
before the kafka consuming thread is done writing all of the records from kafka
 # The second type of failure is more common, and happens when the kafka 
consuming thread adds to the list in between iterator advancements of the main 
thread.
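
The second failure mode can be reproduced without Kafka or even a second thread. The following is a minimal sketch, not the test's actual code: the class name, method name, and "batch-N" strings are made up for illustration, and the in-loop {{add}} merely stands in for the consumer thread appending between two iterator advancements.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical illustration only; none of these names come from SolrAndKafkaIntegrationTest.
class ComodificationSketch {

  /** Returns true if appending to the list mid-iteration trips ArrayList's fail-fast iterator. */
  public static boolean appendDuringIteration() {
    List<String> batches = new ArrayList<>(List.of("batch-0", "batch-1", "batch-2"));
    try {
      for (String batch : batches) {
        if (batch.equals("batch-0")) {
          // Stands in for the consuming thread adding a batch while "main" is iterating.
          batches.add("batch-3");
        }
      }
      return false; // iteration finished without complaint
    } catch (ConcurrentModificationException e) {
      // Thrown by ArrayList$Itr.next() via checkForComodification(), as in the reported trace.
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("CME thrown: " + appendDuringIteration());
  }
}
```

With real threads the same check fires only when the timing lines up, which fits an intermittent failure rather than a constant one.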

In general, using an {{ArrayList}} like this (as an information exchange 
between threads) isn't particularly safe – but even if it were, the race 
condition behind the first type of failure is still a very real risk: the 
"main" thread has no way of knowing if/when the kafka consuming thread is 
"done" writing records.
----
Suggested redesign of this test:
 * replace:
 ** {{private List<ConsumerBatch> consumerBatches}}
 ** with: {{private BlockingQueue<ConsumerBatch> consumerBatches; // ... new 
LinkedBlockingQueue(...)}}
 * replace the simple iterator loop in the main test thread with a loop that 
calls {{consumerBatches.poll(/* some reasonable timeout */)}}.
 ** The (successful) loop end condition should be once the expected number of 
total documents is found in all batches
 ** otherwise keep polling, and fail the test if a poll times out (i.e. returns 
{{null}}) before we receive all the records we expect
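
A rough sketch of what that polling loop might look like is below. It is only an illustration under stated assumptions: {{List<String>}} stands in for {{ConsumerBatch}} (whose real shape isn't shown here), and the class name, {{awaitDocs}}, and the timeout values are hypothetical. The one library behavior it relies on is that {{BlockingQueue.poll(timeout, unit)}} returns {{null}} when the timeout elapses.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; a List<String> of doc ids stands in for the test's ConsumerBatch.
class PollUntilCountSketch {

  /**
   * Polls batches until at least expectedDocs documents have been seen.
   * Throws AssertionError if any single poll waits timeoutMs without receiving a batch.
   */
  public static int awaitDocs(BlockingQueue<List<String>> queue, int expectedDocs, long timeoutMs) {
    int seen = 0;
    while (seen < expectedDocs) {
      List<String> batch;
      try {
        batch = queue.poll(timeoutMs, TimeUnit.MILLISECONDS); // null means the wait timed out
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new AssertionError("interrupted after " + seen + " of " + expectedDocs + " docs", e);
      }
      if (batch == null) {
        throw new AssertionError("timed out after " + seen + " of " + expectedDocs + " docs");
      }
      seen += batch.size();
    }
    return seen; // callers can still assert an exact count on this value
  }

  public static void main(String[] args) throws InterruptedException {
    BlockingQueue<List<String>> queue = new LinkedBlockingQueue<>();
    // Stands in for the kafka consuming thread handing over batches as it polls.
    Thread producer = new Thread(() -> {
      for (int i = 0; i < 4; i++) {
        queue.add(List.of("doc-" + (2 * i), "doc-" + (2 * i + 1)));
      }
    });
    producer.start();
    System.out.println("docs seen: " + awaitDocs(queue, 8, 5_000));
    producer.join();
  }
}
```

The queue handles the cross-thread handoff safely, and the count-based end condition gives the "main" thread a definition of "done" that doesn't depend on when the consumer happens to finish.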

> high failure rate from SolrAndKafkaIntegrationTest.testPartitioning 
> --------------------------------------------------------------------
>
>                 Key: SOLR-18252
>                 URL: https://issues.apache.org/jira/browse/SOLR-18252
>             Project: Solr
>          Issue Type: Test
>          Components: module - crossDC
>            Reporter: Chris M. Hostetter
>            Assignee: Andrzej Bialecki
>            Priority: Major
>
> Since it was added a few weeks ago, 
> {{SolrAndKafkaIntegrationTest.testPartitioning}} has had a jenkins failure 
> rate of ~20%.
> Based on an ad-hoc review of some of the jenkins logs, the failures seem to 
> fall into 2 categories...
> {noformat}
>    >     java.lang.AssertionError: incorrect count in collection collection1 
> expected:<200> but was:<199>
>    >         at 
> __randomizedtesting.SeedInfo.seed([F418E49D11BF02D5:A809F6571555C802]:0)
>    >         at org.junit.Assert.fail(Assert.java:89)
>    >         at org.junit.Assert.failNotEquals(Assert.java:835)
>    >         at org.junit.Assert.assertEquals(Assert.java:647)
>    >         at 
> org.apache.solr.crossdc.manager.SolrAndKafkaIntegrationTest.lambda$testPartitioning$2(SolrAndKafkaIntegrationTest.java:380)
> {noformat}
> ...and...
> {noformat}
>   2> 54261 INFO  
> (TEST-SolrAndKafkaIntegrationTest.testPartitioning-seed#[710FF61F1D2B5BF0]) 
> [] o.a.s.SolrTestCaseJ4 ###Ending testPartitioning
>    >     java.util.ConcurrentModificationException
>    >         at 
> __randomizedtesting.SeedInfo.seed([710FF61F1D2B5BF0:2D1EE4D519C19127]:0)
>    >         at 
> java.base/java.util.ArrayList$Itr.checkForComodification(ArrayList.java:1104)
>    >         at java.base/java.util.ArrayList$Itr.next(ArrayList.java:1058)
>    >         at 
> org.apache.solr.crossdc.manager.SolrAndKafkaIntegrationTest.testPartitioning(SolrAndKafkaIntegrationTest.java:363)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

