[
https://issues.apache.org/jira/browse/SOLR-14524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mike Drob resolved SOLR-14524.
------------------------------
Fix Version/s: master (9.0)
Assignee: Mike Drob
Resolution: Fixed
Thanks for the fix, [~murblanc]!
> Harden MultiThreadedOCPTest
> ---------------------------
>
> Key: SOLR-14524
> URL: https://issues.apache.org/jira/browse/SOLR-14524
> Project: Solr
> Issue Type: Test
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Affects Versions: master (9.0)
> Reporter: Ilan Ginzburg
> Assignee: Mike Drob
> Priority: Minor
> Labels: test
> Fix For: master (9.0)
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> {{MultiThreadedOCPTest.test()}} fails occasionally in Jenkins because of the
> timing of task enqueues to the Collection API queue.
> In {{testFillWorkQueue()}}, the test enqueues a large number of tasks (115,
> more than the 100 Collection API parallel executors) to the Collection API
> queue for a collection COLL_A, then waits briefly and enqueues a task for
> another collection, COLL_B.
> It verifies that the COLL_B task (that does not require the same lock as the
> COLL_A tasks) completes before the third COLL_A task.
> Test failures happen because when enqueues are slowed down enough, the first
> 3 tasks on COLL_A complete even before the COLL_B task gets enqueued!
> In one failed Jenkins run, the COLL_B task was enqueued 1275ms after the
> first COLL_A task, leaving plenty of time for a few (and possibly all)
> COLL_A tasks to complete.
> Fix will be along the lines of:
> * Make the “blocking” COLL_A task longer to execute (currently 1 second) to
> compensate for slow enqueues.
> * Verify the COLL_B task (a 1ms task) finishes before the long-running
> COLL_A task does. This would be a good indication that even though the
> collection queue was filled with tasks waiting for a busy lock, a
> non-competing task was picked and executed right away.
> * Delay the enqueue of the COLL_B task until the end of processing of the
> first COLL_A task. This would guarantee that COLL_B is enqueued only once at
> least some COLL_A tasks have started processing at the Overseer. Possibly
> also verify that the long-running COLL_A task has not yet finished when the
> COLL_B task is enqueued.
> * It might be possible to set a (very) long duration for the slow task of
> COLL_A (to be less vulnerable to execution delays) without requiring the test
> to wait for that task to complete, but only wait for the COLL_B task to
> complete (so the test doesn't run for too long).
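The ordering idea in the list above can be sketched without any sleeps by using latches. This is only a toy model under stated assumptions, not the actual test or Overseer code: ToyQueue, Task, and collBFinishesBeforeBlocker are hypothetical names, the "lock" is a simple set of busy collection names, and the COLL_B task is enqueued as soon as the blocking COLL_A task has started rather than after a specific COLL_A task has finished.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class NonCompetingTaskSketch {

  /** A queued task: the collection whose lock it needs, and the work to run. */
  static final class Task {
    final String collection;
    final Runnable work;
    Task(String collection, Runnable work) { this.collection = collection; this.work = work; }
  }

  /** Toy stand-in for the Collection API queue (hypothetical, not Solr code). */
  static final class ToyQueue {
    private final List<Task> pending = new ArrayList<>();
    private final Set<String> busy = ConcurrentHashMap.newKeySet();
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    synchronized void enqueue(Task task) {
      pending.add(task);
      dispatch();
    }

    /** Starts every pending task whose collection lock is currently free. */
    private synchronized void dispatch() {
      for (Iterator<Task> it = pending.iterator(); it.hasNext(); ) {
        Task task = it.next();
        if (busy.add(task.collection)) { // add() returns true only if the lock was free
          it.remove();
          pool.execute(() -> {
            try {
              task.work.run();
            } finally {
              busy.remove(task.collection); // release the lock, then look for runnable work
              dispatch();
            }
          });
        }
      }
    }

    void shutdown() { pool.shutdown(); }
  }

  /** Waits on a latch with a generous timeout; returns false on timeout or interrupt. */
  private static boolean await(CountDownLatch latch) {
    try {
      return latch.await(30, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  /** Returns true if the COLL_B task completed while the COLL_A blocker was still running. */
  static boolean collBFinishesBeforeBlocker(int fillerTasks) {
    ToyQueue queue = new ToyQueue();
    CountDownLatch blockerStarted = new CountDownLatch(1);
    CountDownLatch releaseBlocker = new CountDownLatch(1);
    CountDownLatch collBDone = new CountDownLatch(1);
    CountDownLatch allDone = new CountDownLatch(fillerTasks + 2);
    AtomicBoolean blockerFinished = new AtomicBoolean(false);

    // "Blocking" COLL_A task: runs until the test releases it, so slow enqueues cannot outlast it.
    queue.enqueue(new Task("COLL_A", () -> {
      blockerStarted.countDown();
      await(releaseBlocker);
      blockerFinished.set(true);
      allDone.countDown();
    }));
    // Fill the queue with COLL_A tasks that all wait for the busy COLL_A lock.
    for (int i = 0; i < fillerTasks; i++) {
      queue.enqueue(new Task("COLL_A", allDone::countDown));
    }

    // Enqueue COLL_B only once COLL_A processing has demonstrably started.
    boolean started = await(blockerStarted);
    queue.enqueue(new Task("COLL_B", () -> {
      collBDone.countDown();
      allDone.countDown();
    }));

    boolean collBCompleted = await(collBDone);
    boolean blockerStillRunning = !blockerFinished.get();

    // Clean up: let the blocker and the queued COLL_A tasks drain, then stop the pool.
    releaseBlocker.countDown();
    await(allDone);
    queue.shutdown();
    return started && collBCompleted && blockerStillRunning;
  }

  public static void main(String[] args) {
    System.out.println(collBFinishesBeforeBlocker(115));
  }
}
```

Because the blocker only ends when the test releases it, the check does not depend on how slowly the enqueues happen, and the test only waits for the COLL_B task before cleaning up.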