[ 
https://issues.apache.org/jira/browse/SOLR-13076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723289#comment-16723289
 ] 

Hoss Man edited comment on SOLR-13076 at 12/17/18 11:06 PM:
------------------------------------------------------------

Once thing we need to watch out for here is that this "fix" may break some of 
the "waitFor.." type code used in tests, where the point of the "wait" logic is 
to give competing threads a chance to finish executing some background logic, 
but if the timesource being used in the TimeOut is based on the _simulated_ 
TimeSource then they might not wait long enoug.

-Example: {{CloudTestUtils.waitForState}} sets up a 90 second timeout to said 
for the specified cluster state to exist, using 
{{cloudManager.getTimeSource()}}  -- prior to [~ab]'s fix, that timeout was 
going to use thread.sleep and was willing to ultimately wait for a "real" 90 
seconds (from the perspective of the test host) but now it will only wait a 
"simulated" 90 seconds, which may not be enough time for a jenkins box with low 
CPU counts.-

Perhaps we need a thorough audit any uses of TimeOut in the _test_ code to 
sanity check if they should be using the "clusters" concept of time (ie: 
waiting for triggers to fire) vs using the "real" concept of time (ie: waiting 
for threads to run)

----

*UPDATE:* I just realized i missread this comment, and it doesn't affect 
{{CloudTestUtils.waitForState}} the way i thought at all ... i still have some 
concerns that this "fix" might expose other bugs in tests that are using 
{{waitFor}} on the cluster's TimeSource, when they should be using NANO_TIME, 
to give background threads time to process things ... but we'll just have to 
see what shakes out.


was (Author: hossman):
Once thing we need to watch out for here is that this "fix" may break some of 
the "waitFor.." type code used in tests, where the point of the "wait" logic is 
to give competing threads a chance to finish executing some background logic, 
but if the timesource being used in the TimeOut is based on the _simulated_ 
TimeSource then they might not wait long enoug.

Example: {{CloudTestUtils.waitForState}} sets up a 90 second timeout to said 
for the specified cluster state to exist, using 
{{cloudManager.getTimeSource()}}  -- prior to [~ab]'s fix, that timeout was 
going to use thread.sleep and was willing to ultimately wait for a "real" 90 
seconds (from the perspective of the test host) but now it will only wait a 
"simulated" 90 seconds, which may not be enough time for a jenkins box with low 
CPU counts.

Perhaps we need a thorough audit any uses of TimeOut in the _test_ code to 
sanity check if they should be using the "clusters" concept of time (ie: 
waiting for triggers to fire) vs using the "real" concept of time (ie: waiting 
for threads to run)

> TimeOut breaks the simulation framework
> ---------------------------------------
>
>                 Key: SOLR-13076
>                 URL: https://issues.apache.org/jira/browse/SOLR-13076
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: AutoScaling
>    Affects Versions: 7.6, master (8.0)
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>            Priority: Minor
>             Fix For: master (8.0), 7.7
>
>
> {{TimeOut}} uses actual {{Thread.sleep}} in its {{waitFor}} method instead of 
> using {{TimeSource.sleep}}. This breaks the simulation framework, which often 
> uses a non-real time {{TimeSource}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to