Hi Eric, Martin,
I'm fine with the rewrite. I'm not sure why the re-ordering of y3 will change the behavior of the test, but it will provide more debugging info.
Roger
On 6/6/2014 9:32 PM, Martin Buchholz wrote:
If you don't want to go with my rewrite, you can conservatively just
check in a 10x increase in all the constant durations and see whether
the flakiness goes away.
On Thu, Jun 5, 2014 at 9:46 PM, Martin Buchholz <marti...@google.com> wrote:
As with David, the cause of the failure is mystifying. How can things fail when we stay below the timeout value of 500ms? There's a bug either in Timer or in my own understanding of what should be happening.
Anyway, raising the timeout value (as I have done in my minor rewrite) seems prudent. Fortunately, we can write this test in a way that doesn't require actually waiting for the timeout to elapse.
On Wed, Jun 4, 2014 at 1:23 PM, Roger Riggs <roger.ri...@oracle.com> wrote:
Hi Martin, Eric,
Of several hundred failures of this test, most occurred in a JRE run with -Xcomp set. A few failures occurred with -Xmixed, none with -Xint.
The printed "elapsed" times (not normalized to hardware or OS) range from 24 to 132 ms, with most falling into several buckets in the 30s, 40s, 50s, and 70s.
I don't spot anything in the Timer.mainLoop code that might break when highly optimized, but that's one possibility.
Roger
On 6/4/2014 3:25 PM, Martin Buchholz wrote:
Tests for Timer are inherently timing (!) dependent.
It's reasonable for tests to assume that:
- reasonable events like creating a thread and executing a simple task should complete in less than, say, 2500 ms.
- the system clock will not change by a significant amount (> 1 sec) during the test. Yes, that means Timer tests are likely to fail during daylight saving time switchover - we can live with that. (We could even try to fix that by detecting deviations between clock time and elapsed time, but it's probably not worth it.)
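A minimal sketch of such a clock-deviation check, for illustration only (not from the actual test; the class name and the 1-second threshold are assumptions): compare a System.currentTimeMillis() delta against a System.nanoTime() delta and skip the elapsed-time assertions if the wall clock jumped.

    public class ClockJumpGuard {
        public static void main(String[] args) throws InterruptedException {
            long wallStart = System.currentTimeMillis();   // wall-clock time, can jump
            long monoStart = System.nanoTime();            // monotonic elapsed time

            Thread.sleep(500);   // stand-in for the timing-sensitive part of the test

            long wallDelta = System.currentTimeMillis() - wallStart;
            long monoDelta = (System.nanoTime() - monoStart) / 1_000_000;

            // If the two deltas disagree by more than a second, the system clock
            // moved during the test (DST switch, NTP step); skip the assertions.
            if (Math.abs(wallDelta - monoDelta) > 1000) {
                System.out.println("system clock moved; skipping elapsed-time assertions");
                return;
            }
            System.out.println("clock stable; safe to assert on elapsed times");
        }
    }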
Can you detect any real-world unreliability in my latest version of the test, not counting the daylight saving time switch?
I continue to resist your efforts to "fix" the test by removing chances for the SUT code to go wrong.
On Tue, Jun 3, 2014 at 11:28 PM, Eric Wang <yiming.w...@oracle.com> wrote:
Hi Martin,
Thanks for the explanation; now I understand why you set DELAY_MS to 100 seconds. It's true that it prevents failures on a slow host. However, I still have some concerns.
Because the test schedules tasks at a time in the past, all 13 tasks should be executed immediately and finish within a short time. If the elapsed-time limit is set to 50 s (DELAY_MS / 2), the timer has plenty of time to finish the tasks, so I wonder whether that loses the original test point.
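To make this concrete, a minimal sketch (my own class name and constants, not the actual test): schedule several fixed-rate tasks with a firstTime in the past and check that all of the overdue first executions complete well within DELAY_MS / 2, long before any repeat could fire.

    import java.util.Date;
    import java.util.Timer;
    import java.util.TimerTask;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.TimeUnit;

    public class PastScheduling {
        static final int NTASKS = 13;            // echoes the 13 tasks discussed above
        static final long DELAY_MS = 100_000;    // 100 s period: the repeat should never fire

        public static void main(String[] args) throws Exception {
            Timer timer = new Timer();
            final CountDownLatch done = new CountDownLatch(NTASKS);
            Date past = new Date(System.currentTimeMillis() - 10_000);

            long start = System.currentTimeMillis();
            for (int i = 0; i < NTASKS; i++) {
                timer.scheduleAtFixedRate(new TimerTask() {
                    public void run() { done.countDown(); }
                }, past, DELAY_MS);
            }

            // All first executions are overdue, so they should run almost at once;
            // DELAY_MS / 2 is a generous bound that still precedes any repeat.
            if (!done.await(DELAY_MS / 2, TimeUnit.MILLISECONDS))
                throw new AssertionError("tasks did not run promptly");
            System.out.println("elapsed = " + (System.currentTimeMillis() - start) + " ms");
            timer.cancel();
        }
    }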
Back to the original test, I think this is a test stabilization issue, because the original test assumes that the timer is cancelled within less than 1 second, before the 14th task is called. This assumption may not hold for two reasons:
1. The test may be executed in jtreg concurrent mode on a slow host.
2. The system clock of a virtual machine may not be accurate (it may run faster than the physical clock).
To support this point, I changed the test (as attached) to print the execution times and see whether the timer behaves as the API documentation describes. The result is as expected:
The unrepeated task executed immediately: [1401855509336]
The repeated task executed immediately and repeated per 1 second: [1401855509337, 1401855510337, 1401855511338]
The fixed-rate task executed immediately and catch up the delay: [1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509836, 1401855510836]
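A rough sketch of this kind of instrumentation (the real attachment isn't shown here, so class and variable names are my own): each task records System.currentTimeMillis() on every execution, and the three lists are printed after a few seconds.

    import java.util.Date;
    import java.util.List;
    import java.util.Timer;
    import java.util.TimerTask;
    import java.util.Vector;

    public class TimerTrace {
        // Returns a task that appends its execution time to the given list.
        static TimerTask recorder(final List<Long> times) {
            return new TimerTask() {
                public void run() { times.add(System.currentTimeMillis()); }
            };
        }

        public static void main(String[] args) throws Exception {
            Timer timer = new Timer();
            Date past = new Date(System.currentTimeMillis() - 10_000);  // firstTime in the past
            List<Long> once = new Vector<>();       // Vector: written from the timer thread
            List<Long> repeated = new Vector<>();
            List<Long> fixedRate = new Vector<>();

            timer.schedule(recorder(once), past);                        // one-shot, overdue
            timer.schedule(recorder(repeated), past, 1000);              // fixed-delay, 1 s period
            timer.scheduleAtFixedRate(recorder(fixedRate), past, 1000);  // fixed-rate, catches up

            Thread.sleep(3000);
            timer.cancel();

            System.out.println("unrepeated: " + once);
            System.out.println("repeated (fixed-delay): " + repeated);
            System.out.println("fixed-rate (catch-up): " + fixedRate);
        }
    }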
Thanks,
Eric
On 2014/6/4 9:16, Martin Buchholz wrote:
On Tue, Jun 3, 2014 at 6:12 PM, Eric Wang <yiming.w...@oracle.com> wrote:
Hi Martin,
A sleep(1000) is not enough to reproduce the failure, because it is much shorter than the period DELAY_MS (10*1000) of the repeated task created by "scheduleAtFixedRate(t, counter(y3), past, DELAY_MS)". With sleep(DELAY_MS), the failure can be reproduced.
Well sure, then the task is rescheduled, so I expect it to fail in this case. But in my version I had set DELAY_MS to 100 seconds. The point of extending DELAY_MS is to prevent flaky failures on a slow machine.
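A standalone sketch of that failure mode (my own names; the test's counter(y3) helper is not reproduced here): once the main thread sleeps past the fixed-rate period, the task is rescheduled and runs again, so a count-based assertion sees one execution too many.

    import java.util.Date;
    import java.util.Timer;
    import java.util.TimerTask;
    import java.util.concurrent.atomic.AtomicInteger;

    public class ExtraExecution {
        public static void main(String[] args) throws Exception {
            final long DELAY_MS = 1000;   // short period here, just to show the effect quickly
            final AtomicInteger runs = new AtomicInteger();
            Timer timer = new Timer();
            Date past = new Date(System.currentTimeMillis() - 10);

            timer.scheduleAtFixedRate(new TimerTask() {
                public void run() { runs.incrementAndGet(); }
            }, past, DELAY_MS);

            Thread.sleep(DELAY_MS / 2);   // stay below the period: expect exactly 1 execution
            System.out.println("after DELAY_MS/2: runs = " + runs.get());

            Thread.sleep(DELAY_MS);       // now past the period: the task has run again
            System.out.println("after another DELAY_MS: runs = " + runs.get());
            timer.cancel();
        }
    }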
Again, how do we know that this test hasn't found a Timer bug?
I still can't reproduce it.