Milan,
Thanks for looking into this. I think you should ask a question on the expected
timing semantics and guarantees on net-dev (with maybe a cc to nio-dev).
As for our test. I agree with you that we should simply work a possibility of
early returns into the check.
...
/* The acceptable variation of early returns from timed socket operations. */
private static final long PREMATURE_RETURN = adjustTimeout(100);
...
long elapsed = NANOSECONDS.toMillis(System.nanoTime() - startNanos);
if (elapsed < timeout - PREMATURE_RETURN || elapsed > timeout + TOLERANCE) {
String msg = String.format(
"elapsed=%s, timeout=%s, TOLERANCE=%s, PREMATURE_RETURN=%s",
timeout, timeout, TOLERANCE, PREMATURE_RETURN);
throw new RuntimeException(msg);
}
...
Thoughts?
-Pavel
> On 24 Sep 2019, at 09:12, Milan Mimica <[email protected]> wrote:
>
> Hi Pavel
>
> Wow, I find this awesome. I don't have a Windows machine to play with,
> but I think I may have found something.
> The difference is how Java_sun_nio_ch_Net_poll is implemented. On unix
> it uses poll(2), on Windows it uses select(2). Regarding timeouts,
> poll() has "wait at least" semantics and overruns by design[1], while
> select() on windows has "waits at most" semantics, or how they put
> it[2]: "specifies the maximum time that select should wait before
> returning.". It returns early by design! Is this a known thing? I
> don't think there is much one can do here. It probably makes no sense
> to loop it and wait for time remainder.
> Java's soTimeout does not specify[3] should it wait at least or at
> most the specified timeout, so it's fine I guess. Old, "plain" socket
> impl are not much different.
>
> If the above is correct, should I just add a tolerance for the lower bound?
>
> [1] http://man7.org/linux/man-pages/man2/poll.2.html
> [2]
> https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select
> [3]
> https://docs.oracle.com/javase/9/docs/api/java/net/Socket.html#setSoTimeout-int-
>
> On Mon, 23 Sep 2019 at 16:15, Pavel Rappo <[email protected]> wrote:
>>
>> Milan,
>>
>> I'm observing the latest version (.04) of this test failing quite frequently
>> (4/100) on Windows (Windows Server 2012 R2 6.3 (amd64)) machines. The test
>> passes fine on macOS and Linux. Here's the typical output I see in the logs:
>>
>> java.lang.RuntimeException: Query took 4997 ms. . Timeout value is 5000
>> java.lang.RuntimeException: Query took 4999 ms. . Timeout value is 5000
>> java.lang.RuntimeException: Query took 4995 ms. . Timeout value is 5000
>> java.lang.RuntimeException: Query took 4998 ms. . Timeout value is 5000
>> ...
>>
>> Now, there might be many reasons for that. One of which would be that the
>> DnsClient code is buggy. The other reason would be that the accuracy
>> guaranteed by Windows implementation of `read` is not what we would expect.
>> Would you be able to investigate that?
>>
>> P.S. The good news is that the CSR has been approved:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8230965
>>
>>> On 23 Sep 2019, at 14:20, Milan Mimica <[email protected]> wrote:
>>>
>>> Got it. Thanks Pavel!
>>>
>>>
>>> On Mon, 23 Sep 2019 at 13:37, Pavel Rappo <[email protected]> wrote:
>>>>
>>>> Milan,
>>>>
>>>> How do you check which tests are run? That's what I see in the
>>>> /test-support/jtreg_open_test_jdk_com_sun_jndi_dns_ConfigTests_TcpTimeout_java/com/sun/jndi/dns/ConfigTests/TcpTimeout.jtr
>>>> file after I have run the test locally on my machine:
>>>>
>>>> ----------messages:(5/233)----------
>>>> command: main TcpTimeout
>>>> reason: User specified action: run main TcpTimeout
>>>> Mode: othervm
>>>> Additional options from @modules: --add-modules java.base --add-exports
>>>> java.base/sun.security.util=ALL-UNNAMED
>>>> elapsed time (seconds): 1.751
>>>>
>>>> ...
>>>>
>>>> ----------messages:(5/313)----------
>>>> command: main TcpTimeout -Dcom.sun.jndi.dns.timeout.initial=5000
>>>> reason: User specified action: run main TcpTimeout
>>>> -Dcom.sun.jndi.dns.timeout.initial=5000
>>>> Mode: othervm
>>>> Additional options from @modules: --add-modules java.base --add-exports
>>>> java.base/sun.security.util=ALL-UNNAMED
>>>> elapsed time (seconds): 5.498
>>>>
>>>> ------------------------------------
>>>>
>>>> Which is consistent with what I would expect given the timeout values.
>>>>
>>>> The following output does not tell the full story, just the name of the
>>>> test:
>>>>
>>>> ==============================
>>>> Test summary
>>>> ==============================
>>>> TEST TOTAL PASS FAIL ERROR
>>>> jtreg:open/test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java
>>>> 1 1 0 0
>>>> ==============================
>>>> TEST SUCCESS
>>>>
>>>> -Pavel
>>>>
>>>>> On 20 Sep 2019, at 15:42, Milan Mimica <[email protected]> wrote:
>>>>>
>>>>> Pavel,
>>>>>
>>>>> Here it is: http://cr.openjdk.java.net/~mmimica/8228580/webrev.04/
>>>>> I don't see the test is run twice when I execute "make test
>>>>> TEST=jtreg:test/jdk/com/sun/jndi/dns/ConfigTests/TcpTimeout.java". Am
>>>>> I missing something?
>>>>>
>>>
>>>
>>> --
>>> Milan Mimica
>>
>
>
> --
> Milan Mimica