Re: RFR 8021820: Number of opened files used in select() is limited to 1024 [macosx]

Aleksej Efimov Wed, 07 Aug 2013 06:04:04 -0700

Stuart, thank you for your comments; responses are below.
New webrev:

http://cr.openjdk.java.net/~aefimov/8021820/webrev.02/



-Aleksej

On 08/06/2013 05:14 AM, Stuart Marks wrote:

Hi Aleksej,
Thanks for the update. I took a look at the revised test, and there are still some issues. (I didn't look at the build changes.)
1) System-specific resource limits.
I think the biggest issue is resource limits on the number of open files per process that might vary from system to system. On my Ubuntu system, the hard limit on the number of open files is 1,024. The test opens 1,023 files and then one more for the socket. Unfortunately the JVM and jtreg have several files open already, and the test crashes before the openFiles() method completes.
(Oddly it crashes with a NoClassDefFoundError from the main thread's uncaught exception handler, and then the test reports that it passed! Placing a try/catch of Throwable in main() or openFiles() doesn't catch this error. I have no explanation for this. When run standalone -- i.e., not from jtreg -- the test throws FileNotFoundException (too many open files) from openFiles(), which is expected.)
On my Mac (10.7.5) the soft limit is 256 files, but the hard limit is unlimited. The test succeeds in opening all its files but fails because of the select() bug you're fixing. (This is expected; I didn't rebuild my JDK with your patch.) I guess the soft limit doesn't do anything on Mac.
Amazingly, the test passed fine on both Windows XP and Windows 8.
I'm not entirely sure what to do about resource limits. Since the test is able to open >1024 files on Mac, Windows, and possibly other Linuxes, it seems reasonable to continue with this approach. If it's possible to catch the error that occurs if the test cannot open its initial 1,024 files, perhaps it should do this, log a message indicating what happened, and consider the test to have passed. I'm mystified by the uncaught/uncatchable NoClassDefFoundError though.

I wonder if this is a question of the test environment required for JTREG tests: if we execute JTREG tests with a low value assigned to the fd hard limit (for example 10) we'll see a lot of unrelated test failures. So, I suggest that we can assume that there are no hard limits set (or at least only default ones, i.e. the default fd limit on Ubuntu is 4096) on the test machine. But we should consider a test as Failed if it failed to prepare its environment because of some external limitation. JTREG doesn't meet this criterion (it logs the test as PASS and prints an incorrect Exception). To illustrate it I have repeated your experiments on Ubuntu Linux: I set the fd hard limit to 1024 (ulimit -Hn 1024) and got this error from a manual run of the test:

Exception in thread "main" java.io.FileNotFoundException: testfile (Too many open files)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:128)
    at SelectFdsLimit.openFiles(SelectFdsLimit.java:63)
    at SelectFdsLimit.main(SelectFdsLimit.java:81)

Seems correct to me.
And by JTREG (also with the hard limit set to 1024):
----------messages:(3/123)----------
command: main SelectFdsLimit
reason: User specified action: run main/othervm SelectFdsLimit
elapsed time (seconds): 0.168
----------System.out:(0/0)----------
----------System.err:(5/250)----------

Exception: java.lang.NoClassDefFoundError thrown from the UncaughtExceptionHandler in thread "MainThread"

STATUS:Passed.
Exception in thread "main"

Exception: java.lang.NoClassDefFoundError thrown from the UncaughtExceptionHandler in thread "main"

result: Passed. Execution successful


test result: Passed. Execution successful

The results are identical to the results you mentioned. It seems to me that jtreg doesn't correctly process such a test error (at least it shouldn't be considered a Pass). I suggest two ways of resolving it:
1. If we don't have official (or default) limitations on what resources a test can use, then we shouldn't make any modifications to the test.
2. If there are some limitations that we should honor, then we'll need to figure out what to do with the NoClassDefFoundError exception in JTREG.
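One way to make a setup failure visible, whichever verdict is chosen for it, is to catch the open failure explicitly and report how far the test got. The following is only a minimal sketch under that assumption; the class name FdSetup and the shape of openFiles() are hypothetical and are not taken from the webrev.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.List;

// Hypothetical helper for illustration; not the code from the webrev.
class FdSetup {
    // Tries to open `count` streams on `file`, adding each one to `opened`.
    // Returns true if all streams were opened, false if the environment
    // refused one (for example "Too many open files", which surfaces as
    // FileNotFoundException, a subclass of IOException).
    static boolean openFiles(File file, int count, List<FileInputStream> opened) {
        for (int i = 0; i < count; i++) {
            try {
                opened.add(new FileInputStream(file));
            } catch (IOException e) {
                System.err.println("Setup failed after " + opened.size()
                        + " open files: " + e);
                return false;
            }
        }
        return true;
    }
}
```

The caller can then decide whether a false return means "skip with a logged message" or "fail the test", which is the open question in the discussion above.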

2) Closing files.
If an exception is thrown while opening the initial set of files, or sometime during the closing process, the test can still leak files.
One approach would be to keep a data structure representing the current set of open files, and close them all in a finally-block around all the test logic, and making sure that exceptions from the close() call are caught and do not prevent the rest of the files from being closed.
This seems like a lot of work. Perhaps a more effective approach would be to run the test in "othervm" mode, as follows:
    @run main/othervm SelectFdsLimit
This will cause the test to run in a dedicated JVM, so all files will be closed automatically when it exits. (It would be good to add a comment explaining the need for othervm, if you do this.)

main/othervm and comments were added.
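For comparison, the finally-block alternative described above (the one that was not chosen) could look roughly like this. It is a sketch only; CloseAll, closeAll() and runThenClose() are hypothetical names introduced for illustration.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

// Hypothetical helper for illustration; not part of the actual test.
class CloseAll {
    // Closes every stream in the list. An exception from one close() is
    // logged and counted, and does not stop the remaining streams from
    // being closed. Returns the number of close() calls that failed.
    static int closeAll(List<? extends Closeable> streams) {
        int failures = 0;
        for (Closeable c : streams) {
            try {
                c.close();
            } catch (IOException e) {
                failures++;
                System.err.println("close() failed: " + e);
            }
        }
        return failures;
    }

    // Runs the test logic and closes the tracked streams in a finally-block,
    // so they are released even if the logic throws.
    static void runThenClose(Runnable testLogic, List<? extends Closeable> streams) {
        try {
            testLogic.run();
        } finally {
            closeAll(streams);
        }
    }
}
```

Running in othervm mode avoids this bookkeeping entirely, because the operating system releases the descriptors when the dedicated JVM exits.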

3) Port number for sockets.
It's fairly common for tests to fail occasionally because they use some constant port number that sometimes happens to be in use at the same time by another process on the system. I have to say, 8080 is a pretty common port number. :-)
For purposes of this test, you can let the system assign a port. Just use:
    new ServerSocket(0)

Completely agree that port 8080 is a bad port for testing =). Changed to 0.
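As a small illustration of the suggestion, binding to port 0 lets the system pick a free port, and getLocalPort() reports which one was assigned. The class and method names below are hypothetical.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical example class; only ServerSocket(0) comes from the thread.
class EphemeralPort {
    // Binds a server socket to a system-assigned port and returns that port.
    // The socket is closed again by try-with-resources before returning.
    static int bindAndReport() throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {
            return server.getLocalPort();
        }
    }
}
```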

4) Style.
It's probably possible to use the same File object for the test file, instead of creating new File objects repeatedly.

Agree and corrected.

It might be nice to add a comment explaining the logic of the test, that SocketTimeoutException is expected, and that failure will be indicated if the accept() throws SocketException caused by the underlying mishandling of large fds by select().

Comments were added.
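The test logic described above can be sketched as follows. This is not the SelectFdsLimit source; the class name, method name and timeout parameter are assumptions made for illustration, and the step that first opens more than 1024 files is omitted.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;

// Hypothetical sketch of the accept()-with-timeout check.
class AcceptTimeoutCheck {
    // Returns true if accept() timed out, which is the expected outcome
    // because nobody connects. A SocketException (for example one caused
    // by select() mishandling a large fd) is not caught here and
    // propagates to the caller as an IOException, signalling failure.
    static boolean acceptTimesOut(int timeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {
            server.setSoTimeout(timeoutMillis);
            try {
                server.accept().close();
                return false; // an unexpected client connected
            } catch (SocketTimeoutException expected) {
                return true;
            }
        }
    }
}
```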

Thanks,

s'marks



On 8/5/13 4:47 AM, Aleksej Efimov wrote:
Alan, Tim,

I have addressed your comments and as a result - new webrev:
http://cr.openjdk.java.net/~aefimov/8021820/webrev.01

The list of changes:
1. The connection to the Oracle site is removed (it's not internal, but anyway it's better not to rely on the availability of an external resource in a test). In the current version a server socket is created and the accept() method is used for bug disclosure.
2. The cleanup method is added for closing file streams. JTREG successfully cleaned up on Windows after this modification.
3. common/autoconf/toolchain.m4 untouched, but 'bash common/autoconf/autogen.sh' was executed to update generated-configure.sh.
Aleksej


On 07/31/2013 06:35 PM, Tim Bell wrote:
Aleksej, Alan
The change in common/autoconf/toolchain.m4 looks correct to me, and I think
that is a good place to have it.  Remember to run 'bash
common/autoconf/autogen.sh' and check in the generated-configure.sh files as
part of the changeset.

I didn't look at the test case, but I think Alan has some good points.

Tim

On 07/31/13 06:45 AM, Alan Bateman wrote:
On 31/07/2013 05:18, Aleksej Efimov wrote:
Hi,
Can I have a review for the following problem:
The MACOSX JDK (more precisely - the java.net classes) uses the select() system call to wait for different events on socket fds. And the default behaviour for select() on Darwin is to fail when the fdset contains an fd with an id greater than FD_SETSIZE (=1024). The test case in the webrev illustrates this
behavior.
There is at least one solution for it: use the -D_DARWIN_UNLIMITED_SELECT
compilation flag for all macosx sources: this won't affect other parts of
the JDK because they are not using select().
Currently, I have added this compilation flag to
common/autoconf/generated-configure.sh and
common/autoconf/generated-configure.sh. I wonder if there is a better
place where I can put this flag?

The webrev: http://cr.openjdk.java.net/~aefimov/8021820/webrev.00/
BUG: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8021820
Thanks for looking into this one. The build changes look okay to me but it's
probably best that someone on build-dev agree to those.
Michael McMahon can probably explain why the net code is using select for timed read/accept (I have a vague recollection of there being an issue with poll due to the way that it is implemented on kqueue with the result that it
had to be changed to use select).

I think the test needs re-work. It looks to me that it attempts to
connect to an Oracle internal site so that's not going to work everywhere. In general we don't want the tests to be dependent on hosts that may or may not exist (we had tests that used to do this in the past but they caused a lot of grief). It also looks like the test doesn't close the 1023 files that it opens at the start and so I assume this test will always fail on Windows
when jtreg tries to clean-up.

-Alan.
