please note that since i don't have access to the jdk source, the
following ramblings come by way of pure conjecture on my part.

we've got a client/server architecture that eats threads and file
descriptors for lunch.  after applying alan cox's large file array patch,
our server started segfaulting once it passes about 1024 open fds.  without
the patch, it simply receives a "too many open file descriptors" sort of
exception.

without being able to look at the code, all we can do is a bit of black
box testing using strace.  stracing the process, watching polls and
selects, we noticed that the jdk was monitoring most of the fds with poll.
however, all file descriptors that had a timeout associated with them
(using Socket.setSoTimeout) were being monitored separately with select.
a typical strace run looks like:

--- SIGALRM (Alarm clock) ---
poll([{fd=0, events=POLLIN|POLLOUT, revents=POLLOUT}, [snip...], 83, 0) = 81
select(80, [79], NULL, NULL, {0, 0})    = 0 (Timeout)
select(69, [68], NULL, NULL, {0, 0})    = 0 (Timeout)
select(129, [128], NULL, NULL, {0, 0})  = 0 (Timeout)
select(110, [109], NULL, NULL, {0, 0})  = 0 (Timeout)
select(150, [149], NULL, NULL, {0, 0})  = 0 (Timeout)
select(68, [67], NULL, NULL, {0, 0})    = 0 (Timeout)
--- SIGALRM (Alarm clock) ---

all of the fds that are being selected on here have a 20 second timeout
associated with them, *and* they are also present in the poll.  when the
server has more than 1024 open connections and one of the fds is >= 1024,
the server segfaults.  i'm assuming this is because an fd_set only holds
FD_SETSIZE (1024) bits, so we're walking over random memory when
setting/clearing the bits representing fds >= 1024.
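
to make the suspected overflow concrete, here is a minimal sketch in c of
the kind of bounds check that would avoid it.  this is not the jdk's code
(which i can't see); checked_fd_set is a hypothetical helper name, and the
only real pieces are fd_set, FD_SET and FD_SETSIZE from <sys/select.h>.

```c
#include <sys/select.h>

/* Hypothetical helper, not taken from the JDK: add fd to the set only if
 * it fits inside the fixed-size fd_set bitmap.
 * Returns 0 on success, -1 if fd is out of range. */
static int checked_fd_set(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;          /* FD_SET here would write outside the bitmap */
    FD_SET(fd, set);
    return 0;
}
```

a caller that gets -1 back would have to fall back to poll (which has no
such limit) instead of calling select on that fd.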

if we disable using timeouts on the sockets, then the server/jdk works
fine; that is, it doesn't call select, only poll.  except that disabling
the timeouts really isn't an option.  hopefully, if this is truly a bug,
then it can be fixed in jdk-1.2.

fyi, there seems to be a similar bug reported on the Bug Parade as BugId
4097406.

on a side note, i'm wondering why the timeout checking is performed using
a separate select for each fd as opposed to coalescing them into a single
check?
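
as a rough illustration of what i mean by coalescing, the sketch below
checks a whole batch of fds for readability with one poll call instead of
one select per fd.  check_readable and its parameters are names i made up
for the example; i'm not claiming this is how the jdk is or should be
structured internally.

```c
#include <poll.h>
#include <stdlib.h>

/* Hypothetical helper: check n descriptors for readability with a single
 * poll() call. On success, ready[i] is set to 1 if fds[i] is readable
 * (or has hung up / is in error), else 0.
 * Returns poll()'s result: the number of descriptors with events,
 * 0 on timeout, or -1 on error. */
static int check_readable(const int *fds, size_t n, int timeout_ms, int *ready)
{
    struct pollfd *pfds = calloc(n ? n : 1, sizeof *pfds);
    if (pfds == NULL)
        return -1;

    for (size_t i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;    /* only interested in read readiness */
    }

    int rc = poll(pfds, (nfds_t)n, timeout_ms);
    if (rc >= 0) {
        for (size_t i = 0; i < n; i++)
            ready[i] = (pfds[i].revents & (POLLIN | POLLHUP | POLLERR)) != 0;
    }

    free(pfds);
    return rc;
}
```

since poll takes an array of descriptors rather than a fixed-size bitmap,
this approach also has no trouble with fds >= 1024.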

on a second side note, i'm wondering why all the fds in the poll are
looking for POLLIN and POLLOUT when many are logically in one state or the
other?  can't the jdk keep track of whether a thread is attempting a read
or a write and then look for only that state in the poll request?  it
seems prudent not to poll for read readiness on STDOUT.
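
for what it's worth, here is a small sketch of polling for just the
direction a thread cares about.  the io_dir enum and wait_ready are
hypothetical names for the example; how the jdk actually tracks pending
reads and writes is unknown to me.

```c
#include <poll.h>

/* Hypothetical: the direction of the I/O a thread is blocked on. */
enum io_dir { IO_READ, IO_WRITE };

/* Wait until fd is ready for the requested direction only.
 * Returns 1 if poll() reported an event on fd (readiness, hangup or
 * error), 0 on timeout, -1 on error. */
static int wait_ready(int fd, enum io_dir dir, int timeout_ms)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = (dir == IO_READ) ? POLLIN : POLLOUT;  /* one state, not both */
    pfd.revents = 0;

    int rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return rc;
    return 1;
}
```

with this, a writer on STDOUT would call wait_ready(1, IO_WRITE, ...) and
never ask about read readiness at all.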

-- 
lantz moore, contigo software                              [EMAIL PROTECTED]