On Jan 10, 2007, at 12:19 AM, Aleksey Ignatenko wrote:
On 1/10/07, Rana Dasgupta <[EMAIL PROTECTED]> wrote:
On 1/9/07, Weldon Washburn <[EMAIL PROTECTED]> wrote:
>
> On 1/9/07, Gregory Shimansky <[EMAIL PROTECTED]> wrote:
> >> I've tried to analyze the MegaSpawn test on Windows and here's what I
> >> found out.
> >>
> >> OOME is thrown because the process virtual size easily gets up to 2 GB.
> >> This happens at about 1.5k simultaneously running threads. I think it
> >> happens because all of the process's virtual memory is mapped for thread
> >> stacks.
> >>
>
> >Good job! I got the same sort of hunch when I looked at the source code
> >but did not have enough time to pin down specifics. The only guidance I
> >found regarding what happens when too many threads are spawned is the
> >following in the java.lang.Thread reference manual, "...specifying a
> >lower [stacksize] value may allow a greater number of threads to exist
> >concurrently without throwing an OutOfMemoryError (or other internal
> >error)."
>
> >I think what the above implies is that it is OK for the JVM to error and
> >exit if the app tries to create too many threads. If this is the case, it
> >sort of looks like we need to clean up the handling of malloc() errors so
> >that the JVM can exit gracefully.
I am not sure that we need to do anything about this. The default initial
stack size on Windows is 1M, and that is the recommended init size for real
applications. The fact that our threads start with a larger initial stack
mapped (default) than the RI is a design issue, it is not a bug. We could
start with 2K and create many more threads! Exactly as Gregory points out,
ultimately we will hit virtual memory limits and fail. The reason the RI
seems to fail less is that the test ends before running out of virtual
memory. On my 32 bit RHEL Linux box, the RI fails almost every time with
MegaSpawn, with an identical OOME error message and stack dump.
We can catch the exception in the test and print a message. But I am not
very sure what purpose that would serve. A resource exhaustion exception is
a fatal exception and the process is hosed, no real app would be able to do
anything more at this point. We should not use this test (which is not a
real app) as guidance to tune the initial stack size. My suggestion is to
lower the test duration so that we create at least about 1000 (or whatever
magic number) threads. That is the stress condition we should test for.
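
For illustration only, here is a rough sketch of the two points above: the
arithmetic behind the limit (2 GB of address space divided by a 1M stack
gives about 2048 threads, in line with the ~1.5k Gregory observed), and what
catching the error in a test could look like. The class and method names
(SpawnProbe, stackBoundedThreads, spawn) and the 64K stack size are made up
for this example and are not taken from MegaSpawn. Note that the stackSize
argument to the Thread constructor is only a hint that a VM is free to
ignore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper, not part of MegaSpawn or any VM source.
final class SpawnProbe {

    // Upper bound on the thread count if stacks were the only consumer
    // of the process address space.
    static long stackBoundedThreads(long addressSpaceBytes, long stackBytes) {
        return addressSpaceBytes / stackBytes;
    }

    // Starts up to 'limit' threads that park on a latch, each requesting
    // 'stackBytes' of stack. Returns how many threads actually started.
    static int spawn(int limit, long stackBytes) {
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> started = new ArrayList<>();
        try {
            for (int i = 0; i < limit; i++) {
                Thread t = new Thread(null, () -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }, "spawn-" + i, stackBytes);
                t.start();       // may throw OutOfMemoryError
                started.add(t);  // only counted once it has started
            }
        } catch (OutOfMemoryError e) {
            // Native memory for a new thread ran out; report and stop.
            System.err.println("Thread creation failed after "
                    + started.size() + " threads: " + e.getMessage());
        } finally {
            release.countDown(); // let every parked thread finish
            for (Thread t : started) {
                try {
                    t.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
        return started.size();
    }

    public static void main(String[] args) {
        // 2 GB address space, 1M default stack
        System.out.println(stackBoundedThreads(2L << 30, 1L << 20));
        // Small, bounded run; 64K is an assumed value for illustration.
        System.out.println(spawn(100, 64 * 1024));
    }
}
```

The run is capped at a fixed thread count on purpose, so the sketch exercises
the reporting path without trying to exhaust the machine.
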
Can we find such a "magic number" for the test? The test is highly dependent
on PC configuration (size of available virtual memory), so someone with only
256Mb installed could have OOME problems with a lower number of threads.
Looks like the root problem is incorrect handling of native memory
exhaustion.
+1
geir
Best regards,
Aleksey.
Thanks,
Rana