On Jan 10, 2007, at 2:13 PM, Weldon Washburn wrote:

Some observations and a simple patch that might just work well enough for
the current state of DRLVM development.

1)
In some earlier posting, it was mentioned that somehow the virtual memory
address space is impacted by how much physical memory is in a given
computer. Actually, this is not true. The virtual address space available
to the JVM is fixed by the OS; a machine with less physical memory will
simply do more disk I/O. In other words, the hard limits on "C" malloc()
are set by the OS, not by the RAM chips.
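
A throwaway probe makes the point: on a 32-bit box, malloc() keeps
succeeding up to the address-space limit the OS grants the process, no
matter how much RAM is installed. (A rough sketch, nothing to do with
DRLVM's code - the 1 MB chunk size is arbitrary:)

/* probe.c - keep allocating until malloc() fails and report how much
 * address space the process was actually granted. On 32-bit Windows or
 * Linux the ceiling comes from the OS address-space layout, not from
 * the amount of physical RAM. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const size_t chunk = 1024 * 1024;   /* 1 MB per allocation */
    size_t total_mb = 0;
    void *p;

    while ((p = malloc(chunk)) != NULL) {
        total_mb++;
        /* the pages are never touched; address space alone is consumed */
    }
    printf("malloc() failed after ~%lu MB\n", (unsigned long)total_mb);
    return 0;
}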


Talking about virtual memory vs. RAM vs. whatever is a red herring - we may be ported to a machine without virtual memory. What matters is that when malloc() returns NULL, we do something smart. At the very least, do nothing harmful.
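
Something like this is all I mean - the allocation sites check the result
and report failure instead of dereferencing blindly. (A sketch; the names
vm_thread_t, thread_alloc and the TM_* codes are made up, not our real
thread-manager API:)

/* Sketch of a defensive allocation path: a NULL from malloc() turns
 * into an error code the caller can propagate, never into a crash. */
#include <stdlib.h>
#include <string.h>

typedef struct vm_thread {
    void  *stack_base;
    size_t stack_size;
} vm_thread_t;

enum { TM_OK = 0, TM_ERROR_OUT_OF_MEMORY = 1 };

int thread_alloc(vm_thread_t **out, size_t stack_size)
{
    vm_thread_t *t = malloc(sizeof(*t));
    if (t == NULL)
        return TM_ERROR_OUT_OF_MEMORY;    /* report, don't crash */

    memset(t, 0, sizeof(*t));
    t->stack_size = stack_size;

    t->stack_base = malloc(stack_size);
    if (t->stack_base == NULL) {
        free(t);                          /* undo the partial work */
        return TM_ERROR_OUT_OF_MEMORY;
    }

    *out = t;
    return TM_OK;
}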


2)
Why not simply hard-code DRLVM to throw an OOME whenever there are more than 1K threads running? I think Rana first suggested this approach. My guess is that 1K threads is enough to run lots of interesting workloads, and that common versions of WinXP and Linux will handle the C malloc() load of 1K threads successfully. If not, how about trying 512 threads?
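
Concretely, the cap described above would be only a few lines in the
thread-creation path, with the error code eventually surfaced to Java as
an OutOfMemoryError. (A sketch with made-up names; in real code the
counter would sit under the thread manager's lock, omitted here for
brevity:)

/* Sketch of a hard thread cap. */
#define MAX_VM_THREADS 1024

enum { TM_OK = 0, TM_ERROR_OUT_OF_MEMORY = 1 };

static long g_live_threads = 0;

int thread_create_checked(void)
{
    if (g_live_threads >= MAX_VM_THREADS)
        return TM_ERROR_OUT_OF_MEMORY;   /* caller maps this to an OOME */

    g_live_threads++;
    /* ... allocate the thread structure, start the native thread ... */
    return TM_OK;
}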

Because this is picking up the rug and sweeping all the dirt underneath it. The core problem isn't that we try to create too many threads, but that the code wasn't written defensively. Putting an artificial limit on the number of threads just means that we'll hit the same problem somewhere else, with some other resource.

I think we should fix it.

There seem to be some basic things we can do, like reducing the stack size on Windows from the terabyte or whatever it is now to the number that our dear, esteemed colleague from IBM claims is perfectly suitable for production use.

That doesn't solve the whole problem either, but it certainly fixes a problem we are now aware of - our stack size is too big.... :)
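
To be clear about what "fixing the stack size" means, it boils down to
passing an explicit reservation when we create native threads on Windows
instead of inheriting the default from the executable. (A rough sketch;
the 256K figure is just a placeholder, not the number our IBM colleague
quoted, and the real code goes through our porting layer:)

/* Sketch: create a native thread with an explicit, modest stack
 * reservation rather than the default from the .exe header. */
#include <windows.h>

static DWORD WINAPI worker(LPVOID arg)
{
    (void)arg;
    /* ... run the Java thread ... */
    return 0;
}

HANDLE create_vm_thread(void)
{
    const SIZE_T stack_reserve = 256 * 1024;   /* placeholder size */
    DWORD tid;

    /* STACK_SIZE_PARAM_IS_A_RESERVATION makes dwStackSize the reserve
     * size rather than the initial commit size. */
    return CreateThread(NULL,
                        stack_reserve,
                        worker,
                        NULL,
                        STACK_SIZE_PARAM_IS_A_RESERVATION,
                        &tid);
}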


3)
The above does not deal with the general architectural question of
handling C malloc() failures, which is far harder to solve. Note that
solving the big question will also require far more extensive regression
tests than MegaSpawn. However, the simple limit does fix DRLVM so that it
does not crash and burn under thread overload, and that, in turn, buys us
time to fix the real underlying problem(s) with C malloc().

From this perspective, I don't mind as much, as long as we fix the stack sizes. But a part of me wants to say no, because there's nothing compelling us to actually fix the problem. :)

Maybe we set the limit to something crippling, like 10, which will then motivate anyone who wants to do something useful with the VM to fix the real problem.

I'm just always nervous about things like this...

geir




On 1/10/07, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:


On Jan 10, 2007, at 8:51 AM, Gregory Shimansky wrote:

> Geir Magnusson Jr. wrote:
>> The big thing for me is ensuring that we can drive the VM to the
>> limit, and it maintains internal integrity, so applications that
>> are designed to gracefully deal with resource exhaustion can do so
>> w/ confidence that the VM isn't about to crumble out from
>> underneath them.
>
> I agree with Geir that we should try to handle the out-of-C-heap
> condition gracefully. The problem is that there is no clearly
> defined contract for many memory-allocating functions about what
> to do in an out-of-memory condition.
>
> To maintain integrity, all VM functions which allocate memory from the
> C heap should return gracefully all the way up the stack until they
> hit the Java code that called them, and then an OOME should be
> seen by that Java code. It is not an easy task, because all code
> paths have to support it, including the JIT and GC.
>

Agreed.  But certainly worth striving for :)
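
Roughly, the contract Gregory describes looks like this in code - status
codes all the way up, and only the boundary with Java turns the failure
into an exception. (A sketch; vm_create_monitor, native_entry_point and
the VM_* codes are made up, not our real API:)

/* Sketch of propagating an allocation failure up to the Java boundary. */
#include <stdlib.h>

enum { VM_OK = 0, VM_ERR_OOM = 1 };

/* Deep inside the VM: never abort, just report failure upward. */
static int vm_create_monitor(void **out_monitor)
{
    void *m = malloc(64);                /* some internal structure */
    if (m == NULL)
        return VM_ERR_OOM;
    *out_monitor = m;
    return VM_OK;
}

/* At the boundary where Java called into native code: translate the
 * status code into a pending OutOfMemoryError and return normally. */
void native_entry_point(/* JNIEnv *env, ... */)
{
    void *monitor;
    if (vm_create_monitor(&monitor) != VM_OK) {
        /* e.g. (*env)->ThrowNew(env, oome_class, "native heap exhausted"); */
        return;                          /* unwind gracefully, no crash */
    }
    /* ... continue using the monitor ... */
}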

geir





--
Weldon Washburn
Intel Enterprise Solutions Software Division
