On Jan 10, 2007, at 2:13 PM, Weldon Washburn wrote:

Some observations and a simple patch that might just work well enough for
the current state of DRLVM development.

1)
In some earlier posting, it was mentioned that somehow the virtual memory
address space is impacted by how much physical memory is in a given
computer. Actually, this is not true. The virtual address space available
to the JVM is fixed by the OS; a machine with less physical memory will
simply do more disk I/O. In other words, the hard limits on "C" malloc()
are set by the OS, not by the RAM chips.
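
A throwaway probe makes the point: on a 32-bit box, malloc() keeps
succeeding up to the address-space limit the OS grants the process, no
matter how much RAM is installed. (A rough sketch, nothing to do with
DRLVM's code - the 1 MB chunk size is arbitrary:)

/* probe.c - keep allocating until malloc() fails and report how much
 * address space the process was actually granted. On 32-bit Windows or
 * Linux the ceiling comes from the OS address-space layout, not from
 * the amount of physical RAM. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const size_t chunk = 1024 * 1024;   /* 1 MB per allocation */
    size_t total_mb = 0;
    void *p;

    while ((p = malloc(chunk)) != NULL) {
        total_mb++;
        /* the pages are never touched; address space alone is consumed */
    }
    printf("malloc() failed after ~%lu MB\n", (unsigned long)total_mb);
    return 0;
}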


Talking about virtual memory vs. RAM vs. whatever is a red herring - we may be ported to a machine without virtual memory. What matters is that when malloc() returns NULL, we do something smart. At the very least, do nothing harmful.
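
Something like this is all I mean - the allocation sites check the result
and report failure instead of dereferencing blindly. (A sketch; the names
vm_thread_t, thread_alloc and the TM_* codes are made up, not our real
thread-manager API:)

/* Sketch of a defensive allocation path: a NULL from malloc() turns
 * into an error code the caller can propagate, never into a crash. */
#include <stdlib.h>
#include <string.h>

typedef struct vm_thread {
    void  *stack_base;
    size_t stack_size;
} vm_thread_t;

enum { TM_OK = 0, TM_ERROR_OUT_OF_MEMORY = 1 };

int thread_alloc(vm_thread_t **out, size_t stack_size)
{
    vm_thread_t *t = malloc(sizeof(*t));
    if (t == NULL)
        return TM_ERROR_OUT_OF_MEMORY;    /* report, don't crash */

    memset(t, 0, sizeof(*t));
    t->stack_size = stack_size;

    t->stack_base = malloc(stack_size);
    if (t->stack_base == NULL) {
        free(t);                          /* undo the partial work */
        return TM_ERROR_OUT_OF_MEMORY;
    }

    *out = t;
    return TM_OK;
}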


2)
Why not simply hard-code DRLVM to throw an OOME whenever there are more than 1K threads running? I think Rana first suggested this approach. My guess is that 1K threads is enough to run lots of interesting workloads, and that common versions of WinXP and Linux will handle the C malloc() load of 1K threads successfully. If not, how about trying 512 threads?
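
Concretely, the cap described above would be only a few lines in the
thread-creation path, with the error code eventually surfaced to Java as
an OutOfMemoryError. (A sketch with made-up names; in real code the
counter would sit under the thread manager's lock, omitted here for
brevity:)

/* Sketch of a hard thread cap. */
#define MAX_VM_THREADS 1024

enum { TM_OK = 0, TM_ERROR_OUT_OF_MEMORY = 1 };

static long g_live_threads = 0;

int thread_create_checked(void)
{
    if (g_live_threads >= MAX_VM_THREADS)
        return TM_ERROR_OUT_OF_MEMORY;   /* caller maps this to an OOME */

    g_live_threads++;
    /* ... allocate the thread structure, start the native thread ... */
    return TM_OK;
}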

Because this is picking up the rug and sweeping all the dirt underneath it. The core problem isn't that we try to create too many threads, but that the code wasn't written defensively. Putting an artificial limit on the number of threads just means that we'll hit the same problem somewhere else, with some other resource.

I think we should fix it.

There seem to be some basic things we can do, like reducing the stack size on Windows from the terabyte or whatever it is now to the number that our dear, esteemed colleague from IBM claims is perfectly suitable for production use.

That doesn't solve the whole problem either, but it certainly fixes a problem we are now aware of - our stack size is too big.... :)
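
To be clear about what "fixing the stack size" means, it boils down to
passing an explicit reservation when we create native threads on Windows
instead of inheriting the default from the executable. (A rough sketch;
the 256K figure is just a placeholder, not the number our IBM colleague
quoted, and the real code goes through our porting layer:)

/* Sketch: create a native thread with an explicit, modest stack
 * reservation rather than the default from the .exe header. */
#include <windows.h>

static DWORD WINAPI worker(LPVOID arg)
{
    (void)arg;
    /* ... run the Java thread ... */
    return 0;
}

HANDLE create_vm_thread(void)
{
    const SIZE_T stack_reserve = 256 * 1024;   /* placeholder size */
    DWORD tid;

    /* STACK_SIZE_PARAM_IS_A_RESERVATION makes dwStackSize the reserve
     * size rather than the initial commit size. */
    return CreateThread(NULL,
                        stack_reserve,
                        worker,
                        NULL,
                        STACK_SIZE_PARAM_IS_A_RESERVATION,
                        &tid);
}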


3)
The above does not deal with the general architectural question of
handling C malloc() failures, which is far harder to solve. Note that
solving the big question will also require far more extensive regression
tests than MegaSpawn. However, the simple limit does fix DRLVM so that it
does not crash and burn under thread overload, and that, in turn, buys us
time to fix the real underlying problem(s) with C malloc().

From this perspective, I don't mind as much, as long as we fix the stack sizes. But a part of me wants to say no, because there's nothing compelling us to actually fix the problem. :)

Maybe we set the limit to something crippling, like 10, which will then motivate anyone who wants to do something useful with the VM to fix the real problem.

I'm just always nervous about things like this...

geir




On 1/10/07, Geir Magnusson Jr. <[EMAIL PROTECTED]> wrote:


On Jan 10, 2007, at 8:51 AM, Gregory Shimansky wrote:

> Geir Magnusson Jr. wrote:
>> The big thing for me is ensuring that we can drive the VM to the
>> limit, and it maintains internal integrity, so applications that
>> are designed to gracefully deal with resource exhaustion can do so
>> w/ confidence that the VM isn't about to crumble out from
>> underneath them.
>
> I agree with Geir that we should try to handle the out-of-C-heap
> condition gracefully. The problem is that there is no clearly
> defined contract for many memory-allocating functions about what
> to do in an out-of-memory condition.
>
> To maintain integrity, all VM functions which allocate memory from the
> C heap should return gracefully all the way up the stack until they
> hit the Java code that called them, and then an OOME should be
> seen by that Java code. It is not an easy task, because all code
> paths have to support it, including the JIT and GC.
>

Agreed.  But certainly worth striving for :)
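
Roughly, the contract Gregory describes looks like this in code - status
codes all the way up, and only the boundary with Java turns the failure
into an exception. (A sketch; vm_create_monitor, native_entry_point and
the VM_* codes are made up, not our real API:)

/* Sketch of propagating an allocation failure up to the Java boundary. */
#include <stdlib.h>

enum { VM_OK = 0, VM_ERR_OOM = 1 };

/* Deep inside the VM: never abort, just report failure upward. */
static int vm_create_monitor(void **out_monitor)
{
    void *m = malloc(64);                /* some internal structure */
    if (m == NULL)
        return VM_ERR_OOM;
    *out_monitor = m;
    return VM_OK;
}

/* At the boundary where Java called into native code: translate the
 * status code into a pending OutOfMemoryError and return normally. */
void native_entry_point(/* JNIEnv *env, ... */)
{
    void *monitor;
    if (vm_create_monitor(&monitor) != VM_OK) {
        /* e.g. (*env)->ThrowNew(env, oome_class, "native heap exhausted"); */
        return;                          /* unwind gracefully, no crash */
    }
    /* ... continue using the monitor ... */
}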

geir





--
Weldon Washburn
Intel Enterprise Solutions Software Division
