Steve White <[EMAIL PROTECTED]> wrote on 05/08/2008 02:34:48 PM:
> Jan,
>
> On 8.05.08, Jan Ploski wrote:
> > [EMAIL PROTECTED] wrote on 05/08/2008 12:36:23 PM:
> >
> > > > > 2) generic control of RAM-per-process
> > > >
> > > > We would like to move to JSDL where I would think this would be
> > > > covered, but after scanning, it looks like it isn't. JSDL POSIX has
> > > > MemoryLimit, but that is for the job and not for each process in the
> > > > job. So I don't think even JSDL provides this.
> >
> > > > 8.1.14.1 Definition
> > > > This element is a positive integer that describes the maximum
> > > > amount of physical memory that the job should use when executing.
> > > > The amount is given in bytes. If this is not present then the
> > > > consuming system MAY choose its default value.
> > > >
> > >
> > > This would suffice if implemented properly.
> > >
> > > The memory per process would be
> > > mem_per_process = MemoryLimit / count
> > >
> > > The number of cores to assign per node on a cluster with multi-core
> > > nodes could be calculated as
> > >
> > > available_RAM_per_node / mem_per_process
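The arithmetic sketched above can be made concrete; all figures below are hypothetical examples chosen only to illustrate the calculation, not values from any real cluster or JSDL document:

```python
# Hypothetical example figures, chosen only to illustrate the arithmetic.
memory_limit = 64 * 2**30    # JSDL MemoryLimit for the whole job, in bytes
count = 32                   # number of MPI processes in the job
ram_per_node = 16 * 2**30    # physical RAM available on each cluster node

# Memory each process may use, under the "divide the job limit" reading:
mem_per_process = memory_limit // count

# Cores to assign per node so the processes fit into the node's RAM:
cores_per_node = ram_per_node // mem_per_process

print(mem_per_process // 2**30, "GiB per process,", cores_per_node, "cores per node")
```

With these numbers each process gets 2 GiB, so at most 8 of a node's cores can be used without oversubscribing its 16 GiB of RAM.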
> >
> > Steve,
> >
> > I'm not sure whether it would be a correct implementation or another
> > quick hack. JSDL says nothing about the relationship of a
> > POSIXApplication to a group of processes launched by MPI. As a matter
> > of fact, it is remarkably silent about the relationships between jobs
> > and processes and says nothing about relationships among processes.
> > Maybe no one familiar with MPI participated in writing JSDL or maybe -
> > more likely - the tough issue was put off "until later".
> >
> These are just suggestions. As to correctness, the question (as you
> point out) is "according to what"? My intent here is to help the Globus
> developers find a solution by explaining the need.
>
> The need is pretty clear, although some of the details are fuzzy.
>
> These days, a cluster user can effectively quadruple their memory
> per process, say by requesting fewer processes per node. For certain
> applications, this can be crucial.
>
> > Anyway, one can reason about the execution of an MPI application as a
> > scenario involving the execution of n instances of a POSIXApplication.
> > This interpretation would fit quite well the actual MPI runner
> > implementations whose job is always to launch n processes of the
> > user-specified executable on m <= n machines, using whatever
> > system-specific means are available. Therefore, I would suggest that
> > if JSDL is used, the MemoryLimit in the POSIXApplication element is
> > not some aggregate "physical memory that the job should use when
> > executing" to be divided among processes using a rule of thumb.
> > Instead, treat it as a specification which applies to each single
> > process of a multi-process job; it *is*, after all, a description of
> > an executable POSIX process. For maximum flexibility, one should
> > probably be able to specify a different POSIXApplication element for
> > each MPI process.
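A sketch of what a consumer following that per-process interpretation would compute. The JSDL fragment is stripped down for illustration (namespaces and most required elements are omitted, and the values are invented):

```python
import xml.etree.ElementTree as ET

# Stripped-down, illustrative JSDL fragment: namespaces and most required
# elements are omitted, and the numbers are invented for this example.
jsdl = """
<JobDefinition>
  <JobDescription>
    <Application>
      <POSIXApplication>
        <Executable>/usr/local/bin/simulate</Executable>
        <MemoryLimit>2147483648</MemoryLimit>
      </POSIXApplication>
    </Application>
  </JobDescription>
</JobDefinition>
"""

root = ET.fromstring(jsdl)
limit = int(root.find(".//POSIXApplication/MemoryLimit").text)

# Under the per-process interpretation suggested above, launching n MPI
# processes requires n * limit bytes in aggregate - the limit is NOT
# divided among the processes.
n_processes = 16
aggregate_bytes = n_processes * limit
```

Here each of the 16 processes is entitled to 2 GiB, so the scheduler must find 32 GiB in total, rather than handing each process 1/16th of a job-wide 2 GiB budget.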
> >
> > Apart from these considerations, I am not sure if your "RAM per
> > process" requirement is covered by the intent of MemoryLimit.
> > MemoryLimit basically translates to "ulimit -m" in bash (JSDL authors
> > also forgot to mention whether the hard or soft limit was meant). Is
> > this what you are looking for? Or do you want to guarantee that a
> > certain amount of memory can be allocated by a process without
> > incurring paging activity during its entire execution? Perhaps both?
> >
> The user wants to set a minimum amount of RAM available to each process.
> That might be specified in more than one way, in principle.
Ok, that would be the second of my options above.
> The issues on conventional clusters are different from those on
> shared-memory machines, but the request is the same.
>
> There are at least two uses of these parameters. They are unfortunately
> not clearly stated.
>
> One is a contract with a resource allocator/scheduler. For instance, a
> maximum memory requirement, like a maximum walltime requirement, can be
> used by the allocator to efficiently and effectively allocate resources
> for and schedule the job.
That would be a promise on behalf of a job not to exceed a certain limit.
That's what I meant by my first option, and that's what ulimit in bash
(more generally, the setrlimit POSIX system call) and MemoryLimit in JSDL
were intended for.
> Another is what you call a "hard limit": my user wants a certain amount
> of RAM per process, no less.
No, my distinction of 'soft limit' and 'hard limit' is also related to the
POSIX case above. A soft limit may be raised by a process itself, up to
the hard limit, by explicitly calling setrlimit. A hard limit can only be
raised by the administrator.
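These soft/hard semantics are easy to demonstrate with the POSIX getrlimit/setrlimit calls. The sketch below uses Python's resource module and the core-file size limit (RLIMIT_CORE), only because that one is harmless to change from a script; exactly the same rules govern memory limits such as RLIMIT_AS:

```python
import resource

# Query the current core-file size limits: a (soft, hard) pair.
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)

# An unprivileged process may lower its own soft limit at any time...
resource.setrlimit(resource.RLIMIT_CORE, (0, hard))

# ...and raise it again, but never above the hard limit. Raising the
# hard limit itself requires administrator (root) privileges; attempting
# it as an ordinary user raises an error.
resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
```

After the last call, the soft limit equals the hard limit, which is as high as the process can push it on its own.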
> This is like the requirement of number of
> processes. In our case, the job should fail immediately (with an
> informative message) if that requirement can't be satisfied.
That would indeed be a requirement, corresponding to your desired
"minimum amount of RAM available to each process".
> The Globus WS-GRAM documentation is completely silent on the intended
> purpose of such parameters as minMemory. Better documentation alone
> would help a great deal.
Indeed, the description of "purpose" is lacking. minMemory is a good
example:
"Explicitly set the minimum amount of memory for a single execution of the
executable. The units is in Megabytes. The value will go through an atoi()
conversion in order to get an integer. If the GRAM scheduler cannot set
minMemory, then an error will be returned."
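The atoi() detail matters more than it seems: atoi silently stops at the first non-digit, so a value like "512MB" becomes 512 with no error at all. A rough Python model of that behaviour (atoi is of course a C function; the helper below is purely illustrative):

```python
import re

def atoi_like(s: str) -> int:
    """Roughly mimic C atoi(): skip leading whitespace, read an optional
    sign and digits, ignore everything after - and return 0 if there are
    no digits at all."""
    m = re.match(r"\s*([+-]?\d+)", s)
    return int(m.group(1)) if m else 0

print(atoi_like("512"))    # 512
print(atoi_like("512MB"))  # 512 - the "MB" suffix vanishes silently
print(atoi_like("lots"))   # 0   - no digits, no error either
```

So a user who helpfully writes units into the parameter gets a wildly different limit than intended, with no diagnostic - a good example of why the documentation should state the expected format instead of the implementation detail.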
To explain the "purpose", one has to rely on some "causal model", where
user actions (of setting that particular parameter) affect the
user-observable states of some user-perceivable entities... so that the
user can effectively simulate the effect of an action in his mind. To
enable this kind of reasoning, up-front crisp definitions of new concepts,
their relationships, and connections to older concepts already familiar to
the user (such as POSIX processes?) would be necessary. There is a
rudimentary attempt at this in the WS-GRAM documentation ("Key Concepts"),
but it fails. The document is littered with important-sounding
abstractions from developer jargon without ever striking connections to
the user's prior knowledge. Fundamental concepts such as "job" are invoked
without ever being explained. The silent assumption is that the users have
experience with the same batch processing software as the author and will
intuitively associate the same meaning with that concept. But as we see,
just avoiding talking about implementation details doesn't mean that these
implementation details become irrelevant to users.
Apart from that: the atoi() remark in the description is clearly
out-of-scope. And as for the reference to a GRAM scheduler not being able
to "set minMemory" - why and when would that condition occur, and what
could a user do about it? What is guaranteed by Globus in this context?
> Maybe I'll make another bug report about documentation.
You'd probably need to describe a desired correction in a bug report. I
think this is non-trivial, as the issue is not spelling or unfortunate
grammar. Here, the intended and actual meanings of concepts are concerned,
and to clarify them you'd have to recursively clarify many other concepts.
I could imagine a whole project around that... After all, this is more or
less what standardization/specification efforts are about.
> The amount of memory used by a scientific process is typically quite
> well known by its user. It is not at all magical, and is something they
> regularly calculate. There may be cases where some experimentation is
> required, but then they still know the value.
Yes, what they want is a guarantee from the system "that the given amount
of memory can be allocated by a process without degrading performance".
Regards,
Jan Ploski