Forwarding here in case some GRAM users have not seen this yet.
Cheers,
Stu
Begin forwarded message:
From: Stuart Martin <smar...@mcs.anl.gov>
Date: May 12, 2009 9:08:07 AM CDT
To: GRAM developer <gram-...@globus.org>
Cc: Stuart Martin <smar...@mcs.anl.gov>
Subject: GRAM V5 Alpha (scalability) Release
GRAM2 (aka Pre-WS GRAM) users:
We are happy to make available a new alpha-quality GRAM5 release
for testing: http://dev.globus.org/wiki/GRAM/Scalability_Alpha_20090504
GRAM5 is built from the GT4 GRAM2 code base. GRAM5 removes some
features and alters some behaviors, though it remains protocol-
compatible with existing GRAM2 deployments. Specifically, file
streaming has been replaced by end-of-job file staging
(transparently to the user), and MPICH-G2 multijob coordination has
been removed from the service. Preliminary compatibility testing of
existing GRAM2 clients against the new GRAM5 service has been
successful: globusrun, COG-jglobus, and Condor-G clients all
submitted and monitored jobs. For Condor-G with GRAM5, we recommend
not using the grid monitor.
This GRAM5 alpha release improves the scalability of the GRAM system
by reducing its CPU and memory use. In addition to scalability-
related changes, this alpha includes fixes for bugs in GSSAPI, GASS
Cache, and GRAM itself that impact the performance and reliability
of GRAM.
This is the first public alpha release of GRAM5. It is not
recommended for production environments, though we do request
feedback and bug reports.
Significant Design Changes
---------------------------------
Two significant modifications account for the scalability
improvements in GRAM5:
1) All job management and processing is done with a single Job
Manager process per user (instead of one per job). This, coupled
with throttling the amount of work each user's job manager will do,
makes the system load average independent of the number of jobs in
the system. This is done without sacrificing performance.
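The effect of the single-manager-plus-throttle design can be sketched as follows. This is a hypothetical toy illustration, not GRAM5 code; the class and constant names are invented for the example:

```python
import collections

class JobManager:
    """Toy sketch: one job manager per user, with a work throttle.

    One manager handles all of a user's jobs and processes at most a
    bounded batch of work per cycle, so the work done per pass (and
    hence system load) stays flat as the job count grows.
    Hypothetical illustration only; not actual GRAM5 code.
    """

    MAX_WORK_PER_CYCLE = 10  # throttle: bound the work done per pass

    def __init__(self, user):
        self.user = user
        self.pending = collections.deque()  # all of this user's jobs

    def submit(self, job_id):
        # Every job for this user lands in the same manager process.
        self.pending.append(job_id)

    def run_cycle(self):
        # Process at most MAX_WORK_PER_CYCLE jobs, regardless of how
        # many are queued; the rest wait for a later cycle.
        batch = []
        for _ in range(min(self.MAX_WORK_PER_CYCLE, len(self.pending))):
            batch.append(self.pending.popleft())
        return batch

mgr = JobManager("alice")
for i in range(25):
    mgr.submit(i)
first_batch = mgr.run_cycle()  # only 10 of the 25 jobs are processed
```

However many jobs are submitted, each cycle does a fixed amount of work, which is why load average in this scheme does not track the number of jobs in the system.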
2) Monitoring of LRM jobs is done using the scheduler event
generator (SEG). The SEG has been used for a number of years in WS
GRAM for scalability reasons. It is more efficient than querying
each job individually through the LRM's CLI, and also more
efficient than Condor-G's grid-monitor approach.
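The difference between per-job polling and event-driven monitoring can be shown with a toy comparison. This is an invented illustration of the general technique, not the SEG implementation:

```python
# Toy comparison: per-job LRM polling vs. SEG-style event consumption.
# Hypothetical illustration only; not the actual SEG code.

def poll_all(jobs, query_lrm):
    """Polling: one LRM query per job per cycle -- O(jobs) work
    every cycle, even when nothing has changed."""
    return {job: query_lrm(job) for job in jobs}

def consume_events(event_stream, states):
    """SEG-style: read the scheduler's event stream and update only
    the jobs that actually changed state -- O(events) work."""
    for job, new_state in event_stream:
        states[job] = new_state
    return states

# With 10,000 tracked jobs and only 3 state changes, polling issues
# 10,000 LRM queries while the event approach performs 3 updates.
jobs = [f"job{i}" for i in range(10000)]
queries = []
poll_all(jobs, lambda j: queries.append(j) or "PENDING")

states = {j: "PENDING" for j in jobs}
events = [("job1", "RUNNING"), ("job2", "RUNNING"), ("job3", "DONE")]
consume_events(events, states)
```

The polling cost grows with the number of monitored jobs; the event-driven cost grows only with the number of state changes, which is why the SEG scales to large pending queues.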
A complete list of all changes is here: http://tinyurl.com/pyhfy5
Performance and Scalability testing
-----------------------------------------
In addition to functional and compatibility testing, we ran a
series of performance and scalability tests against GRAM5 with our
own Java throughput-testing client. Finally, we ran a Condor-G test
that submitted the same job load/scenario to both GRAM2 and GRAM5.
All test results are here: http://dev.globus.org/wiki/GRAM5_Scalability_Results
Observations from the 5-client test, http://tinyurl.com/qklpco:
In the 1-hour test period, 4944 jobs were submitted, with more than
4500 pending in the PBS queue at once. This demonstrates that the
number of jobs being monitored by GRAM5 does not adversely affect
its performance. It also shows that a naive client implementation
with 5 x 50 concurrent threads does not cause the high load average
on the cluster head node often seen with GRAM2. With GRAM2, these
levels of scalability were only achievable when using the Condor-G
grid monitor, but GRAM5 achieves them by itself and with less
resource use.
Observations from the 2000 job condor-g tests:
Results for GRAM2: http://dev.globus.org/wiki/GRAM5_Scalability_Results#Test_6:_gram2-condor-g
Results for GRAM5: http://dev.globus.org/wiki/GRAM5_Scalability_Results#Test_7:_gram5-condor-g
The service host's CPU load average was significantly lower for
GRAM5 (2.3, peak 3.8) than for GRAM2 (29.2, peak 35.8). The service
host's memory profile was similar and reasonable for both services.
Both services processed the 2000 jobs successfully and in roughly
the same duration (GRAM5's was shorter).
We are encouraged by these results and hope you are too. We
encourage community testing and feedback (positive or negative) on
this GRAM5 Alpha. For example, we'd like to know how GRAM5 behaves
for your use cases/scenarios: Were there any errors? How did it
perform? What client did you use? Describe your scenarios as we did
on our scalability results page. The more detail the better.
If you are a GRAM4 user, we encourage you to try GRAM5. If there is
a particular feature you depend upon in GRAM4 that is not available
in GRAM5, please let us know.
Please send your feedback to gram-...@globus.org.
Download and install instructions for the GRAM5 Alpha are here:
http://dev.globus.org/wiki/GRAM/Scalability_Alpha_20090504
- GRAM development team