Forwarding here in case there are GRAM users who have not seen this yet.

Cheers,
Stu

Begin forwarded message:

From: Stuart Martin <smar...@mcs.anl.gov>
Date: May 12, 2009 9:08:07 AM CDT
To: GRAM developer <gram-...@globus.org>
Cc: Stuart Martin <smar...@mcs.anl.gov>
Subject: GRAM V5 Alpha (scalability) Release

GRAM2 (aka Pre-WS GRAM) users:

We are happy to make a new alpha-quality version of GRAM V5 available for testing: http://dev.globus.org/wiki/GRAM/Scalability_Alpha_20090504

GRAM5 is built from the GT4 GRAM2 code base. GRAM5 removes some features and alters some behaviors, though it remains protocol-compatible with existing GRAM2 deployments. Specifically, file streaming has been replaced by end-of-job file staging (transparently to the user), and MPICH-G2 multijob coordination has been removed from the service. Preliminary compatibility testing has been successful with existing GRAM2 clients (globusrun, COG-jglobus, and Condor-G) submitting jobs to and monitoring jobs on the new GRAM5 service. When using Condor-G with GRAM5, we recommend not using the grid monitor.

This GRAM5 alpha release improves the scalability of the GRAM system by reducing CPU and memory use. In addition to scalability-related changes, this alpha includes fixes for bugs in GSSAPI, GASS Cache, and GRAM that affect the performance and reliability of GRAM. This is the first public alpha release of GRAM5. It is not recommended for production environments, though we do request feedback and bug reports.

Significant Design Changes
---------------------------------
There are two significant modifications that account for the scalability improvements in GRAM5:

1) All job management and processing is done by a single Job Manager process per user (instead of one per job). Combined with throttling the amount of work each user's job manager does, this makes the system load average independent of the number of jobs in the system. This is done without sacrificing performance.

2) Monitoring of jobs in the local resource manager (LRM) is done using the scheduler event generator (SEG). The SEG has been used in WS GRAM for a number of years for scalability reasons. It is more efficient than querying each job individually through the LRM's command-line interface, and it is also more efficient than Condor-G's grid-monitor approach.
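The first design change (one Job Manager per user, with throttling) can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not the actual GRAM5 job manager: the class names `Gatekeeper` and `PerUserJobManager` and the `max_active` per-pass throttle are hypothetical. It shows only that the number of managers tracks the number of users rather than the number of jobs, and that each manager handles a bounded amount of work per pass.

```python
from collections import deque


class PerUserJobManager:
    """Toy stand-in for one user's job manager (hypothetical, not GRAM5 code)."""

    def __init__(self, user, max_active=4):
        self.user = user
        self.max_active = max_active  # hypothetical per-pass throttle
        self.pending = deque()        # jobs waiting to be processed
        self.processed = []           # jobs already handled

    def submit(self, job_id):
        self.pending.append(job_id)

    def step(self):
        # Handle at most max_active jobs per pass, however many are queued,
        # so per-pass work stays bounded as the job count grows.
        n = min(self.max_active, len(self.pending))
        batch = [self.pending.popleft() for _ in range(n)]
        self.processed.extend(batch)
        return batch


class Gatekeeper:
    """Routes every job to its owner's single manager (one per user, not per job)."""

    def __init__(self, max_active=4):
        self.max_active = max_active
        self.managers = {}

    def submit(self, user, job_id):
        # Create the user's manager on first submission, then reuse it.
        if user not in self.managers:
            self.managers[user] = PerUserJobManager(user, self.max_active)
        self.managers[user].submit(job_id)

    def manager_count(self):
        return len(self.managers)

    def step(self):
        # One scheduling pass across all users' managers.
        return {user: mgr.step() for user, mgr in self.managers.items()}


if __name__ == "__main__":
    gk = Gatekeeper(max_active=4)
    for i in range(100):
        gk.submit("alice", f"alice-{i}")
    for i in range(3):
        gk.submit("bob", f"bob-{i}")
    print(gk.manager_count())  # 2 managers for 103 jobs
    print({u: len(b) for u, b in gk.step().items()})  # {'alice': 4, 'bob': 3}
```

In this model, submitting 103 jobs for two users creates only two managers, and a single pass handles at most four of the first user's jobs. In a one-process-per-job design the same load would mean 103 processes.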

A complete list of all changes is here: http://tinyurl.com/pyhfy5

Performance and Scalability Testing
-----------------------------------------
In addition to functional and compatibility testing, we ran a series of performance and scalability tests against GRAM5 using our own Java throughput-testing client. Finally, we ran a Condor-G test that applied the same job load and scenario to both GRAM2 and GRAM5. All test results are here: http://dev.globus.org/wiki/GRAM5_Scalability_Results

Observations from the 5-client test, http://tinyurl.com/qklpco:
In the 1-hour test period, 4944 jobs were submitted, with more than 4500 pending in the PBS queue at once. This demonstrates that the number of jobs being monitored by GRAM5 does not adversely affect its performance. It also shows that a naive client implementation with 5 x 50 concurrent threads does not cause the high load average on the cluster head node often seen with GRAM2. With GRAM2, this level of scalability was achievable only with the Condor-G grid monitor; GRAM5 achieves it on its own and with lower resource use.

Observations from the 2000-job Condor-G tests:
Results for GRAM2: http://dev.globus.org/wiki/GRAM5_Scalability_Results#Test_6:_gram2-condor-g
Results for GRAM5: http://dev.globus.org/wiki/GRAM5_Scalability_Results#Test_7:_gram5-condor-g

The service host's CPU load average was significantly lower for GRAM5, at 2.3 (peak 3.8), than for GRAM2, at 29.2 (peak 35.8). The service host's memory profile was similar and reasonable for both services. Both services processed the 2000 jobs successfully and in roughly the same time (GRAM5's was shorter).

We are encouraged by these results and we hope you are too. We encourage community testing and feedback (positive or negative) on this GRAM5 Alpha. In particular, we'd like to know how GRAM5 behaves for your use cases and scenarios: Were there any errors? How did it perform? Which client did you use? Please describe your scenarios the way we did on our scalability results page; the more detail, the better. If you are a GRAM4 user, we encourage you to try GRAM5. If there is a particular feature you depend on in GRAM4 that is not available in GRAM5, please let us know.

Please send your feedback to gram-...@globus.org.

Download and install instructions for the GRAM5 Alpha are here:
http://dev.globus.org/wiki/GRAM/Scalability_Alpha_20090504

- GRAM development team
