Hi Adam,
What does GLOBUS_SCRATCH_DIR point to? Local disk (e.g. /tmp)? Or some shared file system? We have been working on things related to your question within the Falkon project (http://dev.globus.org/wiki/Incubator/Falkon), which allows applications to dispatch jobs to compute nodes in a lightweight manner. The extensions we recently added allow compute nodes to cache input and output data on local disk and to reuse that data across jobs. For an overview of the work, see our recent paper at http://people.cs.uchicago.edu/~iraicu/publications/2008_DADC08_falkon_data-diffusion.pdf.
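
As a rough illustration of caching files on a node and reusing them across jobs, here is a minimal Java sketch of a node-local cache index that evicts least-recently-used entries once a byte budget is exceeded. The class name LruFileCache and the byte-budget parameter are hypothetical, chosen for illustration; this is not Falkon code, and it only does the bookkeeping without touching the file system.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not Falkon code): tracks cached files by name and size
// and evicts least-recently-used entries when a byte budget would be exceeded.
class LruFileCache {
    private final long capacityBytes;
    private long usedBytes = 0;
    // accessOrder=true: iteration starts at the least-recently-accessed entry.
    private final LinkedHashMap<String, Long> entries =
            new LinkedHashMap<>(16, 0.75f, true);

    LruFileCache(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Returns true on a cache hit; get() also marks the entry as most recent. */
    boolean lookup(String fileName) {
        return entries.get(fileName) != null;
    }

    /**
     * Records a newly staged file, evicting LRU entries until it fits.
     * If a single file exceeds the whole budget, everything else is evicted
     * and the file is still admitted (a real cache might reject it instead).
     */
    void add(String fileName, long sizeBytes) {
        Long previous = entries.remove(fileName);
        if (previous != null) {
            usedBytes -= previous;
        }
        Iterator<Map.Entry<String, Long>> it = entries.entrySet().iterator();
        while (usedBytes + sizeBytes > capacityBytes && it.hasNext()) {
            Map.Entry<String, Long> lru = it.next();
            usedBytes -= lru.getValue();
            it.remove(); // a real cache would also delete the file on disk here
        }
        entries.put(fileName, sizeBytes);
        usedBytes += sizeBytes;
    }

    /** containsKey() does not count as an access, so recency is unchanged. */
    boolean contains(String fileName) {
        return entries.containsKey(fileName);
    }

    long usedBytes() {
        return usedBytes;
    }
}
```

Swapping in FIFO only requires constructing the map with accessOrder=false, so that iteration follows insertion order instead of access order.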

We have implemented several cache eviction strategies (FIFO, RANDOM, LFU, LRU) and a data-aware scheduler that sends jobs to the compute nodes that have the most relevant data cached. Our implementation is in Java, which doesn't have a good mechanism for getting disk usage stats, so we implemented some helper classes that simply invoke tools such as "df" and then parse the output; see https://svn.globus.org/repos/falkon/service/org/globus/GenericPortal/common/DiskSpace2.java for more details. The code hasn't been cleaned up, but you'll get the idea. So far, this has worked well for us in all the Linux environments we have tried! There are also Falkon-specific mailing lists (http://dev.globus.org/wiki/Incubator/Falkon#Mailing_Lists) if you end up looking through the Falkon code to see how various things are implemented (cache eviction policies, space usage monitoring, the data-aware scheduler, etc.) and have questions.
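
To give a rough idea of the "invoke df and parse the output" approach, here is a minimal Java sketch. It is not the DiskSpace2 class linked above; the class name DfDiskSpace is hypothetical, and it assumes a POSIX df on the PATH, whose -P flag produces one line per file system and whose -k flag reports sizes in 1024-byte blocks. Parsing is kept in a separate method so it can be tested without running df.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Falkon's DiskSpace2): reports space on the partition
// holding a directory by running "df -P -k <dir>" and parsing its output.
class DfDiskSpace {

    /** Total, used, and available space in KiB for one file system. */
    static final class Usage {
        final long totalKb, usedKb, availableKb;

        Usage(long totalKb, long usedKb, long availableKb) {
            this.totalKb = totalKb;
            this.usedKb = usedKb;
            this.availableKb = availableKb;
        }

        double fractionUsed() {
            return totalKb == 0 ? 0.0 : (double) usedKb / totalKb;
        }
    }

    /** Parses "df -P -k" output: one header line, then one line per file system. */
    static Usage parse(List<String> lines) {
        if (lines.size() < 2) {
            throw new IllegalArgumentException("unexpected df output: " + lines);
        }
        // Columns: Filesystem 1024-blocks Used Available Capacity Mounted-on.
        // Assumes the file system name itself contains no whitespace.
        String[] f = lines.get(1).trim().split("\\s+");
        return new Usage(Long.parseLong(f[1]), Long.parseLong(f[2]), Long.parseLong(f[3]));
    }

    /** Runs df for the given directory and returns its partition's usage. */
    static Usage query(String dir) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("df", "-P", "-k", dir)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // df errors go to our stderr
                .start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("df exited with status " + exit);
        }
        return parse(lines);
    }
}
```

A caller could, for example, run DfDiskSpace.query on the scratch directory and start evicting cached files once fractionUsed() crosses some threshold; the threshold is a policy choice. Java 6 and later also provide java.io.File.getUsableSpace() and getTotalSpace(), which avoid spawning a process.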

Cheers,
Ioan

Adam Bazinet wrote:
We have recently implemented a caching scheme that ends up storing job input files in GLOBUS_SCRATCH_DIR (and these files are not subsequently cleaned up when the job finishes, hence the term "cache" =) This is all working well, but the part that has not been implemented yet is some kind of cache eviction scheme. Leaving aside for the moment the criteria for when files should be removed from the cache, my question is this: is there a way, using existing Globus mechanisms, to monitor the size of GLOBUS_SCRATCH_DIR (or, to put it another way, the amount of free space on the partition containing the scratch directory)? And if the partition is close to filling up, is there a way to delete those files? Just looking for ideas & suggestions right now, thanks!

Adam


--
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: [EMAIL PROTECTED]
Web:   http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================