Hi Adam,
What does GLOBUS_SCRATCH_DIR point to? Local disk (i.e. /tmp)? Or some
shared file system?
We have been working on things related to your question within the
Falkon project (http://dev.globus.org/wiki/Incubator/Falkon) which
allows applications to dispatch jobs to compute nodes in a light-weight
manner. The recent extensions we have added allow compute nodes to
cache input and output data to local disk, and to reuse them across
jobs. For an overview of the work, see our recent paper at
http://people.cs.uchicago.edu/~iraicu/publications/2008_DADC08_falkon_data-diffusion.pdf.
We have implemented several cache eviction strategies (FIFO, RANDOM,
LFU, LRU), and a data-aware scheduler to make sure jobs end up on the
compute nodes that have the most data cached. Our implementation is in
Java, which doesn't have a good mechanism to get disk usage stats, so we
implemented some helper classes which simply invoke tools such as "df"
and then parse the output; see
https://svn.globus.org/repos/falkon/service/org/globus/GenericPortal/common/DiskSpace2.java
for more details. The code hasn't been cleaned up, but you'll get the
idea. So far, this has worked well for us in all Linux environments
that we tried!
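
To give a rough idea of the "invoke df and parse the output" approach, here is a minimal sketch. It is not the DiskSpace2.java code; the class and method names (DfFreeSpace, parseAvailableKb, availableKb) are made up for illustration, and it assumes a df that supports the POSIX -k and -P options and a filesystem name without spaces.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical helper, not part of Falkon: reports free space by running "df".
class DfFreeSpace {

    /** Parses the text printed by "df -kP <path>" and returns the available space in KB. */
    static long parseAvailableKb(String dfOutput) {
        String[] lines = dfOutput.trim().split("\\R");
        if (lines.length < 2) {
            throw new IllegalArgumentException("unexpected df output: " + dfOutput);
        }
        // POSIX layout: Filesystem 1024-blocks Used Available Capacity Mounted-on
        // Assumption: the filesystem name contains no whitespace.
        String[] fields = lines[lines.length - 1].trim().split("\\s+");
        if (fields.length < 6) {
            throw new IllegalArgumentException("unexpected df line: " + lines[lines.length - 1]);
        }
        return Long.parseLong(fields[3]);
    }

    /** Runs "df -kP <path>" and returns the available space in KB on that partition. */
    static long availableKb(String path) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("df", "-kP", path).start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        int status = p.waitFor();
        if (status != 0) {
            throw new IOException("df exited with status " + status);
        }
        return parseAvailableKb(out.toString());
    }
}
```

Calling DfFreeSpace.availableKb on the scratch directory path would then give the free space on its partition. On Java 6 and later, java.io.File.getUsableSpace() is an alternative that avoids the external process.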
There are also Falkon-specific mailing lists
(http://dev.globus.org/wiki/Incubator/Falkon#Mailing_Lists), in case you
end up looking through the Falkon code to see how various things are
implemented (cache eviction policies, space usage monitoring, data-aware
scheduler, etc.) and have questions.
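
As a starting point for the eviction side, here is a small sketch of an LRU policy over cached files with a byte budget. It is not taken from the Falkon code; the class LruFileCache and its methods are hypothetical, and it only does the bookkeeping, so the caller would still have to delete the evicted files from the scratch directory.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical LRU bookkeeping for cached files; not part of Falkon.
class LruFileCache {
    private final long capacityBytes;
    private long usedBytes = 0;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, Long> sizes = new LinkedHashMap<>(16, 0.75f, true);

    LruFileCache(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Marks a file as recently used; returns true if it is in the cache. */
    boolean touch(String name) {
        return sizes.get(name) != null;
    }

    /** Records a file and returns the names evicted to make room for it. */
    List<String> add(String name, long sizeBytes) {
        Long old = sizes.remove(name);
        if (old != null) {
            usedBytes -= old;
        }
        List<String> evicted = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = sizes.entrySet().iterator();
        // Evict least recently used entries until the new file fits.
        // A file larger than the whole budget empties the cache and is still recorded.
        while (usedBytes + sizeBytes > capacityBytes && it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            usedBytes -= e.getValue();
            evicted.add(e.getKey());
            it.remove();
        }
        sizes.put(name, sizeBytes);
        usedBytes += sizeBytes;
        return evicted;
    }

    long usedBytes() {
        return usedBytes;
    }
}
```

The byte budget could be fixed up front or derived from a periodic free-space check on the scratch partition.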
Cheers,
Ioan
Adam Bazinet wrote:
We have recently implemented a caching scheme that ends up storing job
input files in GLOBUS_SCRATCH_DIR (and these files are not
subsequently cleaned up when the job finishes, hence the term "cache"
=) This is all working well, but the part that has not been
implemented yet is some kind of cache eviction scheme. Leaving the
criteria for when files should be removed from the cache aside for the
moment, my question is this: is there a way using existing Globus
mechanisms to monitor the size of GLOBUS_SCRATCH_DIR (or to put it
another way, the amount of free space on the partition containing the
scratch directory?) And, if the partition is close to filling up, is
there a way to delete those files? Just looking for ideas &
suggestions right now, thanks!
Adam
--
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: [EMAIL PROTECTED]
Web: http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================