If the problem is about IOPS, I will add a ticket to fix it.
I know this code in Ray makes a lot of unbuffered fprintf calls.
If you use a distributed file system, this can be a problem. For local
file systems, I believe the kernel will do some caching by itself.
Allen, Benjamin S wrote:
Hi Sébastien,
In my case it was due to the sheer volume of file I/O, i.e. the number of
nodes writing files all at the same time. This is only an issue on our
batch cluster, which isn't designed or optimized for parallel workloads.
On our larger clusters, where we have Panasas parallel scratch space, we
have no issues. I just wanted to give the warning so users would be
aware of the increased file I/O these options add.
Thanks,
Ben
On Jun 12, 2012, at 1:05 PM, Sébastien Boisvert wrote:
Benjamin: Is it fairly straining in terms of input/output operations per
second (IOPS), or because of file sizes?
If it is about IOPS, the code can be enhanced to group I/O operations.
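
The idea of grouping I/O operations can be sketched as follows. This is only
a minimal illustration, not Ray's actual code: formatRecords and writeGrouped
are hypothetical names, and the record format is made up. The records are
formatted into an in-memory buffer first, and the buffer is then handed to
the file with a single fwrite call instead of one fprintf per record.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Ray): format all records in memory
// instead of issuing one fprintf per record.
std::string formatRecords(int count) {
    std::ostringstream buffer;
    for (int i = 0; i < count; ++i) {
        buffer << "record " << i << "\n";
    }
    return buffer.str();
}

// Hypothetical helper (not part of Ray): write the whole buffer with a
// single fwrite call. Returns true when every byte was written and the
// file was closed without error.
bool writeGrouped(const char* path, const std::string& data) {
    std::FILE* file = std::fopen(path, "w");
    if (file == nullptr) {
        return false;
    }
    std::size_t written = std::fwrite(data.data(), 1, data.size(), file);
    bool closed = (std::fclose(file) == 0);
    return written == data.size() && closed;
}
```

A caller would do something like writeGrouped("checkpoint.txt",
formatRecords(1000)). Whether this helps on a given cluster depends on how
the file system handles many small writes versus fewer large ones.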
Allen, Benjamin S wrote:
Louis,
That is indeed what these options do.
Ray seems to checkpoint at the completion of each step, not at a
regular interval. So if you have a step in Ray that's taking longer
than your max walltime, you'll still have an issue. Also, just an
FYI: depending on how big your jobs are (number of cores) and how
well your scratch file space performs, checkpointing can be fairly
straining on the cluster.
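
To illustrate why per-step checkpointing does not help when a single step
exceeds the walltime, here is a toy model. It is an assumption-laden sketch,
not Ray's implementation: submissionsNeeded is a hypothetical function, step
durations are made up, and a job killed in the middle of a step is assumed
to lose all of that step's work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (not Ray's code): returns how many job submissions are needed
// to finish all steps when progress is saved only at step boundaries,
// or -1 if some step alone does not fit in the walltime.
int submissionsNeeded(const std::vector<double>& stepHours,
                      double walltimeHours) {
    int submissions = 0;
    std::size_t next = 0;  // index of the first step not yet checkpointed
    while (next < stepHours.size()) {
        ++submissions;
        double used = 0.0;
        std::size_t done = next;
        // Complete whole steps while they still fit in this job's walltime.
        while (done < stepHours.size() &&
               used + stepHours[done] <= walltimeHours) {
            used += stepHours[done];
            ++done;
        }
        if (done == next) {
            return -1;  // no step completed, so resuming cannot make progress
        }
        next = done;
    }
    return submissions;
}
```

With steps of 2, 3 and 4 hours and a 5-hour walltime, the model finishes in
two submissions. With a 6-hour step and the same walltime, it reports that
resuming never gets past that step.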
Ben
On Jun 12, 2012, at 8:49 AM, Louis Letourneau wrote:
I saw these options in Ray:
Checkpointing
-write-checkpoints
Write checkpoint files
-read-checkpoints
Read checkpoint files
-read-write-checkpoints
Read and write checkpoint files
I'm hitting walltimes on the cluster I'm using and I'm wondering if by
setting:
-read-write-checkpoints
I can resume where Ray got killed because of walltime?
If that's the purpose, what a great feature! :-)
Louis
------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond.
Discussions
will include endpoint security, mobile security and the latest in
malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
<mailto:Denovoassembler-users@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users