Gotcha. Any optimizations would be great, though I don't think it's a high priority.
I don't know at this point how much optimizing for better IOPS would help, as I haven't spent much time looking at the FS while Ray is reading/writing checkpoints.

Ben

On Jun 12, 2012, at 1:27 PM, Sébastien Boisvert wrote:

> If the problem is about IOPS, we will add a ticket to fix it.
>
> I know this code does a lot of fprintf, unbuffered in Ray.
>
> If you use a distributed file system this can be a problem. For local file
> systems I believe the kernel will do some caching by itself.
>
> Allen, Benjamin S wrote:
>> Hi Sébastien,
>>
>> In my case it was due to the sheer file I/O, i.e. number of nodes writing
>> files all at the same time. This is only an issue on our batch cluster that
>> isn't designed or optimized for parallel workloads. On our larger clusters
>> where we have Panasas parallel scratch space, we have no issues. I just
>> wanted to give the warning so users would be aware of the increased file I/O
>> these options add.
>>
>> Thanks,
>>
>> Ben
>>
>> On Jun 12, 2012, at 1:05 PM, Sébastien Boisvert wrote:
>>
>>> Benjamin: Is it fairly straining in terms of input/output operations per
>>> second or because of file sizes?
>>>
>>> If it is about IOPS, the code can be enhanced to group I/O operations.
>>>
>>> Allen, Benjamin S wrote:
>>>> Louis,
>>>>
>>>> That is indeed what these options do.
>>>>
>>>> Ray seems to checkpoint the completion of each step, not at a regular
>>>> interval. So if you have a step in Ray that's taking longer than your max
>>>> walltime, you'll still have an issue. Also just an FYI, depending on how
>>>> big of jobs you're running (number of cores), and what your scratch file
>>>> space performance is, checkpointing can be fairly straining on the cluster.
>>>>
>>>> Ben
>>>>
>>>> On Jun 12, 2012, at 8:49 AM, Louis Letourneau wrote:
>>>>
>>>>> I saw these options on Ray
>>>>> Checkpointing
>>>>>   -write-checkpoints
>>>>>       Write checkpoint files
>>>>>   -read-checkpoints
>>>>>       Read checkpoint files
>>>>>   -read-write-checkpoints
>>>>>       Read and write checkpoint files
>>>>>
>>>>> I'm hitting walltimes on the cluster I'm using and I'm wondering if by
>>>>> setting:
>>>>>   -read-write-checkpoints
>>>>>
>>>>> I can resume where Ray got killed because of walltime?
>>>>>
>>>>> If that's the purpose, what a great feature! :-)
>>>>>
>>>>> Louis
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> Live Security Virtual Conference
>>>>> Exclusive live event will cover all the ways today's security and
>>>>> threat landscape has changed and how IT managers can respond. Discussions
>>>>> will include endpoint security, mobile security and the latest in malware
>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>>> _______________________________________________
>>>>> Denovoassembler-users mailing list
>>>>> Denovoassembler-users@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
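
Sébastien's suggestion in the thread to group I/O operations could look roughly like the sketch below. This is not Ray's actual checkpoint code: the Record type, the tab-separated layout, and the function name writeCheckpointGrouped are all hypothetical, chosen only to illustrate the idea. Instead of issuing one small formatted write per record, the records are formatted into an in-memory buffer and handed to the file in a single fwrite call, which should reduce the number of operations that reach a shared or distributed file system.

```cpp
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type; Ray's real checkpoint layout is not shown in this thread.
struct Record {
    unsigned long long key;
    int value;
};

// Format every record in memory first, then pass the whole buffer to the
// file with one fwrite call instead of one small write per record.
// Returns the number of bytes written, or 0 on failure (or for empty input).
std::size_t writeCheckpointGrouped(const std::string& path,
                                   const std::vector<Record>& records) {
    std::ostringstream buffer;
    for (const Record& r : records) {
        buffer << r.key << '\t' << r.value << '\n';
    }
    const std::string data = buffer.str();

    std::FILE* fp = std::fopen(path.c_str(), "w");
    if (fp == nullptr) {
        return 0;
    }
    const std::size_t written = std::fwrite(data.data(), 1, data.size(), fp);
    const bool closed = (std::fclose(fp) == 0);
    return (closed && written == data.size()) ? written : 0;
}
```

A lighter-weight alternative, if the existing fprintf calls are kept, would be to give the stream a large buffer with the standard setvbuf call right after fopen. Whether either change helps on a given cluster would still need to be measured, as Ben notes above.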