What is the last output of Ray in the logs? If you are using Open-MPI, you can add the option -output-filename Output321 to get a separate standard output file for each processing core.
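To see where each process stopped, something like the following may help. It is only a sketch: it assumes each rank's output lands in a regular file whose name starts with the prefix given to -output-filename (the exact suffix depends on the Open-MPI version), and the function name last_output_per_rank is made up for illustration.

```shell
#!/bin/sh
# Print the last line of every per-rank output file matching a prefix.
# Assumption: the files start with the prefix passed to -output-filename
# (for example Output321); the suffix varies between Open-MPI versions.
last_output_per_rank() {
    prefix=$1
    for f in "$prefix"*; do
        # Skip the unexpanded pattern and anything that is not a regular file.
        [ -f "$f" ] || continue
        printf '%s: %s\n' "$f" "$(tail -n 1 "$f")"
    done
}

# Example use, after a job launched with -output-filename Output321:
#   last_output_per_rank Output321
```

Ranks whose last line differs from the others are a reasonable place to start looking.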
Louis Letourneau wrote:
> OK, so I hit the walltime again and restarted my job after changing the
> output directory.
>
> It rewrote a few files, and after the contig.fasta file it stopped
> writing anything.
> Now no file has been modified since 2 days ago and the processes are
> still running.
>
> I logged in to an exec node and I see my Ray processes. The CPU
> is running at 100% for each process, but the CPU time is split half and
> half between user and system. This doesn't seem correct; system
> shouldn't be so high.
>
> If I strace a process I see that they are all looping on poll:
> poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}, {fd=22,
> events=POLLIN}, {fd=23, events=POLLIN}], 7, 0) = 0 (Timeout)
>
> My Ray command:
> Ray -read-write-checkpoints -route-messages -connection-type debruijn
> -routing-graph-degree 32 -k 23 -p ... -o combined_23_2
>
> This time I ran Ray with 50 nodes, 10 processes per node (I lowered the
> number of nodes since last time).
> Before I hit the walltime, Ray was scaffolding and everything seemed to
> go well.
>
> I've used Ray quite a few times on smaller sets; I don't understand why
> I'm having so much trouble this time around :-)
>
> Louis
>
> On 12-06-20 03:35 PM, Sébastien Boisvert wrote:
>> When restarting Ray from checkpoints, you have to provide a different
>> output directory.
>>
>> First job:
>>
>> mpiexec -n 4 Ray -o Sample_X.Ray -p file1.fastq file2.fastq \
>> -read-write-checkpoints Sample_X.Checkpoints
>>
>> Second job:
>>
>> mpiexec -n 4 Ray -o Sample_X.Ray2 -p file1.fastq file2.fastq \
>> -read-write-checkpoints Sample_X.Checkpoints
>>
>> If you are using v2.0.0-rc8, you don't have to provide
>> the checkpoint directory because this option was added recently.
>>
>> Did you observe an improvement in latency with message routing
>> compared with running without it for your jobs?
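The restart procedure quoted above (a new -o directory for every job, the same checkpoint directory each time) could be scripted roughly as follows. This is a sketch under assumptions: the helper next_output_dir and its BASE, BASE.2, BASE.3 naming scheme are invented for illustration; the only requirement taken from this thread is that the -o directory must not already exist.

```shell
#!/bin/sh
# Return the first output directory name that does not exist yet:
# BASE, then BASE.2, BASE.3, ...
# The naming scheme is hypothetical; it only avoids reusing a directory,
# which Ray rejects according to the error quoted in this thread.
next_output_dir() {
    base=$1
    candidate=$base
    n=2
    while [ -e "$candidate" ]; do
        candidate="$base.$n"
        n=$((n + 1))
    done
    printf '%s\n' "$candidate"
}

# Example use in a job script (commands taken from the message above):
#   out=$(next_output_dir Sample_X.Ray)
#   mpiexec -n 4 Ray -o "$out" -p file1.fastq file2.fastq \
#       -read-write-checkpoints Sample_X.Checkpoints
```

With this, the same .sh script can be resubmitted after a walltime kill without editing the -o value by hand.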
>>
>> Sébastien
>>
>> Louis Letourneau wrote:
>>> I guess I don't get it :-)
>>>
>>> I had set the option:
>>> -read-write-checkpoints
>>>
>>> The job died, so I restarted it with the exact same parameters (simple
>>> since it's in a .sh script).
>>>
>>> It crashed and I got this in the logs:
>>> Error, combined_23/ already exists, change the -o parameter to another
>>> value.
>>>
>>> What setting do I give Ray to resume the assembly from checkpoints?
>>>
>>> Louis
>>>
>>> On 12-06-12 02:47 PM, Sébastien Boisvert wrote:
>>>> Yes, it does that.
>>>>
>>>> There will be binary files with the ".ray" extension in the
>>>> directory where you launched Ray.
>>>>
>>>> You cannot change the k-mer length when starting from old checkpoints.
>>>>
>>>> The command needs to have the same number of arguments in the same order.
>>>>
>>>> On what kind of dataset are you exceeding time limits?
>>>>
>>>> Louis Letourneau wrote:
>>>>> I saw these options in Ray:
>>>>> Checkpointing
>>>>>   -write-checkpoints
>>>>>       Write checkpoint files
>>>>>   -read-checkpoints
>>>>>       Read checkpoint files
>>>>>   -read-write-checkpoints
>>>>>       Read and write checkpoint files
>>>>>
>>>>> I'm hitting walltimes on the cluster I'm using and I'm wondering if by
>>>>> setting:
>>>>> -read-write-checkpoints
>>>>>
>>>>> I can resume where Ray got killed because of walltime?
>>>>>
>>>>> If that's the purpose, what a great feature! :-)
>>>>>
>>>>> Louis
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> Live Security Virtual Conference
>>>>> Exclusive live event will cover all the ways today's security and
>>>>> threat landscape has changed and how IT managers can respond. Discussions
>>>>> will include endpoint security, mobile security and the latest in malware
>>>>> threats.
>>>>> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>>> _______________________________________________
>>>>> Denovoassembler-users mailing list
>>>>> Denovoassembler-users@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users