On 02/24/2013 06:01 AM, James Vincent wrote:
> Sorry, yes, I started another job to test again, just to be sure. This
> is the large job (many reads). It is still running and still has lines
> like this at the tail of stdout/stderr. That log file is now 3 GB in
> size and all CPUs are at 100%.
>
> Rank 30: assembler memory usage: 19553904 KiB
> Rank 28 reached 373957000 vertices from seed 12983, flow 1
That's a lot of vertices. Can you add the option

-disable-recycling

This is the second time (in 2 years) that someone has seen this behavior
with read recycling. I will look into it; I filed a ticket here:
https://github.com/sebhtml/ray/issues/161
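
For example, reusing the command line from earlier in the thread, the full
invocation would become something like this (keep -minimum-contig-length
only if you still want it):

mpiexec --mca btl ^sm -n 40 $RAY -o $OUTDIR -p $LEFT $RIGHT -k 31 \
    -minimum-contig-length $MINCONTIG -disable-recycling >&out.$OUTDIR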
> Speed RAY_SLAVE_MODE_EXTENSION 10089 units/second
> Rank 28: assembler memory usage: 18725528 KiB
> Rank 9 reached 162358000 vertices from seed 20147, flow 1
> Speed RAY_SLAVE_MODE_EXTENSION 12138 units/second
> Rank 9: assembler memory usage: 16390292 KiB
> Rank 0 reached 228409000 vertices from seed 18, flow 1
> Speed RAY_SLAVE_MODE_EXTENSION 5988 units/second
> Rank 0: assembler memory usage: 24002868 KiB
> Rank 31 reached 358009000 vertices from seed 2286, flow 2
> Speed RAY_SLAVE_MODE_EXTENSION 9487 units/second
> Rank 31: assembler memory usage: 17019092 KiB
> Rank 17 reached 741182000 vertices from seed 1241, flow 2
> Speed RAY_SLAVE_MODE_EXTENSION 20065 units/second
> Rank 17: assembler memory usage: 24871160 KiB
>
>
>
> On Sat, Feb 23, 2013 at 5:36 PM, Sébastien Boisvert
> <[email protected]> wrote:
>> [Please CC the mailing list]
>>
>>
>>
>> On 02/23/2013 05:26 PM, James Vincent wrote:
>>>
>>> I redirect stdout and stderr together into one file. It's huge.
>>>
>>> The last step that was logged is this:
>>>
>>> ***
>>> Step: Estimation of outer distances for paired reads
>>> Date: Sat Feb 23 16:49:02 2013
>>> Elapsed time: 42 seconds
>>> Since beginning: 3 hours, 38 minutes, 41 seconds
>>> ***
>>>
>>> That is about halfway through the log. It's followed by many lines
>>> like this:
>>>
>>> Current peak coverage -> 178
>>> Rank 0 reached 0 vertices from seed 0, flow 1
>>> Rank 0: assembler memory usage: 2318560 KiB
>>> Rank 15 reached 1000 vertices from seed 0, flow 1
>>> Speed RAY_SLAVE_MODE_EXTENSION 5503 units/second
>>> Rank 15: assembler memory usage: 2314452 KiB
>>> Rank 33 reached 1000 vertices from seed 0, flow 1
>>> Speed RAY_SLAVE_MODE_EXTENSION 5107 units/second
>>> Rank 33: assembler memory usage: 2322828 KiB
>>> Rank 16 reached 1000 vertices from seed 0, flow 1
>>> Speed RAY_SLAVE_MODE_EXTENSION 4963 units/second
>>> Rank 16: assembler memory usage: 2318688 KiB
>>> Rank 15 reached 1163 vertices from seed 0, flow 1
>>> Speed RAY_SLAVE_MODE_EXTENSION 7050 units/second
>>> Rank 15: assembler memory usage: 2314452 KiB
>>>
>>
>> In your other email, you said that the last lines were:
>>
>>
>> Speed RAY_SLAVE_MODE_PURGE_NULL_EDGES 17092 units/second
>> Estimated remaining time for this step: 55 minutes, 57 seconds
>>
>> Is your job still running?
>>
>>
>>>
>>> fgrep -c RAY_SLAVE_MODE_EXTENSION out.Sample_BS.sickle
>>> 1394932
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Feb 23, 2013 at 3:41 PM, Sébastien Boisvert
>>> <[email protected]> wrote:
>>>>
>>>> On 02/23/2013 01:02 PM, James Vincent wrote:
>>>>>
>>>>>
>>>>> Unfortunately, a bigger job has also failed with no apparent warnings.
>>>>> stdout has nothing beginning with 'error'. Is there something else to
>>>>> look at?
>>>>>
>>>>
>>>> Did you look in the standard error?
>>>>
>>>>
>>>>>
>>>>> The last few lines of stdout are:
>>>>>
>>>>> Rank 10 is purging edges [3450001/61093950]
>>>>> Speed RAY_SLAVE_MODE_PURGE_NULL_EDGES 14509 units/second
>>>>> Estimated remaining time for this step: 1 hours, 6 minutes, 12 seconds
>>>>> Rank 3 is purging edges [3200001/61096936]
>>>>> Speed RAY_SLAVE_MODE_PURGE_NULL_EDGES 13563 units/second
>>>>> Estimated remaining time for this step: 1 hours, 11 minutes, 8 seconds
>>>>> Rank 5 is purging edges [3350001/61077916]
>>>>> Speed RAY_SLAVE_MODE_PURGE_NULL_EDGES 14354 units/second
>>>>> Estimated remaining time for this step: 1 hours, 7 minutes, 1 seconds
>>>>> Rank 7 is purging edges [3250001/61120658]
>>>>> Speed RAY_SLAVE_MODE_PURGE_NULL_EDGES 13945 units/second
>>>>> Estimated remaining time for this step: 1 hours, 9 minutes, 9 seconds
>>>>> Rank 31 is purging edges [3400001/61067514]
>>>>> Speed RAY_SLAVE_MODE_PURGE_NULL_EDGES 14516 units/second
>>>>> Estimated remaining time for this step: 1 hours, 6 minutes, 12 seconds
>>>>> Rank 9 is purging edges [3500001/61066398]
>>>>> Speed RAY_SLAVE_MODE_PURGE_NULL_EDGES 14833 units/second
>>>>> Estimated remaining time for this step: 1 hours, 4 minutes, 40 seconds
>>>>> Rank 4 is purging edges [3700001/61082686]
>>>>> Speed RAY_SLAVE_MODE_PURGE_NULL_EDGES 17092 units/second
>>>>> Estimated remaining time for this step: 55 minutes, 57 seconds
>>>>>
>>>>> And the last few entries in ElapsedTime are:
>>>>>
>>>>>
>>>>> ***
>>>>> Step: Coverage distribution analysis
>>>>> Date: Fri Feb 22 19:34:44 2013
>>>>> Elapsed time: 18 seconds
>>>>> Since beginning: 1 hours, 3 minutes, 36 seconds
>>>>> ***
>>>>>
>>>>>
>>>>> ***
>>>>> Step: Graph construction
>>>>> Date: Fri Feb 22 20:31:57 2013
>>>>> Elapsed time: 57 minutes, 13 seconds
>>>>> Since beginning: 2 hours, 49 seconds
>>>>> ***
>>>>>
>>>>> This job was run a little differently. I did not use any profiling,
>>>>> but I did set a minimum contig length:
>>>>>
>>>>> mpiexec --mca btl ^sm -n 40 $RAY -o $OUTDIR -p $LEFT $RIGHT -k 31 \
>>>>> -minimum-contig-length $MINCONTIG >&out.$OUTDIR
>>>>>
>>>>> The input files have 78426887 paired reads. These were quality trimmed
>>>>> with sickle.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Thanks,
>>>>> Jim
>>>>>
>>>>>
>>>>> On Fri, Feb 22, 2013 at 5:00 PM, Sébastien Boisvert
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>> On 02/22/2013 03:46 PM, James Vincent wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> That did it - thanks very much. The job completed. It's still a small
>>>>>>> job, but I'll test with full size soon.
>>>>>>>
>>>>>>
>>>>>> Excellent !
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> This job completed quickly. The previous run must have been hanging
>>>>>>> on something, because it showed 100% CPU on all 40 cores for many,
>>>>>>> many hours with the same input. After making the change you
>>>>>>> suggested below, the job finished in 1 hour.
>>>>>>>
>>>>>>
>>>>>> With the message passing interface, Ray processes actively probe
>>>>>> for new messages; the design is polling-based, not event-driven.
>>>>>>
>>>>>>
>>>>>>> For 100K paired reads on 40 cores, does one hour sound roughly in
>>>>>>> the ballpark?
>>>>>>>
>>>>>>
>>>>>> Sure.
>>>>>>
>>>>>> Most of the time was probably spent in the graph coloring, though.
>>>>>>
>>>>>>
>>>>>>> On Fri, Feb 22, 2013 at 9:34 AM, Sébastien Boisvert
>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Can you try adding this to Open-MPI options:
>>>>>>>>
>>>>>>>> --mca btl ^sm
>>>>>>>>
>>>>>>>>
>>>>>>>> Full command:
>>>>>>>>
>>>>>>>> mpiexec --mca btl ^sm -n 40 $RAY -o $OUTDIR -p $LEFT $RIGHT -k 31 \
>>>>>>>> -search $NCBIDIR/NCBI-Finished-Bacterial-Genomes \
>>>>>>>> -with-taxonomy $NCBIDIR/Genome-to-Taxon.tsv
>>>>>>>> $NCBIDIR/TreeOfLife-Edges.tsv $NCBIDIR/Taxon-Names.tsv
>>>>>>>>
>>>>>>>>
>>>>>>>> This Open-MPI option disables Open-MPI's "sm" byte transfer
>>>>>>>> layer ("sm" means shared memory).
>>>>>>>>
>>>>>>>> All messages will instead go through "tcp" (over the loopback
>>>>>>>> interface, since everything is on the same machine) or through
>>>>>>>> "self".
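>>>>>>>>
>>>>>>>> If you want to double-check which BTL components your Open-MPI
>>>>>>>> build actually provides (assuming ompi_info is on your PATH),
>>>>>>>> you can run:
>>>>>>>>
>>>>>>>> ompi_info | grep "MCA btl"
>>>>>>>>
>>>>>>>> Each "MCA btl" line is one available byte transfer layer
>>>>>>>> (e.g. sm, tcp, self).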
>>>>>>>>
>>>>>>>>
>>>>>>>> p.s.: Open-MPI 1.4.3 is really old (2010-10-05). The latest
>>>>>>>> release is Open-MPI 1.6.4.
>>>>>>>>
>>>>>>>> On 02/22/2013 08:06 AM, James Vincent wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It is OpenMPI 1.4.3. Here is the blurb at the start of a run:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> MAXKMERLENGTH: 32
>>>>>>>>> KMER_U64_ARRAY_SIZE: 1
>>>>>>>>> Maximum coverage depth stored by CoverageDepth: 4294967295
>>>>>>>>> MAXIMUM_MESSAGE_SIZE_IN_BYTES: 4000 bytes
>>>>>>>>> FORCE_PACKING = n
>>>>>>>>> ASSERT = n
>>>>>>>>> HAVE_LIBZ = n
>>>>>>>>> HAVE_LIBBZ2 = n
>>>>>>>>> CONFIG_PROFILER_COLLECT = n
>>>>>>>>> CONFIG_CLOCK_GETTIME = n
>>>>>>>>> __linux__ = y
>>>>>>>>> _MSC_VER = n
>>>>>>>>> __GNUC__ = y
>>>>>>>>> RAY_32_BITS = n
>>>>>>>>> RAY_64_BITS = y
>>>>>>>>> MPI standard version: MPI 2.1
>>>>>>>>> MPI library: Open-MPI 1.4.3
>>>>>>>>> Compiler: GNU gcc/g++ 4.4.6 20120305 (Red Hat 4.4.6-4)
>>>>>>>>>
>>>>>>>>> Rank 0: Operating System: Linux (__linux__) POSIX (OS_POSIX)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Feb 22, 2013 at 8:01 AM, Sébastien Boisvert
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Which MPI library are you using?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 02/22/2013 07:34 AM, James Vincent wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Here are the last 50 lines :
>>>>>>>>>>>
>>>>>>>>>>> -bash-4.1$ tail -50 out.run
>>>>>>>>>>>
>>>>>>>>>>> OPERATION_incrementReferences operations: 32762
>>>>>>>>>>> OPERATION_decrementReferences operations: 32386
>>>>>>>>>>>
>>>>>>>>>>> OPERATION_purgeVirtualColor operations: 1567
>>>>>>>>>>> **********************************************************
>>>>>>>>>>>
>>>>>>>>>>> Rank 25: assembler memory usage: 138560 KiB
>>>>>>>>>>> Rank 31 biological abundances 3210 [1/1] [1752/2254] [4/7]
>>>>>>>>>>> Rank 31 RAY_SLAVE_MODE_ADD_COLORS processed files: 43/56
>>>>>>>>>>> Rank 31 RAY_SLAVE_MODE_ADD_COLORS processed sequences in file: 3/7
>>>>>>>>>>> Rank 31 RAY_SLAVE_MODE_ADD_COLORS total processed sequences:
>>>>>>>>>>> 81/107
>>>>>>>>>>> Rank 31 RAY_SLAVE_MODE_ADD_COLORS processed k-mers for current
>>>>>>>>>>> sequence: 2341425/0
>>>>>>>>>>> Rank 31 RAY_SLAVE_MODE_ADD_COLORS total processed k-mers:
>>>>>>>>>>> 149500000
>>>>>>>>>>> Speed RAY_SLAVE_MODE_ADD_COLORS 20803 units/second
>>>>>>>>>>>
>>>>>>>>>>> **********************************************************
>>>>>>>>>>> Coloring summary
>>>>>>>>>>> Number of virtual colors: 232
>>>>>>>>>>> Number of real colors: 3444
>>>>>>>>>>>
>>>>>>>>>>> Keys in index: 230
>>>>>>>>>>> Observed collisions when populating the index: 0
>>>>>>>>>>> COLOR_NAMESPACE_MULTIPLIER= 10000000000000000
>>>>>>>>>>>
>>>>>>>>>>> Operations
>>>>>>>>>>>
>>>>>>>>>>> OPERATION_getVirtualColorFrom operations: 35380
>>>>>>>>>>>
>>>>>>>>>>> OPERATION_IN_PLACE_ONE_REFERENCE: 31951
>>>>>>>>>>> OPERATION_NO_VIRTUAL_COLOR_HAS_HASH_CREATION operations:
>>>>>>>>>>> 1730
>>>>>>>>>>> OPERATION_VIRTUAL_COLOR_HAS_COLORS_FETCH operations: 1506
>>>>>>>>>>> OPERATION_NO_VIRTUAL_COLOR_HAS_COLORS_CREATION operations:
>>>>>>>>>>> 193
>>>>>>>>>>>
>>>>>>>>>>> OPERATION_createVirtualColorFrom operations: 1923
>>>>>>>>>>>
>>>>>>>>>>> OPERATION_allocateVirtualColorHandle operations: 1923
>>>>>>>>>>> OPERATION_NEW_FROM_EMPTY operations: 1692
>>>>>>>>>>> OPERATION_NEW_FROM_SCRATCH operations: 231
>>>>>>>>>>>
>>>>>>>>>>> OPERATION_applyHashOperation operations: 37303
>>>>>>>>>>> OPERATION_getHash operations: 0
>>>>>>>>>>>
>>>>>>>>>>> OPERATION_incrementReferences operations: 35380
>>>>>>>>>>> OPERATION_decrementReferences operations: 34985
>>>>>>>>>>>
>>>>>>>>>>> OPERATION_purgeVirtualColor operations: 1693
>>>>>>>>>>> **********************************************************
>>>>>>>>>>>
>>>>>>>>>>> Rank 31: assembler memory usage: 139452 KiB
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Feb 22, 2013 at 7:13 AM, Sébastien Boisvert
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> What's the last thing reported in stdout?
>>>>>>>>>>>>
>>>>>>>>>>>> On 02/22/2013 06:23 AM, jjv5 wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am running Ray Meta on a shared-memory machine with 40 cores
>>>>>>>>>>>>> and 1 TB of memory. One very small job with 25K reads finished
>>>>>>>>>>>>> and gave various taxonomic outputs. Jobs with slightly more
>>>>>>>>>>>>> input seem to never finish. The output log just stops, but no
>>>>>>>>>>>>> errors are indicated. Where might one start looking to
>>>>>>>>>>>>> determine where a job has gone wrong?
>>>>>>>>>>>>>
>>>>>>>>>>>>> The command I use is below. There are 100,000 paired reads.
>>>>>>>>>>>>>
>>>>>>>>>>>>> mpiexec -n 40 $RAY -o $OUTDIR -p $LEFT $RIGHT -k 31 \
>>>>>>>>>>>>> -search $NCBIDIR/NCBI-Finished-Bacterial-Genomes \
>>>>>>>>>>>>> -with-taxonomy $NCBIDIR/Genome-to-Taxon.tsv
>>>>>>>>>>>>> $NCBIDIR/TreeOfLife-Edges.tsv $NCBIDIR/Taxon-Names.tsv
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>>>>>> Everyone hates slow websites. So do we.
>>>>>>>>>>>>> Make your web apps faster with AppDynamics
>>>>>>>>>>>>> Download AppDynamics Lite for free today:
>>>>>>>>>>>>> http://p.sf.net/sfu/appdyn_d2d_feb
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Denovoassembler-users mailing list
>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>