Hi,

I CC'ed the list so that the community gets the answer too.


On 11/09/2012 09:35 AM, Pascale Marquis wrote:
> Hello Sébastien,
>
> I work with Ken Dewar, and you once did an internship in our lab.
>
> I use Ray regularly, and for one of my assemblies I get a result that
> I find a bit strange.
>
> NumberOfPairedLibraries: 1
>
> LibraryNumber: 0
>
> InputFormat: TwoFiles,Paired
>
> DetectionType: Automatic
>
> File: output_dirTALEM_t20l50.phred33.pair1.fastq
>
>    NumberOfSequences: 57104658
>
> File: output_dirTALEM_t20l50.phred33.pair2.fastq
>
>    NumberOfSequences: 57104658
>
> Distribution: TALEM.assembly/Library0.txt
>
> Peak 0
>
>    AverageOuterDistance: 306
>
>    StandardDeviation: 42
>
> Peak 1
>
>    AverageOuterDistance: 5200
>
>    StandardDeviation: 45
>
> I get 2 peaks for the insert distribution, one at 300 and the other at 5200.
> If you look at the attached file, very few reads show an insert size of 5000.
>
> The size of the gaps for my insert varies from 1 to 5 kb, so I wonder how
> Ray computes the gap sizes if most of the insert peak is at 300 bp?


Paired information is used for contig extension and scaffolding.

In this case, the peak at 300 will be picked up for extension and the one at
5000 will simply be ignored, because its density indicates that probably only
the peak at 300 will show good coherency.

For the scaffolder, the peak with the largest distance is selected, which in
your case may be a problem.
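
To make the two selection rules above concrete, here is a minimal Python
sketch. It is not Ray's code: the Peak class, the function names, and the
density values are all hypothetical placeholders (the real densities are in
Library0.txt), and "highest density" is only my simplified reading of the
extension rule.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    average_outer_distance: int
    standard_deviation: int
    density: int  # hypothetical: number of pairs observed under this peak


def peak_for_extension(peaks):
    """Assumed rule: extension keeps the peak with the highest density."""
    return max(peaks, key=lambda p: p.density)


def peak_for_scaffolding(peaks):
    """Rule described above: the scaffolder takes the largest distance."""
    return max(peaks, key=lambda p: p.average_outer_distance)


# The averages and deviations come from the report above; the densities
# are made-up placeholders, not measured values.
example_peaks = [
    Peak(306, 42, density=100000),
    Peak(5200, 45, density=50),
]

print("extension:", peak_for_extension(example_peaks))
print("scaffolding:", peak_for_scaffolding(example_peaks))
```

With a sparse peak at 5200, the two rules disagree, which is why the
scaffolder may be the problem here.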

> Also, how could I correct

You can provide the information manually like this:

mpiexec -n 99 \
Ray -k 33 \
-o ManualTest \
-p file_1.fastq.bz2 file_2.fastq.bz2 306 42

That should fix your problem.

But I believe that you really have something at 5000 in your sample. The
density at 5000 in Library0.txt *will* be lower than the one at 300 because
the sampling is done in the seeds, not on the contigs, and a pair can only be
measured when both reads land in the same seed.
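
If you want to compare the two densities yourself, something like the
following Python sketch could help. It assumes that Library0.txt holds one
"distance count" pair per line, separated by whitespace; please check that
against your file. The function names and the 3-standard-deviation window
are my own choices for illustration, not anything Ray defines.

```python
def read_distribution(lines):
    """Parse 'distance count' rows; skip blank or malformed lines."""
    distribution = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue
        try:
            distance, count = int(fields[0]), int(fields[1])
        except ValueError:
            continue
        distribution[distance] = distribution.get(distance, 0) + count
    return distribution


def count_near(distribution, mean, sd, width=3):
    """Sum the counts within `width` standard deviations of `mean`."""
    low, high = mean - width * sd, mean + width * sd
    return sum(c for d, c in distribution.items() if low <= d <= high)


# Usage (the path is taken from the report above):
# with open("TALEM.assembly/Library0.txt") as handle:
#     distribution = read_distribution(handle)
# print(count_near(distribution, 306, 42))
# print(count_near(distribution, 5200, 45))
```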

RayOutput/SeedLengthDistribution.txt should tell you what your seeds look like
too.
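
As a rough check, you could compute which fraction of your seeds is long
enough to contain a pair spanning about 5200. The Python sketch below
assumes that SeedLengthDistribution.txt has one "length count" pair per line
(an assumption on my part, please verify it); the function name is
hypothetical.

```python
def seed_fraction_at_least(lines, min_length):
    """Return the fraction of seeds whose length is >= min_length."""
    total = 0
    long_enough = 0
    for line in lines:
        fields = line.split()
        # Skip headers, blank lines, and anything that is not two integers.
        if len(fields) != 2 or not all(f.isdigit() for f in fields):
            continue
        length, count = int(fields[0]), int(fields[1])
        total += count
        if length >= min_length:
            long_enough += count
    return long_enough / total if total else 0.0


# Usage:
# with open("RayOutput/SeedLengthDistribution.txt") as handle:
#     print(seed_fraction_at_least(handle, 5200))
```

A small fraction would be consistent with a low density at 5000 even if the
long inserts are really there.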

I'll add the file Library0.txt you provided to my unit tests and change the
code accordingly so that the generalized algorithm works as well for your
data (no false positive).


Thanks.

> this?
>
> Thank you, and have a good day.
>
> Pascale
>


_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
