Hi, I've CC'ed the list so that the community gets the answer too.

On 11/09/2012 09:35 AM, Pascale Marquis wrote:
> Hello Sébastien,
>
> I work with Ken Dewar, and you once did an internship in our lab.
>
> I use Ray regularly, and on one of my assemblies I get a result that
> I find a bit strange.
>
> NumberOfPairedLibraries: 1
>
> LibraryNumber: 0
> InputFormat: TwoFiles,Paired
> DetectionType: Automatic
> File: output_dirTALEM_t20l50.phred33.pair1.fastq
> NumberOfSequences: 57104658
> File: output_dirTALEM_t20l50.phred33.pair2.fastq
> NumberOfSequences: 57104658
> Distribution: TALEM.assembly/Library0.txt
> Peak 0
> AverageOuterDistance: 306
> StandardDeviation: 42
> Peak 1
> AverageOuterDistance: 5200
> StandardDeviation: 45
>
> I have 2 peaks in the insert distribution, one at 300 and the other at
> 5200. If you look at the attached file, very few reads show an insert
> size of 5000.
>
> The gap sizes for my insert range from 1 to 5 kb. I wonder how Ray
> computes the gap sizes if most of the insert distribution is at 300 bp?

Paired information is used for both contig extension and scaffolding. In
this case, the 300 peak will be picked up for extension and the 5000 peak
will simply be ignored, because the density indicates that probably only
the 300 peak will show good coherency. For the scaffolder, the largest
peak is selected, which in your case may be a problem.

> Also, how could I correct

You can provide the information manually, like this:

    mpiexec -n 99 \
      Ray -k 33 \
      -o ManualTest \
      -p file_1.fastq.bz2 file_2.fastq.bz2 306 42

That should fix your problem. That said, I believe you really do have
something at 5000 in your sample: the density at 5000 in Library0.txt
*will* be lower than the one at 300 because the sampling is done on the
seeds, not on the contigs. RayOutput/SeedLengthDistribution.txt should
tell you what your seeds look like too.
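
If you want to see how much of the sampled distribution sits near each
reported peak, something like the following could help. It is only a
minimal sketch and not how Ray itself detects peaks: it assumes that
Library0.txt holds two whitespace-separated columns (outer distance, then
number of pairs), and the names parse_distribution, peak_support and the
n_sd window are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_distribution(lines: Iterable[str]) -> Dict[int, int]:
    """Read 'distance count' rows (assumed layout) into a histogram."""
    histogram: Dict[int, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed rows
        try:
            distance, count = int(fields[0]), int(fields[1])
        except ValueError:
            continue  # skip rows that are not two integers
        histogram[distance] = histogram.get(distance, 0) + count
    return histogram


def peak_support(
    histogram: Dict[int, int],
    peaks: List[Tuple[float, float]],
    n_sd: float = 3.0,
) -> List[float]:
    """Fraction of all sampled pairs within n_sd deviations of each peak.

    Each peak is an (average outer distance, standard deviation) tuple.
    """
    total = sum(histogram.values())
    support: List[float] = []
    for mean, sd in peaks:
        low, high = mean - n_sd * sd, mean + n_sd * sd
        inside = sum(c for d, c in histogram.items() if low <= d <= high)
        support.append(inside / total if total else 0.0)
    return support
```

With your data you would pass the open Library0.txt file to
parse_distribution and the two peaks from your output, (306, 42) and
(5200, 45), to peak_support. A small share for the second peak would be
consistent with sampling on short seeds and would not, by itself, show
that the peak is spurious.
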
I'll add the Library0.txt file you provided to my unit tests and change
the code accordingly so that the generalized algorithm also works for
your data (no false positives). Thanks.

> this?
>
> Thank you, and have a good day.
>
> Pascale

_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users