Hi all,
As Daniel said, converting color space to base space could be a solution
for some applications.
But we have to be careful with this trick, especially for the mapping.
For example, in color space, two adjacent mismatches correspond to
only one base mismatch. That's why we have to take the color-space
properties into account.
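
To make the point about adjacent mismatches concrete, here is a minimal Python sketch. It assumes the standard SOLiD dinucleotide code (with A=0, C=1, G=2, T=3, the color of two neighbouring bases is the XOR of their codes) and ignores the leading primer base of real reads. The names `to_colors` and `mismatch_positions` are only illustrative; they do not come from any package.

```python
# Assumed SOLiD dinucleotide code: color = XOR of the two base codes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def to_colors(seq):
    """Color-space string for a base-space sequence (primer base ignored)."""
    return "".join(str(BASE_CODE[a] ^ BASE_CODE[b]) for a, b in zip(seq, seq[1:]))

def mismatch_positions(x, y):
    """0-based positions at which two equal-length strings differ."""
    return [i for i, (a, b) in enumerate(zip(x, y)) if a != b]

reference = "ACGTACGT"
variant = "ACGAACGT"  # one base change at index 3 (T -> A)

ref_colors = to_colors(reference)  # "1313131"
var_colors = to_colors(variant)    # "1320131"

# A single base mismatch shows up as two adjacent color mismatches.
print(mismatch_positions(reference, variant))    # [3]
print(mismatch_positions(ref_colors, var_colors))  # [2, 3]
```

A mapper that counts each color mismatch separately would therefore penalise one real SNP twice, while a single isolated color mismatch is more likely a sequencing error.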
Best,
Nicolas
Daniel Klevebring wrote:
Hi all,
In the SOLiD system, reads are mapped to a color-space-encoded version of
the reference sequence in question, after which start and end
coordinates are reported. The base-space sequence can then (if needed) be
extracted from the reference sequence using the given coordinates.
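
As a rough illustration of that workflow (not of the actual SOLiD pipeline), the Python sketch below encodes a reference in color space, locates a read by naive exact matching, and pulls the base-space sequence back out of the reference. It assumes the usual dinucleotide code (A=0, C=1, G=2, T=3, color = XOR of neighbouring base codes) and ignores the primer base. `to_colors` and `locate` are made-up helper names.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def to_colors(seq):
    """Color-space string for a base-space sequence (primer base ignored)."""
    return "".join(str(BASE_CODE[a] ^ BASE_CODE[b]) for a, b in zip(seq, seq[1:]))

def locate(read_colors, reference):
    """Return 0-based inclusive (start, end) base coordinates, or None.

    Naive exact match only; a real mapper would tolerate mismatches.
    """
    start = to_colors(reference).find(read_colors)
    if start < 0:
        return None
    # n colors span n + 1 bases, so the last base index is start + n.
    return start, start + len(read_colors)

reference = "TTACGGTCAGG"
read_colors = "3012"

hit = locate(read_colors, reference)
if hit is not None:
    start, end = hit
    bases = reference[start:end + 1]
    print(start, end, bases)  # 3 7 CGGTC
```

The bases come from the reference rather than from decoding the read, since a color string alone is consistent with four different base sequences until one base is known.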
There is a "double encoding" system, where the colors (0, 1, 2, 3) are
changed to letters (A, C, G, T) to trick certain software into working with
SOLiD data. This does not correspond to the actual base-space
sequence; it is only a representation of the color-space sequence. I
guess it would be possible to use this trick to make Biostrings and
ShortRead work with SOLiD data.
However, one very important feature of the SOLiD system is that the
reverse-complement sequence corresponds to the reversed color-space
sequence (there is no "complement" in color space). This means that
the algorithm that returns the rev-comp sequence prior to
matching on the (-) strand needs to be rewritten to report the reversed
sequence instead of the rev-comp.
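
A quick way to convince oneself of this property is the Python sketch below. It assumes the usual dinucleotide code (A=0, C=1, G=2, T=3, color = XOR of neighbouring base codes) and ignores the primer base. Complementing a base flips both bits of its code, so the XOR of two neighbours is unchanged and only the order of the colors is reversed. The function names are illustrative.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def to_colors(seq):
    """Color-space string for a base-space sequence (primer base ignored)."""
    return "".join(str(BASE_CODE[a] ^ BASE_CODE[b]) for a, b in zip(seq, seq[1:]))

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

seq = "ACGGTCA"
colors = to_colors(seq)                         # "130121"
rc_colors = to_colors(reverse_complement(seq))  # colors of "TGACCGT"

# Colors of the rev-comp equal the plain reverse of the colors.
print(colors, rc_colors, rc_colors == colors[::-1])  # 130121 121031 True
```

So for double-encoded reads, (-) strand matching would use a plain string reverse where base-space code calls its rev-comp routine.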
Did all this make sense...? Basically, I think it would be possible to
make it work if the colors are "double-encoded" and the internal
function that rev-comps a sequence is modified to report the reverse.
Best
Daniel Klevebring
On 5 feb 2009, at 01.39, Martin Morgan wrote:
, once reads are represented
as traditional nucleotide sequences (which I guess they must be at
some point?).
--
Contact information:
Daniel Klevebring
M. Sc. Eng., Ph.D. Student
Dept of Gene Technology
Royal Institute of Technology, KTH
SE-106 91 Stockholm, Sweden
Visiting address: Roslagstullsbacken 21, B3
Delivery address: Roslagsvägen 30B, 104 06, Stockholm
Invoice address: KTH Fakturaserice, Ref DAKL KTHBIO, Box 24075,
SE-10450 Stockholm
E-mail: [email protected]
Phone: +46 8 5537 8337 (Office)
Phone: +46 704 71 65 91 (Mobile)
Web: http://www.biotech.kth.se/genetech/index.html
Web: http://www.arrayadvice.se/
Fax: +46 8 5537 8481
MSN messenger: [email protected]
--
Nicolas Servant
Equipe Bioinformatique
Institut Curie
26, rue d'Ulm - 75248 Paris Cedex 05 - FRANCE
Email: [email protected]
Tel: 01 56 24 69 85
http://bioinfo.curie.fr/
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing