Dear all,
I'm wondering whether any Bioconductor package already implements a
function that identifies common sequence patterns in a set of
sequences.
I'm asking because only about 3 million of the 20 million reads in my
ChIP-seq data can be aligned to the human genome (using bowtie). Looking at
the reads that cannot be aligned (see below), there seem to be recurring
sequence patterns (like GGCCACGCGTCGACTAGTAC). I have absolutely no idea
where these sequences could come from. They do not appear to be adapter or
primer sequences: I aligned all adapter/primer sequences I got from the
provider against these reads and found no match.
Is there any way to extract common sequence patterns (like
GGCCACGCGTCGACTAGTAC) from these sequences in an automated manner?
Besides that, has anybody experienced the same problem?
Best, jo
A DNAStringSet instance of length 16196935
width seq
[1] 36 GGCCCCGCGTCGCCTAGTACTACATAAACAATGACC
[2] 36 GGCGATGACCTTCTTGTGACCGTTGTGCATGCCGNC
[3] 36 GTTTCCCAGTCACGGTCATGCTTCCTGTTTCCCAGC
[4] 36 GTTTCCCAGTCACGGTCGTCCTTTTATTCTGACCTG
[5] 36 GGCCACGCGTCGACTAGTACTTAAAAATATCGCACG
[6] 36 GGCCACGCGTCGACTAGTACAGAAAAGACCGTGACT
[7] 36 GGCCACGCGTCGACTAGTACAAAGGACATCACGCCG
[8] 36 GGCCACGCGTCGACTAGTACAGAGTAAACAACGACC
[9] 36 CAGTCACGGTCAAAAAATACATACTAAACACCTACT
... ... ...
[16196927] 36 CAGTCACGGTCTGGCGGNATNNTTTTTGTACTAGTC
[16196928] 36 TAGCCAGCCAAGCCAGCNAANNCAGCCATCCAGCCA
[16196929] 36 GCGCCCCTGTCGCGGACNACNNGTAAGCAGCTCTCT
[16196930] 36 ACTACACCCCTTAGCAANGANNATCTGAGCCTCCAT
[16196931] 36 ACTACAAGCAAACAGTGNTCNNCTATGGTCCAGATC
[16196932] 36 GCAGCCACGTCCCGATCNCCNNTTTGAGTGCGTGCG
[16196933] 36 GGCCACGCGTCGACTAGNACNNCGAAAAATACGACC
[16196934] 36 GGCCACGCGTCGACTAGTACNNAAAAAACAACGCCT
[16196935] 36 AGTCACGGTCAAGTAACACANNAACAGAAAACCAAA
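
To make the question concrete, here is a minimal sketch of the kind of screen I have in mind. It is plain Python rather than a Bioconductor function, and the names common_prefixes and kmer_read_counts are made up for illustration. It assumes that the pattern of interest is an exact k-mer, either at the very start of the read or somewhere inside it, and it uses the first nine unaligned reads shown above as toy input.

```python
from collections import Counter

def common_prefixes(reads, k=20, top=5):
    """Count the first k bases of each read; return the most frequent prefixes."""
    counts = Counter(read[:k] for read in reads if len(read) >= k)
    return counts.most_common(top)

def kmer_read_counts(reads, k=11):
    """For every k-mer, count the number of reads that contain it at any offset."""
    counts = Counter()
    for read in reads:
        # A set makes each k-mer count at most once per read.
        kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
        counts.update(kmers)
    return counts

# Toy input: the first nine unaligned reads from the listing above.
reads = [
    "GGCCCCGCGTCGCCTAGTACTACATAAACAATGACC",
    "GGCGATGACCTTCTTGTGACCGTTGTGCATGCCGNC",
    "GTTTCCCAGTCACGGTCATGCTTCCTGTTTCCCAGC",
    "GTTTCCCAGTCACGGTCGTCCTTTTATTCTGACCTG",
    "GGCCACGCGTCGACTAGTACTTAAAAATATCGCACG",
    "GGCCACGCGTCGACTAGTACAGAAAAGACCGTGACT",
    "GGCCACGCGTCGACTAGTACAAAGGACATCACGCCG",
    "GGCCACGCGTCGACTAGTACAGAGTAAACAACGACC",
    "CAGTCACGGTCAAAAAATACATACTAAACACCTACT",
]

if __name__ == "__main__":
    print(common_prefixes(reads, k=20, top=3))
    print(kmer_read_counts(reads, k=11)["CAGTCACGGTC"])
```

On these nine reads the 20-base prefix GGCCACGCGTCGACTAGTAC comes out on top with four occurrences, and CAGTCACGGTC is found in three reads at different offsets. Exact matching like this would miss variants with a mismatch or an N, and on 16 million reads the per-read k-mer count would need a more memory-aware approach, so I would prefer an existing, tested function if one exists.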
--
Johannes Rainer, PhD
Bioinformatics Group,
Division Molecular Pathophysiology,
Biocenter, Medical University Innsbruck,
Fritz-Pregl-Str 3/IV, 6020 Innsbruck, Austria
and
Tyrolean Cancer Research Institute
Innrain 66, 6020 Innsbruck, Austria
Tel.: +43 512 570485 13
Email: [email protected]
[email protected]
URL: http://bioinfo.i-med.ac.at
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing