Thank you, Jose. I also googled the sequence, but I used GGCCACGCGTCGACTAGTACA instead of GGCCACGCGTCGACTAGTAC, and Google did not return any hits. Concerning sub: I don't want to find and substitute a known sequence. I want to identify something like the sequence above, i.e. a sequence common to most reads, just by looking at all the reads that cannot be aligned. Basically, I'm looking for something like a motif finder.
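To make the idea concrete, here is a minimal sketch of the kind of search I have in mind. It is written in plain Python and is not an existing Bioconductor function; the function name common_kmers and the choice k = 20 are only assumptions for illustration. It counts, for every exact k-mer, the number of reads that contain it and reports the most widespread ones. It does not tolerate mismatches, so it is a crude stand-in for a real motif finder.

```python
from collections import Counter


def common_kmers(reads, k=20, top=5):
    """Return the k-mers found in the largest number of reads.

    Hypothetical helper for illustration only: exact matching,
    no mismatches, no reverse complements.
    """
    counts = Counter()
    for read in reads:
        # A set ensures a k-mer repeated inside one read is counted once.
        kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
        # Ignore k-mers that contain ambiguous base calls (N).
        counts.update(kmer for kmer in kmers if "N" not in kmer)
    n_reads = len(reads)
    # Each entry: (k-mer, number of reads containing it, fraction of reads).
    return [(kmer, c, c / n_reads) for kmer, c in counts.most_common(top)]


# A few of the unaligned reads from my original post.
reads = [
    "GGCCCCGCGTCGCCTAGTACTACATAAACAATGACC",
    "GTTTCCCAGTCACGGTCATGCTTCCTGTTTCCCAGC",
    "GGCCACGCGTCGACTAGTACTTAAAAATATCGCACG",
    "GGCCACGCGTCGACTAGTACAGAAAAGACCGTGACT",
    "GGCCACGCGTCGACTAGTACAAAGGACATCACGCCG",
    "GGCCACGCGTCGACTAGTACAGAGTAAACAACGACC",
]

for kmer, count, fraction in common_kmers(reads, k=20, top=3):
    print(kmer, count, round(fraction, 2))
```

On 16 million reads one would probably count on a random subsample first, since holding every distinct 20-mer in memory may be too expensive.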
bests, jo

On Tue, Feb 9, 2010 at 2:07 PM, Muino, Jose <[email protected]> wrote:

> Hi,
>
> Perhaps you can try the "sub" function from R. Not sure if there is a
> more efficient way, but it should work.
>
> By the way, if you google the sequence (GGCCACGCGTCGACTAGTAC) you will
> find it in several papers. I have the impression that sometimes it is
> used as a primer for the generation of the first cDNA strand.
>
> Dr. Jose M Muino
> Plant Research International B.V.
> P.O. Box 619, 6700 AP Wageningen, The Netherlands
> Phone: +0317-481122.
> E-mail: [email protected]
> http://www.pri.wur.nl
>
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]] On Behalf
> > Of Johannes Rainer
> > Sent: Tuesday, 9 February 2010 13:37
> > To: [email protected]
> > Subject: [Bioc-sig-seq] identifying a common motif in a set
> > of sequences
> >
> > dear all,
> >
> > I'm wondering if there is already a function implemented in
> > any Bioconductor package that allows one to identify a common
> > sequence pattern in a set of sequences.
> >
> > I'm asking this because in my ChIPseq data out of the 20 mio
> > reads only about 3 mio can be aligned to the (human) genome
> > (using bowtie), and, by looking at the sequences that cannot
> > be aligned (see below), there seem to be certain sequence
> > patterns (like GGCCACGCGTCGACTAGTAC). Actually I have
> > absolutely no idea where these sequences could come from.
> > They are not adapter or primer sequences, since I've aligned
> > all adapter/primer sequences I've got from the provider
> > against these sequences.
> >
> > Is there any way to extract common sequence patterns (like
> > the GGCCACGCGTCGACTAGTAC) in an automated manner from these sequences?
> > Besides that, did anybody experience the same problem?
> >
> > bests, jo
> >
> >   A DNAStringSet instance of length 16196935
> >            width seq
> >        [1]    36 GGCCCCGCGTCGCCTAGTACTACATAAACAATGACC
> >        [2]    36 GGCGATGACCTTCTTGTGACCGTTGTGCATGCCGNC
> >        [3]    36 GTTTCCCAGTCACGGTCATGCTTCCTGTTTCCCAGC
> >        [4]    36 GTTTCCCAGTCACGGTCGTCCTTTTATTCTGACCTG
> >        [5]    36 GGCCACGCGTCGACTAGTACTTAAAAATATCGCACG
> >        [6]    36 GGCCACGCGTCGACTAGTACAGAAAAGACCGTGACT
> >        [7]    36 GGCCACGCGTCGACTAGTACAAAGGACATCACGCCG
> >        [8]    36 GGCCACGCGTCGACTAGTACAGAGTAAACAACGACC
> >        [9]    36 CAGTCACGGTCAAAAAATACATACTAAACACCTACT
> >        ...   ... ...
> > [16196927]    36 CAGTCACGGTCTGGCGGNATNNTTTTTGTACTAGTC
> > [16196928]    36 TAGCCAGCCAAGCCAGCNAANNCAGCCATCCAGCCA
> > [16196929]    36 GCGCCCCTGTCGCGGACNACNNGTAAGCAGCTCTCT
> > [16196930]    36 ACTACACCCCTTAGCAANGANNATCTGAGCCTCCAT
> > [16196931]    36 ACTACAAGCAAACAGTGNTCNNCTATGGTCCAGATC
> > [16196932]    36 GCAGCCACGTCCCGATCNCCNNTTTGAGTGCGTGCG
> > [16196933]    36 GGCCACGCGTCGACTAGNACNNCGAAAAATACGACC
> > [16196934]    36 GGCCACGCGTCGACTAGTACNNAAAAAACAACGCCT
> > [16196935]    36 AGTCACGGTCAAGTAACACANNAACAGAAAACCAAA
> >
> > --
> > Johannes Rainer, PhD
> > Bioinformatics Group,
> > Division Molecular Pathophysiology,
> > Biocenter, Medical University Innsbruck, Fritz-Pregl-Str
> > 3/IV, 6020 Innsbruck, Austria and Tyrolean Cancer Research
> > Institute Innrain 66, 6020 Innsbruck, Austria
> >
> > Tel.: +43 512 570485 13
> > Email: [email protected]
> > [email protected]
> > URL: http://bioinfo.i-med.ac.at
> >
> > _______________________________________________
> > Bioc-sig-sequencing mailing list
> > [email protected]
> > https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

--
Johannes Rainer, PhD
Bioinformatics Group,
Division Molecular Pathophysiology,
Biocenter, Medical University Innsbruck, Fritz-Pregl-Str 3/IV, 6020 Innsbruck, Austria
and Tyrolean Cancer Research Institute
Innrain 66, 6020 Innsbruck, Austria

Tel.: +43 512 570485 13
Email: [email protected]
[email protected]
URL: http://bioinfo.i-med.ac.at
