Thank you, Jose,

I also googled the sequence, but I used GGCCACGCGTCGACTAGTACA instead
of GGCCACGCGTCGACTAGTAC, and Google did not find any hits.
Concerning sub: I don't want to find and substitute the sequence. I want
to identify a sequence like the one above, one common to most of the
reads, just by looking at all the reads that cannot be aligned. Basically,
I'm looking for something like a motif finder.
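A minimal sketch of one naive way to surface such a shared sequence, written in Python rather than R and assuming the unaligned reads are available as plain strings (for example, exported from the DNAStringSet). The helper name `overrepresented_kmers` and the default k=20 (chosen only because GGCCACGCGTCGACTAGTAC is 20 bases long) are hypothetical. It counts, for every k-mer, how many reads contain it and reports the most frequent ones. This is a simple overrepresentation count, not a real motif finder: it tolerates no mismatches and builds no position weight matrix.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def overrepresented_kmers(
    reads: Iterable[str], k: int = 20, top: int = 5
) -> List[Tuple[str, int]]:
    """Return the `top` k-mers contained in the largest number of reads.

    Hypothetical helper: each k-mer is counted at most once per read, so a
    count is the number of reads containing it. k-mers with N are skipped.
    """
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        # Distinct k-mers of this read, so repeats within one read count once.
        seen = {
            read[i : i + k]
            for i in range(len(read) - k + 1)
            if "N" not in read[i : i + k]
        }
        counts.update(seen)
    return counts.most_common(top)


if __name__ == "__main__":
    # Five of the unaligned reads listed in the original post.
    reads = [
        "GGCCACGCGTCGACTAGTACTTAAAAATATCGCACG",
        "GGCCACGCGTCGACTAGTACAGAAAAGACCGTGACT",
        "GGCCACGCGTCGACTAGTACAAAGGACATCACGCCG",
        "GGCCACGCGTCGACTAGTACAGAGTAAACAACGACC",
        "GTTTCCCAGTCACGGTCATGCTTCCTGTTTCCCAGC",
    ]
    for kmer, n in overrepresented_kmers(reads, k=20, top=3):
        print(kmer, n)
```

On these five reads the top hit is GGCCACGCGTCGACTAGTAC, contained in four of them. Keeping every distinct 20-mer of about 16 million reads in a dictionary would take a lot of memory, so in practice one would probably run something like this on a random subsample of the unaligned reads first.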

Best, Jo


On Tue, Feb 9, 2010 at 2:07 PM, Muino, Jose <[email protected]> wrote:

> Hi,
>
> Perhaps you can try the "sub" function in R. I'm not sure whether there
> is a more efficient way, but it should work.
>
> By the way, if you google the sequence (GGCCACGCGTCGACTAGTAC), you will
> find it in several papers. I have the impression that it is sometimes
> used as a primer for the generation of the first cDNA strand.
>
> Dr. Jose M Muino
> Plant Research International B.V.
> P.O. Box 619, 6700 AP Wageningen, The Netherlands
> Phone: +0317-481122.
> E-mail: [email protected]
> http://www.pri.wur.nl
>
>
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]] On Behalf
> > Of Johannes Rainer
> > Sent: Tuesday 9 February 2010 13:37
> > To: [email protected]
> > Subject: [Bioc-sig-seq] identifying a common motif in a set
> > of sequences
> >
> > Dear all,
> >
> > I'm wondering whether there is already a function in any
> > Bioconductor package that identifies a common sequence
> > pattern in a set of sequences.
> >
> > I'm asking because, of the 20 million reads in my ChIP-seq
> > data, only about 3 million can be aligned to the (human) genome
> > (using bowtie). Looking at the sequences that cannot be
> > aligned (see below), there seem to be recurring sequence
> > patterns (like GGCCACGCGTCGACTAGTAC). I have absolutely no
> > idea where these sequences could come from. They are not
> > adapter or primer sequences, since I've aligned all the
> > adapter/primer sequences I got from the provider against
> > these sequences.
> >
> > Is there any way to extract common sequence patterns (like
> > GGCCACGCGTCGACTAGTAC) from these sequences in an automated manner?
> > Besides that, has anybody experienced the same problem?
> >
> > Best, Jo
> >
> >
> >   A DNAStringSet instance of length 16196935
> >            width seq
> >        [1]    36 GGCCCCGCGTCGCCTAGTACTACATAAACAATGACC
> >        [2]    36 GGCGATGACCTTCTTGTGACCGTTGTGCATGCCGNC
> >        [3]    36 GTTTCCCAGTCACGGTCATGCTTCCTGTTTCCCAGC
> >        [4]    36 GTTTCCCAGTCACGGTCGTCCTTTTATTCTGACCTG
> >        [5]    36 GGCCACGCGTCGACTAGTACTTAAAAATATCGCACG
> >        [6]    36 GGCCACGCGTCGACTAGTACAGAAAAGACCGTGACT
> >        [7]    36 GGCCACGCGTCGACTAGTACAAAGGACATCACGCCG
> >        [8]    36 GGCCACGCGTCGACTAGTACAGAGTAAACAACGACC
> >        [9]    36 CAGTCACGGTCAAAAAATACATACTAAACACCTACT
> >        ...   ... ...
> > [16196927]    36 CAGTCACGGTCTGGCGGNATNNTTTTTGTACTAGTC
> > [16196928]    36 TAGCCAGCCAAGCCAGCNAANNCAGCCATCCAGCCA
> > [16196929]    36 GCGCCCCTGTCGCGGACNACNNGTAAGCAGCTCTCT
> > [16196930]    36 ACTACACCCCTTAGCAANGANNATCTGAGCCTCCAT
> > [16196931]    36 ACTACAAGCAAACAGTGNTCNNCTATGGTCCAGATC
> > [16196932]    36 GCAGCCACGTCCCGATCNCCNNTTTGAGTGCGTGCG
> > [16196933]    36 GGCCACGCGTCGACTAGNACNNCGAAAAATACGACC
> > [16196934]    36 GGCCACGCGTCGACTAGTACNNAAAAAACAACGCCT
> > [16196935]    36 AGTCACGGTCAAGTAACACANNAACAGAAAACCAAA
> >
> > --
> > Johannes Rainer, PhD
> > Bioinformatics Group,
> > Division Molecular Pathophysiology,
> > Biocenter, Medical University Innsbruck, Fritz-Pregl-Str
> > 3/IV, 6020 Innsbruck, Austria and Tyrolean Cancer Research
> > Institute Innrain 66, 6020 Innsbruck, Austria
> >
> > Tel.:     +43 512 570485 13
> > Email:  [email protected]
> >            [email protected]
> > URL:   http://bioinfo.i-med.ac.at
> >
> >
> > _______________________________________________
> > Bioc-sig-sequencing mailing list
> > [email protected]
> > https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
> >
> >
>
>


