On Tue, Nov 4, 2008 at 11:31 AM, Romain FENOUIL <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I'm quite new to this mailing list, so I would first like to
> introduce myself.
>
> My name is Romain Fenouil and I work as a bioinformatician at the
> Immunology Center of Marseille Luminy.
> Until now, we have mainly been assessing transcription factor binding
> sites with ChIP-on-chip experiments, and we recently tried some ChIP-seq
> runs. At first we wanted to assess the differences between these two
> techniques, and we are now interested in going further with this method.
>
> We are working in collaboration with another lab that gives us the
> Eland-aligned files and the raw data (tag sequences) files.
> I managed (with some trouble) to load these aligned data and to have a look
> at the enrichment profiles for the TF of interest. It's amazing!
> I'm also working on using the MAQ alignment software to compare the Eland
> and MAQ alignments.
>
> I'm using the ShortRead R package to load the aligned data and play with
> it. After taking some time to understand how it works, I have to say that
> it's really convenient.
> (Thank you to Martin Morgan and Simon Anders, who helped me with loading
> the Eland and MAQ data.)
>
>
> So after this long introduction, here is my question:
>
> We are now able to obtain enrichment profiles for the whole genome except
> repeated regions.
> Since we are never satisfied, we have become interested in some repeated
> regions (snRNAs for instance).
> We already have some ideas on how to assess binding events in repeated
> regions, but we need some information on these regions.
>
> Specifically, I would like to know if there is a way to get the list of
> tags that matched multiple locations in the genome.
> What we would like to have is the number of matches of each tag in the
> genome and the locations of these matches.
> I have heard that the information available on these tags depends on
> which alignment software is used.
>
> For instance, I have been told that Eland doesn't give much information
> on tags that have multiple matches. What about MAQ?
> Maybe SOAP? Will I have to use Biostrings and try to remap them
> manually?
>

MAQ, gmap, blat, and bowtie (and maybe SOAP--we do not use it) will output
multiple hits if you like.  The manuals for these alignment algorithms will
give you details.
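
Once an aligner has reported every hit, counting matches per tag and
collecting their locations is a small grouping step. The following is only a
rough sketch in Python: the record layout (tag id, chromosome, position,
strand) and the function names are assumptions made for illustration, not the
output format of any particular aligner, so the parsing would have to be
adapted to whatever your tool actually writes.

```python
from collections import defaultdict


def group_hits(records):
    """Group alignment hits by tag.

    `records` is an iterable of (tag_id, chrom, pos, strand) tuples.
    This layout is hypothetical; adapt it to your aligner's output.
    """
    hits = defaultdict(list)
    for tag_id, chrom, pos, strand in records:
        hits[tag_id].append((chrom, pos, strand))
    return hits


def multi_matched(hits, min_hits=2):
    """Keep only the tags reported at `min_hits` or more locations."""
    return {tag: locs for tag, locs in hits.items() if len(locs) >= min_hits}


# Made-up example records, one tuple per reported hit.
records = [
    ("tag1", "chr1", 1000, "+"),
    ("tag2", "chr1", 5000, "+"),
    ("tag2", "chr6", 7200, "-"),
    ("tag2", "chrX", 300, "+"),
]

multi = multi_matched(group_hits(records))
for tag, locs in multi.items():
    # Print the tag, its number of matches, and the match locations.
    print(tag, len(locs), locs)
```

Here only tag2 is kept, since it is the only tag reported at more than one
location.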


>
> So if you have any general information or ideas about how to deal with it,
> I would be interested.
>

While I like your idea, there are, unfortunately, regions of the genome that
are not accessible to sequencing (the repeat regions).  While it is possible
to align to these regions, all you can say is that you have aligned to a
repeated region.  You are still missing localization and enrichment
information.

That said, if you have specific targets, like snoRNA for example, I would
suggest that you align to those directly rather than aligning to the
genome.  However, this will only make sense in some experimental contexts.
Many folks are of the opinion that aligning to the genome gets you all the
information in one swoop.  I do not agree with that and think that the best
bet is to tailor your target sequences to the experiment at hand.
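
To make the idea of aligning directly to a small target set concrete, here is
a toy sketch in Python. It uses naive exact string matching on both strands,
which is a stand-in for a real aligner (no mismatches, no quality scores), and
the target names and sequences are made up for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def find_all(query, target):
    """Return every 0-based start position of `query` in `target`,
    including overlapping occurrences."""
    positions = []
    start = target.find(query)
    while start != -1:
        positions.append(start)
        start = target.find(query, start + 1)
    return positions


def match_tag(tag, targets):
    """Exact-match `tag` against each target on both strands.

    Returns a list of (target_name, position, strand) tuples.
    Note: a tag equal to its own reverse complement is reported
    on both strands.
    """
    hits = []
    for name, seq in targets.items():
        for strand, query in (("+", tag), ("-", reverse_complement(tag))):
            for pos in find_all(query, seq):
                hits.append((name, pos, strand))
    return hits


# Hypothetical target sequences standing in for, e.g., a set of small RNAs.
targets = {
    "targetA": "ACGTTGCAACGTTGCA",
    "targetB": "GGGACGTTCCC",
}

hits = match_tag("ACGTT", targets)
for hit in hits:
    print(hit)
```

Because the target set is small, every location of a tag can be enumerated
cheaply, which is what makes multiply-matching tags tractable in this setting.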

Sean

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
