On Mon, Jan 25, 2010 at 10:25 AM, Droit Arnaud <[email protected]> wrote:

> Hi Patrick,
>
> Thanks for your answer.
>
> We are developing a motif discovery and analysis pipeline for ChIP-Seq
> experiments.
> We're using BSgenome to convert BED files into FASTA sequences with the
> getSeq function.
>
> We would like to get masked sequences to improve the motif analysis by
> eliminating repeats and other low-interest regions.
>
> So, is there a way to get the masked regions instead of the original
> sequence contained in the BSgenome object?
> We want not only to "show" the sequence, but to convert the masked
> sequence into a string.
>
> Activating the masks on a chromosome only lets me visualize the masked
> sequence of the BSgenome object.
> I am still not able to access the masked sequence itself.
>
>
You should be able to bsapply() over the BSgenome object, get out the repeat
mask, then concatenate it into a RangesList. Then call getSeq() with that.

Like:
getSeq(Hsapiens, as(bsapply(Hsapiens, function(x) masks(x)$RM),
"RangesList"))
or something.

You could also just pull in the RepeatMasker track from UCSC.
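
If the goal is a plain string with the repeats blanked out, one option (a
rough sketch, not tested against hg18) might be to hard-mask the chromosome
once it is in memory. The example below uses a toy DNAString and a made-up
mask over positions 9-16 as stand-ins for Hsapiens$chr1 and its RM mask.
masks(), active(), injectHardMask() and subseq() are Biostrings functions;
the sequence, the mask coordinates and the variable names are assumptions
for illustration only.

```r
library(Biostrings)

# Toy stand-in for a chromosome such as Hsapiens$chr1 (assumption).
chr <- DNAString("ACGTACGTTTTTTTTTACGTACGT")

# Made-up mask over positions 9-16, standing in for the RM mask (assumption).
masks(chr) <- Mask(mask.width = length(chr), start = 9, end = 16)

# Make sure the mask is active, as in the original post.
active(masks(chr)) <- TRUE

# Replace the masked letters with "N" and get back a plain DNAString.
hard <- injectHardMask(chr, letter = "N")

# Extract a region (as for one BED interval) and convert it to a string.
masked_region <- as.character(subseq(hard, start = 5, end = 20))
```

With a real genome, chr would be the loaded chromosome, and you would
probably activate only the repeat mask, e.g. active(masks(chr))["RM"] <- TRUE,
before calling injectHardMask().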

> Thanks,
>
> Arnaud.
>
> On 10-01-19 7:38 PM, "Patrick Aboyoun" <[email protected]> wrote:
>
> Arnaud,
> The BSgenome object, in this case Hsapiens, contains references to on-disk
> storage of information such as masks. Since this information is not
> in memory and the data stored on disk is considered read-only, you
> cannot change the mask information on a BSgenome object. Instead, you
> need to modify the masks chromosome by chromosome after they have been
> loaded into memory as you showed in your code below.
>
> What is your use case that motivated your e-mail?
>
> If you never want to deal with masks, you can always use the unmasked
> function to strip the masks when you load the chromosome:
>
>  > unmasked(Hsapiens$chr1)
>  247249719-letter "DNAString" instance
> seq:
> TAACCCTAACCCTAACCCTAACCCTAACCCTAACCC...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>
>
>
> Patrick
>
>
>
> Droit Arnaud wrote:
> > Hi,
> >
> > I am wondering if anybody can help me generate masked (by RepeatMasker,
> > for instance) sequences.
> >
> > I'm currently using BSgenome to extract sequences from a BED file, such as:
> >
> > library(BSgenome.Hsapiens.UCSC.hg18)
> > genome<-Hsapiens
> > FastaSeq<-getSeq(genome,"chr1",start=1000,end=1200, as.character=FALSE)
> >
> > I know that BSgenome contains masks that can be applied using:
> >
> > chr1 <- genome$chr1
> > active(masks(chr1)) <- TRUE
> >
> > So, I'm trying to use it to change the masks of the genome object, but I
> > cannot modify it:
> >
> > active(masks(genome$chr1)) <- TRUE
> >  Error in `$<-`(`*tmp*`, "chr1", value = <S4 object of class "MaskedDNAString">) :
> >  no method for assigning subsets of this S4 class
> >
> > Is there a way to get the masked sequence with the getSeq function?
> >
> > Thanks.
> >
> > Arnaud.
> >
> > _______________________________________________
> > Bioc-sig-sequencing mailing list
> > [email protected]
> > https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
> >
>
