On Mon, Jan 25, 2010 at 10:25 AM, Droit Arnaud <[email protected]> wrote:
> Hi Patrick,
>
> Thanks for your answer.
>
> We are developing a motif discovery and analysis pipeline for ChIP-Seq
> experiments. We're using BSgenome to convert a BED file into FASTA
> sequences with the getSeq function.
>
> We would like to get masked sequences to improve the motif analysis by
> eliminating repeats or other low-interest regions.
>
> So, is there a way to get the masked regions instead of the original
> sequence contained in BSgenome? Not only "show" the sequence, but
> convert the masked sequence into a string.
>
> Activating the masks on a chromosome only allows me to visualize the
> masked sequence of the BSgenome object, but I'm still not able to
> access the masked sequence.
>

You should be able to bsapply() over the BSgenome object, get out the
repeat mask, then concatenate it into a RangesList. Then call getSeq()
with that. Like:

    getSeq(Hsapiens, as(bsapply(Hsapiens, function(x) masks(x)$RM), "RangesList"))

or something. Could always just pull in the repeat masker track from
UCSC too.

> Thanks,
>
> Arnaud.
>
> On 10-01-19 7:38 PM, "Patrick Aboyoun" <[email protected]> wrote:
>
> Arnaud,
> The BSgenome object, in this case Hsapiens, contains references to
> on-disk storage of information such as masks. Since this information is
> not in memory and the data stored on disk is considered read-only, you
> cannot change the mask information on a BSgenome object. Instead, you
> need to modify the masks chromosome by chromosome after they have been
> loaded into memory, as you showed in your code below.
>
> What is your use case that motivated your e-mail?
>
> If you never want to deal with masks, you can always use the unmasked
> function to strip the masks when you load the chromosome:
>
> > unmasked(Hsapiens$chr1)
> 247249719-letter "DNAString" instance
> seq: TAACCCTAACCCTAACCCTAACCCTAACCCTAACCC...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>
> Patrick
>
> Droit Arnaud wrote:
> > Hi,
> >
> > I am wondering if anybody can help me to generate masked (by
> > RepeatMasker, for instance) sequences.
> >
> > I'm currently using BSgenome to extract sequences from a BED file,
> > such as:
> >
> > library(BSgenome.Hsapiens.UCSC.hg18)
> > genome<-Hsapiens
> > FastaSeq<-getSeq(genome,"chr1",start=1000,end=1200, as.character=FALSE)
> >
> > I know that BSgenome contains masks that can be applied by using:
> >
> > chr1 <- genome$chr1
> > active(masks(chr1)) <- TRUE
> >
> > So, I'm trying to use it to change the masks of the genome object.
> > But I cannot modify it:
> >
> > active(masks(genome$chr1)) <- TRUE
> > Error in `$<-`(`*tmp*`, "chr1", value = <S4 object of class "MaskedDNAString">) :
> >   no method for assigning subsets of this S4 class
> >
> > Is there a way to get the masked sequence with the getSeq function?
> >
> > Thanks.
> >
> > Arnaud.
> >
> > _______________________________________________
> > Bioc-sig-sequencing mailing list
> > [email protected]
> > https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
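
For reference, the request in the thread (a masked sequence as a plain
string rather than just a masked display) can be illustrated on a toy
example. This is only a minimal sketch, not something proposed by anyone
in the thread: it assumes Biostrings' Mask(), `masks<-` and
injectHardMask() behave as described in the Biostrings manual, and the
toy sequence and mask coordinates are invented for illustration.

```r
library(Biostrings)

# Toy sequence and mask coordinates (hypothetical, for illustration only).
x <- DNAString("ACACAACTAGATAGNACTNNGAGAGACGC")

# Attach a mask covering positions 3-6 and 10-14; x becomes a MaskedDNAString.
masks(x) <- Mask(mask.width = length(x), start = c(3, 10), width = c(4, 5))

# Overwrite the masked positions with the default filler letter "+",
# which yields an ordinary (unmasked) DNAString of the same length.
hard <- injectHardMask(x)

# Convert to a character string, e.g. for writing to a FASTA file.
as.character(hard)
```

On a real chromosome the same idea would presumably apply after
activating the mask of interest (for example the RepeatMasker mask) on
the loaded chromosome, since only active masks should be injected; that
step is an assumption here and is worth checking against the installed
BSgenome and Biostrings versions.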
