Hi Joseph,

Joseph Dhahbi, P.h.D. wrote:

Hi Herve
Thank you very much for your help. Using the built-in masks as you suggested was easy. Do I need to do it for each chromosome separately? Is there a way to apply it to the whole genome and create MaskedDNAString of the whole genome?

There is no way to create a MaskedDNAString object of the whole genome. Note that
this would be a very big object and that most machines would not have
enough memory for this. Of course, with a medium-size genome like the Fly,
the problem is not as severe as with the Human genome but still...

How about using the trick I sent you in a previous email (see the email
for the details):

  > allrepeats <- read.XStringViews("dm3rm", format="fasta", subjectClass="DNAString", collapse="-")
  > c <- countPDict(pdict, subject(allrepeats))

Also, in my previous email, I was trying to reproduce the problem you had
with read.DNAStringSet() but couldn't, and I asked for your sessionInfo().
Did read.DNAStringSet() finally work for you?

Once I create a whole genome MaskedDNAString, I would like to use the runAnalysis1 script in the GenomeSearching.pdf to analyze my input dictionary.

Look at the runAnalysis2 script. I guess it's closer to what you are
trying to do (you have a dictionary of patterns, not a single pattern).
You'll need to make some modifications though, e.g. use countPDict
instead of matchPDict and store the results for each chromosome in a
list that you return to the caller at the end of the script. No need
to write the results to a file like in the vignette.
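
Here is a rough sketch of that kind of modification. It is not the vignette's
actual script. It assumes the BSgenome.Dmelanogaster.UCSC.dm3 package is
installed, and the three 10-mers are made-up placeholders for your own
dictionary. Depending on the package version, each chromosome comes back as a
DNAString or a MaskedDNAString; if it is masked, activate the masks you want
before counting, as you did before.

```r
library(Biostrings)
library(BSgenome.Dmelanogaster.UCSC.dm3)  # assumption: dm3 Fly genome package is installed

# Hypothetical dictionary: three made-up patterns of the same width
# (PDict() as used here requires constant-width patterns).
patterns <- DNAStringSet(c("ACGTACGTAC", "TTGACCAGTA", "GGCATTCAGC"))
pdict <- PDict(patterns)

genome <- Dmelanogaster
countsByChrom <- list()

for (seqname in seqnames(genome)) {
    # Load one chromosome at a time so the whole genome is never in memory.
    subject <- genome[[seqname]]
    # (If 'subject' is masked, activate the masks you want here.)
    # One integer count per pattern in the dictionary.
    countsByChrom[[seqname]] <- countPDict(pdict, subject)
    rm(subject)
}

# Genome-wide count for each pattern: sum the per-chromosome vectors.
total <- Reduce(`+`, countsByChrom)
```

countsByChrom is the list you would return to the caller; total just adds up
the per-chromosome counts for each pattern.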

Cheers,
H.

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
