Quoting Joseph: "Thank you for the fast response. Can you direct me to the instructions on how to compile a 64-bit version of R for the Mac? I've never done any compiling before."

Hi Joseph,

The advice here about debugging and trying Linux is good, but if you don't know how to compile R then I assume you have installed the 32-bit Mac binaries from CRAN? Hence your immediate problem is that you are limited to addressing only about 3-4 GB of the RAM you have installed, which is probably a bit short of enough for Illumina PDicts.

If you do want to build a 64-bit version from source, you will need to install some extra tools, and I think you will also have to build some R packages from source. You can read some instructions starting here:

http://cran.r-project.org/bin/macosx/RMacOSX-FAQ.html#What-machines-does-R-for-Mac-OS-X-run-on_003f

Alternatively, you could try the 64-bit binaries of R 2.8 (devel) from here:

http://r.research.att.com/

which I'm guessing should work OK with Biostrings?

Stephen

________________________________
From: [EMAIL PROTECTED] on behalf of [EMAIL PROTECTED]
Sent: Tue 03/06/2008 19:36
To: Joseph Dhahbi, P.h.D.
Cc: [email protected]
Subject: Re: [Bioc-sig-seq] PDict question

Hi Joseph,

You could run PDict() in debug mode by calling:

  > Biostrings:::debug_ACtree_utils()

first and then try to run your example again, and you would see something like this:

  > NM_seq_pDict=PDict(NM_seq_clean)
  [DEBUG] alloc_actree_nodes_buf(): length=4817537 width=36 maxnodes=126030830
  [DEBUG] alloc_actree_nodes_buf(): allocating actree_nodes_buf (bufsize=4032986560) ... OK
  [DEBUG] CWdna_free_actree_nodes_buf(): freeing actree_nodes_buf ... OK

This indicates that PDict() needs to allocate a temporary buffer (the actree_nodes_buf C variable) of about 4GB to build the Aho-Corasick tree. This buffer has exactly the same size as an integer vector of length 1008246640. Can you allocate such a vector? Try:

  > x <- integer(1008246640)

Given that you have 20GB of RAM, this should work, unless something is wrong with your R installation...

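The figures in the debug output can be cross-checked with a few lines of R. This is only a minimal sketch. The node-count formula in it is an assumption inferred from the reported numbers (it happens to reproduce maxnodes exactly), not something read from the Biostrings source, and all variable names are made up for illustration. It also shows why the buffer cannot fit in a 32-bit process no matter how much RAM is installed: the request takes up roughly 94% of the whole 2^32-byte address space.

```r
# Figures copied from the [DEBUG] output quoted above.
dict_length <- 4817537     # number of reads in the dictionary
dict_width  <- 36          # width of each read
max_nodes   <- 126030830   # reported maxnodes
buf_bytes   <- 4032986560  # reported bufsize

# ASSUMPTION (inferred from the numbers, not taken from the package source):
# a trie over the 4-letter DNA alphabet holding N patterns has at most
# min(4^d, N) nodes at depth d, for d = 0 .. width.
est_nodes <- sum(pmin(4^(0:dict_width), dict_length))

# Bytes per node implied by the log, and the length of an integer vector
# (4 bytes per element) occupying the same amount of memory.
node_bytes <- buf_bytes / max_nodes
int_length <- buf_bytes / 4

# A 32-bit process can address at most 2^32 bytes in total.
addr_space_32 <- 2^32
fraction_of_32bit_space <- buf_bytes / addr_space_32

# Pointer size of the running R build: 4 on a 32-bit build, 8 on a 64-bit one.
ptr_bytes <- .Machine$sizeof.pointer

cat("estimated nodes matches maxnodes:", est_nodes == max_nodes, "\n")
cat("bytes per node:", node_bytes, "\n")
cat("equivalent integer vector length:", format(int_length, scientific = FALSE), "\n")
cat("fraction of 32-bit address space:", round(fraction_of_32bit_space, 3), "\n")
cat("pointer size of this R build (bytes):", ptr_bytes, "\n")
```

If the pointer size printed on the last line is 4, the integer(1008246640) test above is expected to fail regardless of the installed RAM.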
More about the "fixed-size temporary buffer" approach:

The size of this buffer is chosen so that it is guaranteed to be big enough to store the entire Aho-Corasick tree with no need for reallocation. The real size of the tree may in fact be smaller (sometimes much smaller) than the size of the temporary buffer, but AFAICS there is no easy way to know this in advance. The real size of the tree (in bytes) can be obtained with:

  > length([EMAIL PROTECTED]) * 32

Note that the formula used to compute the size of the buffer depends only on the length and width of the input dictionary, and that this formula is an optimal a priori estimate in the sense that the tree can fill up the temporary buffer entirely.

We chose to use a fixed-size temporary buffer for the construction of the AC tree because we wanted to make PDict() as fast as possible, at the cost of some increased memory requirement. The current approach is not written in stone, though, and we might change this in the future. Maybe a better approach would be a compromise: choose a buffer size that is 50% of the best a priori estimate and do one reallocation if the temporary buffer happens to be too small, in the hope that this will be a rare situation with real-world data. But more expertise will be needed before we can choose a good ratio (50%? 25%? 75%?...).

Cheers,
H.

Quoting "Joseph Dhahbi, P.h.D."
<[EMAIL PROTECTED]>:

> Hello
> I need help on how to get around the memory error reported below,
> especially when I can not add anymore RAM:
> Here is the Hardware Overview:
> Model Name: Mac Pro
> Model Identifier: MacPro1,1
> Processor Name: Dual-Core Intel Xeon
> Processor Speed: 2.66 GHz
> Number Of Processors: 2
> Total Number Of Cores: 4
> L2 Cache (per processor): 4 MB
> Memory: 20 GB
> Bus Speed: 1.33 GHz
> Boot ROM Version: MP11.005C.B08
> SMC Version: 1.7f10
> Serial Number: G87052SGUPZ
>
>> NM_seq=readSolexaFastA(NM_fa)
>> NM_alf=alphabetFrequency(NM_seq, baseOnly=TRUE)
>> NM_seq_clean = NM_seq[NM_alf[,"other"]==0]
>> length(NM_seq)
> [1] 4820218
>> length(NM_seq_clean)
> [1] 4817537
>> NM_seq_clean
> A DNAStringSet instance of length 4817537
>           width seq
>       [1]    36 GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAT
>       [2]    36 GTGGTAATTCATCAGATCTCGGATGGCATTGGTCAT
>       [3]    36 GGGAGGTCACTAATGGAGACACACAGAAATGTAACA
>       [4]    36 GGGATTGGTTTTTTGTTACTGATTTGTTTGAGTTCA
>       [5]    36 GTGGTAATTTTGACTTTTTAGGTTAATTTATTTTTT
>       [6]    36 GATCGGAAGGAGCTCGTATGCCGTCTTCTGCTTAGA
>       [7]    36 GGTCAGTTGTGTTCTCCTGAGTAGGTTGTGTGAATG
>       [8]    36 GGGAGGTCACTAATGGAGACACACAGAAATGTAACA
>       [9]    36 GGGAGGCTGAGGCAGGAGAATGGCATGAACCTAGAT
>       ...   ... ...
> [4817529]    36 TTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAG
> [4817530]    36 CATCAATGTATCTTAAGGCGTAAATTGTAAGCGTTA
> [4817531]    36 CGAGCAGCGACGCATCACCCAGCTAGATCGGAAGAG
> [4817532]    36 GCAATGCCACTGGCGCGACAACCGGGACACCATAGG
> [4817533]    36 CCTCGCCGGACACGCTGAACTTGTGGCCGTTTTCGT
> [4817534]    36 CCATTGTACAACGTATCGACATATCCTCCACCCGCC
> [4817535]    36 CCCCCTGAACCTGAAACATAAAATGAATGCAATTGT
> [4817536]    36 ACCATGTTGTCCAAGGGCGAATTCTGCAGATATCCA
> [4817537]    36 CAGGGGCCGGCGGCTGGCTAGGGCTGCAGCGTTAAA
>
>> NM_seq_pDict=PDict(NM_seq_clean)
> Error in .PDict(dict, names(dict), tb.start, tb.end, drop.head, drop.tail, :
> alloc_actree_nodes_buf(): failed to alloc actree_nodes_buf
> R(433,0xa000d000) malloc: *** vm_allocate(size=4032987136) failed
> (error code=3)
> R(433,0xa000d000) malloc: *** error: can't allocate region
> R(433,0xa000d000) malloc: *** set a breakpoint in szone_error to debug
>
>> sessionInfo()
> R version 2.7.0 (2008-04-22)
> i386-apple-darwin8.10.1
>
> locale:
> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] tools stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] BiostringsCinterfaceDemo_0.1.2 Biostrings_2.8.9
> Biobase_2.0.1
>
> Regards,
> Joseph
>
> Joseph M. Dhahbi, PhD
> Childrens Hospital Oakland Research Institute
> 5700 Martin Luther King Jr. Way
> Oakland, CA 94609
> USA
> Ph.(510)428-3885 EXT.5743
> Cell.(702)335-0795
> Fax (510)450-7910
> [EMAIL PROTECTED]
> The email message (and any attachments) is for the sol...{{dropped:21}}

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
