[again, please CC the list]

On 11/01/2012 02:48 PM, Christina Boucher wrote:
> Hi Adrian and Sebastien,
>
> Nice to hear from McGill folk since that is my alma mater ;)
>

So you are Canadian, then?

> I mainly work on plant pathogen data and Arabidopsis. In the future, we plan
> on looking at Brassica rapa. I mainly use SPAdes for assembly, and for the
> pathogen data it worked amazingly well (much, much better than Velvet).

I know that the team behind SPAdes has created a tool to validate their
assemblies, though I don't remember its name. For Ray, we also have something
like that, which uses MUMmer as the backend.

> However, SPAdes seems to choke on the Arabidopsis data and I just haven't
> been able to get anywhere with it. Memory was the issue.

The issue is that SPAdes is not distributed, and it is hard to get access to
a machine with a lot of memory.

> I tried ALLPATHS but find that it's too difficult for my biological
> collaborators to run.

In the GAGE and Assemblathon papers, ALLPATHS-LG yields really nice
assemblies, but it is known to be hard to use.

> I started looking at alternatives last and came across a couple of blog
> posts about Ray. I have a decent assembly server (4 core, 512G RAM).

Ray works really well on shared-memory systems, but on these monstrous 512-GB
machines, memory accesses won't be as efficient as on a distributed cluster
unless you use '--mca maffinity libnuma'.

Do you have access to a cluster? If not, you can look at http://www.nccs.gov/
or http://www.ncsa.illinois.edu/

> I usually use Opera for scaffolding and have been generally happy with it.
>
> Just trying Ray now… I will let you know how it goes.
>
> Christina
>
>
> On 2012-11-01, at 11:20 AM, Adrian Platts <plat...@sbcglobal.net> wrote:
>
>> Hi Christina
>>
>> Here at McGill we have assembled, to varying degrees depending on the
>> project, the selfing crucifers Sisymbrium irio, Leavenworthia alabamica
>> (recent hexaploid) and Aethionema arabicum.
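A minimal sketch of a launch line for Ray on a single large shared-memory machine, following Sébastien's advice about memory affinity. It reuses the Open MPI 1.5-era options that appear in Adrian's latency tests in this thread (`--mca btl sm,self`, `--bind-to-core`, `--bycore`, `--mca maffinity libnuma`). The helper name `ray_smp_cmd` is hypothetical, and later Open MPI releases renamed or dropped some of these options, so check `mpiexec --help` for the installed version. The function only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) an mpiexec line for Ray on a single
# shared-memory node, using the Open MPI 1.5-era options quoted in this thread.
# Arguments after the first two are appended unchanged but are not re-quoted,
# so avoid spaces in file names.
ray_smp_cmd() {
    ranks=$1      # number of MPI ranks, e.g. one per core
    outdir=$2     # Ray output directory (-o)
    shift 2
    # sm,self: shared-memory and loopback transports only (no TCP).
    # --bind-to-core/--bycore: pin each rank to a core, mapping core by core.
    # maffinity libnuma: ask Open MPI to keep each rank's memory NUMA-local.
    printf 'mpiexec -n %s --mca btl sm,self --bind-to-core --bycore' "$ranks"
    printf ' --mca maffinity libnuma ./Ray -o %s' "$outdir"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}
```

For example, `ray_smp_cmd "$(nproc)" RayOutput -k 31 -p 1.left.fastq 1.right.fastq` prints a command with one rank per processing unit reported by `nproc`.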
>> We've also worked on assembly of self-incompatible species, which tend to
>> be highly heterozygous, including Capsella bursa-pastoris, Capsella
>> grandiflora and a Brassica rapa ecotype... and are kind of involved in the
>> assembly of a couple of other non-crucifers, including bean and Cleome.
>>
>> We only have a small assembly capacity here (three 80-core, 256 GB RAM
>> boxes), but I'm keen to compare notes on where things are and are not
>> working well in plant assemblies!
>>
>> We started by using a long k-mer (k > 61) approach in Velvet but found the
>> chimerism rate around TEs was worryingly high. We then moved on to
>> ALLPATHS-LG and Ray-SOAPdenovo (Ray for contigging, SOAP for scaffolding).
>> We're also using the Meraculous assembler for heterozygotes. As I say, I'd
>> be very interested in hearing your experiences.
>>
>> Adrian
>> Adrian Platts
>> VEGI Project
>> McGill
>>
>> P.S. These are our latencies with various parameters. After some checking
>> with Sebastien, we're keeping away from TCP-based messaging.
>>
>> mpiexec -n 10 --mca btl sm,self --bind-to-core --bycore --mca maffinity libnuma ./Ray -o foo3 -test-network-only
>> # AverageForAllRanks: 7
>>
>> mpiexec -n 40 --mca btl sm,self --bind-to-core --bycore --mca maffinity libnuma ./Ray -o foo4 -test-network-only
>> # AverageForAllRanks: 18.15
>>
>> mpiexec -n 32 --mca btl sm,self --bind-to-core --bycore --mca maffinity libnuma ./Ray -o foo4 -route-messages -test-network-only
>> # AverageForAllRanks: 46.875
>> (round-robin)
>>
>> mpiexec -n 40 --mca btl sm,self --bind-to-core --bycore ./Ray -o foo3 -test-network-only
>> # AverageForAllRanks: 17.9
>>
>> mpiexec -n 40 --mca btl sm,self ./Ray -o foo3 -test-network-only
>> # AverageForAllRanks: 22.425
>>
>> mpiexec -n 41 --mca btl sm,self ./Ray -o foo3 -test-network-only
>> # AverageForAllRanks: 45.6098
>>
>> mpiexec -n 70 --mca btl sm,self ./Ray -o foo3 -test-network-only
>> # AverageForAllRanks: 82.105
>>
>>
>>
>> On Nov 1, 2012, at 1:07 PM, Sébastien Boisvert
>> <sebastien.boisver...@ulaval.ca> wrote:
>>
>>> Hello,
>>>
>>> You should CC the mailing list, as I am sure that numerous people in
>>> the genomics community would be interested in plant genome de novo
>>> assembly!
>>>
>>> People at McGill University did some work on plant genomes with Ray too.
>>> They posted their results on the list, I think.
>>>
>>> So you have something like 1 500 000 000 sequences, right?
>>>
>>> On what kind of hardware are you running?
>>>
>>> What's the latency reported in NetworkTest.txt?
>>>
>>>
>>> --
>>> Sent from my IBM Blue Gene/Q
>>>
>>> Sébastien
>>>
>>>
>>>
>>> On 11/01/2012 12:57 PM, Christina Boucher wrote:
>>>> Thanks. I ended up attaching the .openmpi-setup file in my top-level home
>>>> directory and then adding the following line to my .bashrc:
>>>>
>>>> source ~/.openmpi-setup
>>>>
>>>> After recompiling, it seems to be running on my Arabidopsis data. I am
>>>> trying it with all 4 lanes and hoping that it works.
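Sébastien asks for the latency reported in NetworkTest.txt, and Adrian's numbers show how much it varies with the rank count and the binding options. A small sketch for collecting those numbers in one pass, assuming (not confirmed in this thread) that `-o DIR -test-network-only` writes `DIR/NetworkTest.txt` containing a line like `# AverageForAllRanks: 18.15`. The helper names `latency_of` and `latency_sweep` and the `NetworkTest-nN` directory naming are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers. Assumes "-o DIR -test-network-only" leaves a file
# DIR/NetworkTest.txt with a line such as "# AverageForAllRanks: 18.15".

# Print the AverageForAllRanks value recorded in DIR/NetworkTest.txt.
latency_of() {
    awk -F': *' '/AverageForAllRanks/ { print $2; exit }' "$1/NetworkTest.txt"
}

# For each rank count given, run Ray's network test with the shared-memory
# options from this thread and print "ranks average-latency".
latency_sweep() {
    for n in "$@"; do
        dir="NetworkTest-n$n"   # one output directory per run
        mpiexec -n "$n" --mca btl sm,self --bind-to-core --bycore \
            --mca maffinity libnuma ./Ray -o "$dir" -test-network-only \
            > /dev/null || return 1
        printf '%s %s\n' "$n" "$(latency_of "$dir")"
    done
}
```

For example, `latency_sweep 10 20 40 80` prints one "ranks latency" pair per run. Each run gets its own output directory because Ray may refuse to reuse an existing one, so remove the `NetworkTest-n*` directories before re-running.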
>>>> I don't necessarily care if I get the *best* assembly, but an assembly
>>>> would be nice. Other assemblers have been bailing on memory with my 512G
>>>> server, but I am hopeful about your program.
>>>>
>>>> Thanks.
>>>>
>>>> Best,
>>>> Christina
>>>>
>>>>
>>>> On 2012-10-31, at 3:28 PM, Sébastien Boisvert
>>>> <sebastien.boisver...@ulaval.ca> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> On 10/31/2012 03:48 PM, Christina Boucher wrote:
>>>>>> MPI is already installed on my server… see:
>>>>>> oak # rpm -qa | grep openmpi
>>>>>> openmpi-devel-1.5.4-5.fc17.1.x86_64
>>>>>> openmpi-1.5.4-5.fc17.1.x86_64
>>>>>>
>>>>>
>>>>> This is something specific to Fedora 17 (which I happen to be using on
>>>>> my laptop). My answer below is not really related to Ray, but more to
>>>>> Fedora 17.
>>>>>
>>>>> $ repoquery --list openmpi-1.5.4-5.fc17.1.x86_64 | grep mpiexec$ | grep bin
>>>>> /usr/lib64/openmpi/bin/mpiexec
>>>>>
>>>>> $ repoquery --list openmpi-devel-1.5.4-5.fc17.1.x86_64 | grep mpicxx$ | grep bin
>>>>> /usr/lib64/openmpi/bin/mpicxx
>>>>>
>>>>>
>>>>> However, the default PATH for a user on Fedora 17 does not include
>>>>> /usr/lib64/openmpi/bin:
>>>>>
>>>>> [test@panic ~]$ echo $PATH
>>>>> /usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/home/test/.local/bin:/home/test/bin
>>>>>
>>>>>
>>>>> You can fix this in Fedora 17 by adding the following 2 lines to your
>>>>> $HOME/.bashrc:
>>>>>
>>>>> export PATH=/usr/lib64/openmpi/bin:$PATH
>>>>> export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib/:$LD_LIBRARY_PATH
>>>>>
>>>>>
>>>>> Let me know if that works for you.
>>>>>
>>>>>
>>>>> Sébastien
>>>>>
>>>>>> oak # rpm -qa | grep openmpi
>>>>>> openmpi-devel-1.5.4-5.fc17.1.x86_64
>>>>>> openmpi-1.5.4-5.fc17.1.x86_64
>>>>>>
>>>>>> The installation problems still persist…
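The two `export` lines Sébastien suggests can also be written so that sourcing `~/.bashrc` repeatedly does not keep prepending the same directories, and so that an empty `LD_LIBRARY_PATH` does not leave a trailing `:` (an empty entry, which the dynamic loader can treat as the current directory). A minimal POSIX sh sketch; `path_prepend` is a hypothetical helper, and the directories are the Fedora 17 ones from that message.

```shell
#!/bin/sh
# Hypothetical helper for ~/.bashrc: prepend directory $2 to the
# colon-separated variable named $1, unless it is already listed.
# ${_pp_cur:+:$_pp_cur} appends ":old" only when the old value is non-empty,
# so no trailing ":" (empty entry) is ever created.
path_prepend() {
    eval "_pp_cur=\${$1:-}"          # current value, empty if unset
    case ":$_pp_cur:" in
        *":$2:"*) ;;                 # already present: leave unchanged
        *) eval "$1=\"\$2\${_pp_cur:+:\$_pp_cur}\"" ;;
    esac
    export "$1"
}

# Fedora 17 locations of the Open MPI binaries and libraries.
path_prepend PATH /usr/lib64/openmpi/bin
path_prepend LD_LIBRARY_PATH /usr/lib64/openmpi/lib
```

After sourcing, `command -v mpicxx` should print `/usr/lib64/openmpi/bin/mpicxx` if the `openmpi-devel` package is installed.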
>>>>>>
>>>>>> Christina
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2012-10-31, at 11:26 AM, Sébastien Boisvert
>>>>>> <sebastien.boisver...@ulaval.ca> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>>> make[1]: mpicxx: Command not found
>>>>>>>
>>>>>>> To install Ray, you need an MPI library. You don't have one installed.
>>>>>>>
>>>>>>> For example, on Fedora, the packages are openmpi, openmpi-devel and
>>>>>>> gcc-c++.
>>>>>>>
>>>>>>>> In addition, is the max k-mer length 32? Most people are using
>>>>>>>> upwards of 55…?
>>>>>>>
>>>>>>> The maximum k-mer length is set at compilation. The default is
>>>>>>> MAXKMERLENGTH=32. To change that:
>>>>>>>
>>>>>>> make MAXKMERLENGTH=64
>>>>>>>
>>>>>>>
>>>>>>> Sébastien
>>>>>>>
>>>>>>> On 10/31/2012 12:50 PM, Christina Boucher wrote:
>>>>>>>>>>
>>>>>>>>>> I am trying to use your Ray assembler. I've been using SPAdes
>>>>>>>>>> (mainly because I am formerly from Pavel Pevzner's lab), but I am
>>>>>>>>>> running out of memory on a large dataset.
>>>>>>>>>
>>>>>>>>> Maybe your large dataset is more amenable to processing with a
>>>>>>>>> distributed assembler.
>>>>>>>>
>>>>>>>> The SPAdes group released a new version yesterday that's supposed to
>>>>>>>> use less memory. I am trying that and the Ray assembler.
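Both problems in this exchange, a missing `mpicxx` and a k-mer length larger than the compiled maximum, can be caught before running `make`. A minimal sketch, assuming that Ray requires an odd `-k` no larger than the `MAXKMERLENGTH` used at build time (the default build accepts the `-k 31` used in the examples in this thread). The helper names `check_mpicxx` and `check_kmer` are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-build checks for Ray.
# Assumption: -k must be odd and no larger than the MAXKMERLENGTH build value.

# Fail early with a clear message instead of "mpicxx: Command not found".
check_mpicxx() {
    if command -v mpicxx > /dev/null 2>&1; then
        return 0
    fi
    echo "mpicxx not found in PATH: install an MPI library (e.g. openmpi-devel)" \
         "or add its bin directory to PATH" >&2
    return 1
}

# check_kmer K MAXKMERLENGTH: succeed if K is usable with that build setting.
check_kmer() {
    k=$1
    max=$2
    if [ $((k % 2)) -eq 0 ]; then
        echo "k=$k is even; use an odd value" >&2
        return 1
    fi
    if [ "$k" -gt "$max" ]; then
        echo "k=$k exceeds MAXKMERLENGTH=$max; rebuild with a larger value," \
             "e.g. make MAXKMERLENGTH=64" >&2
        return 1
    fi
    return 0
}
```

For example, `check_mpicxx && check_kmer 55 64 && make MAXKMERLENGTH=64` only starts the build when both checks pass.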
>>>>>>>>
>>>>>>>>
>>>>>>>>> To get it and install it:
>>>>>>>>>
>>>>>>>>> $ wget http://downloads.sourceforge.net/project/denovoassembler/Ray-v2.1.0.tar.bz2
>>>>>>>>> $ sha1sum Ray-v2.1.0.tar.bz2
>>>>>>>>> 4c09f2731445852857af53b65aa47e444792eeb0 Ray-v2.1.0.tar.bz2
>>>>>>>>>
>>>>>>>>> $ tar xjf Ray-v2.1.0.tar.bz2
>>>>>>>>> $ cd Ray-v2.1.0/
>>>>>>>>> $ make
>>>>>>>>
>>>>>>>>
>>>>>>>> The problem is this compilation error. After those steps, I get the
>>>>>>>> following error:
>>>>>>>>
>>>>>>>> eggs:~/Ray-v2.1.0$ make
>>>>>>>>
>>>>>>>> Compilation options (you can change them of course)
>>>>>>>>
>>>>>>>> PREFIX = install-prefix
>>>>>>>> MAXKMERLENGTH = 32
>>>>>>>> FORCE_PACKING = n
>>>>>>>> ASSERT = n
>>>>>>>> HAVE_LIBZ = n
>>>>>>>> HAVE_LIBBZ2 = n
>>>>>>>> INTEL_COMPILER = n
>>>>>>>> MPICXX = mpicxx
>>>>>>>> GPROF = n
>>>>>>>> OPTIMIZE = y
>>>>>>>> DEBUG = n
>>>>>>>>
>>>>>>>> Compilation and linking flags (generated automatically)
>>>>>>>>
>>>>>>>> CXXFLAGS = -Wall -std=c++98 -O3 -D MAXKMERLENGTH=32 -D RAY_VERSION=\"2.1.0\"
>>>>>>>> LDFLAGS =
>>>>>>>>
>>>>>>>> make[1]: Entering directory `/s/parsons/f/fac/cboucher/Ray-v2.1.0/RayPlatform'
>>>>>>>> mpicxx -Wall -std=c++98 -O3 -D MAXKMERLENGTH=32 -D RAY_VERSION=\"2.1.0\" -D RAYPLATFORM_VERSION=\"1.1.0\" -I. -c -o memory/ReusableMemoryStore.o memory/ReusableMemoryStore.cpp
>>>>>>>> make[1]: mpicxx: Command not found
>>>>>>>> make[1]: *** [memory/ReusableMemoryStore.o] Error 127
>>>>>>>> make[1]: Leaving directory `/s/parsons/f/fac/cboucher/Ray-v2.1.0/RayPlatform'
>>>>>>>> make[1]: Entering directory `/s/parsons/f/fac/cboucher/Ray-v2.1.0/code'
>>>>>>>> mpicxx -Wall -std=c++98 -O3 -D MAXKMERLENGTH=32 -D RAY_VERSION=\"2.1.0\" -I ../RayPlatform -I. -c -o application_core/ray_main.o application_core/ray_main.cpp
>>>>>>>> make[1]: mpicxx: Command not found
>>>>>>>> make[1]: *** [application_core/ray_main.o] Error 127
>>>>>>>> make[1]: Leaving directory `/s/parsons/f/fac/cboucher/Ray-v2.1.0/code'
>>>>>>>> mpicxx code/TheRayGenomeAssembler.a RayPlatform/libRayPlatform.a -o Ray
>>>>>>>> make: mpicxx: Command not found
>>>>>>>> make: *** [Ray] Error 127
>>>>>>>>
>>>>>>>>
>>>>>>>> Any thoughts?
>>>>>>>>
>>>>>>>> In addition, is the max k-mer length 32? Most people are using
>>>>>>>> upwards of 55…?
>>>>>>>>
>>>>>>>> Christina
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> $ mpiexec -n 1 ./Ray -version
>>>>>>>>> $ mpiexec -n 999 ./Ray -k 31 -p 1.left.fastq 1.right.fastq -p 2.left.fastq 2.right.fastq -o Test
>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Christina
>>>>>>>>>>
>>>>>>>>>> *------------------------------------------------*
>>>>>>>>>> *Christina Boucher*
>>>>>>>>>> Department of Computer Science
>>>>>>>>>> Colorado State University
>>>>>>>>>> Fort Collins, CO 80523
>>>>>>>>>> +1.970.491.8063
>>>>>>>>>> cbouc...@cs.colostate.edu
>>>>>>>>>> www.christinaboucher.com
>>>>>>>>>> *------------------------------------------------*
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
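With several lanes of paired reads (Christina mentions running all 4 lanes), the `-p left right` pairs in Sébastien's example `mpiexec -n 999 ./Ray -k 31 -p ...` command can be generated rather than typed. A minimal sketch, assuming lane N is stored as `N.left.fastq` and `N.right.fastq` as in that example; `ray_run_lanes` and the `DRY_RUN` switch are hypothetical names. The loop uses the POSIX `set --`/`shift` idiom to rebuild the argument list, so file names containing spaces are passed through intact.

```shell
#!/bin/sh
# Hypothetical wrapper: ray_run_lanes RANKS K OUTDIR LANE...
# Each LANE id N becomes "-p N.left.fastq N.right.fastq" (naming assumed from
# the example command). Set DRY_RUN=1 to print the command instead of running it.
ray_run_lanes() {
    ranks=$1     # number of MPI ranks
    k=$2         # k-mer length (odd, <= MAXKMERLENGTH of the build)
    outdir=$3    # Ray output directory
    shift 3      # remaining arguments are lane identifiers
    # For each original lane id: append its -p pair, then drop the id
    # from the front of the list. The for list is fixed when the loop starts.
    for lane in "$@"; do
        set -- "$@" -p "$lane.left.fastq" "$lane.right.fastq"
        shift
    done
    ${DRY_RUN:+echo} mpiexec -n "$ranks" ./Ray -k "$k" "$@" -o "$outdir"
}
```

For example, `DRY_RUN=1 ray_run_lanes 64 31 Arabidopsis 1 2 3 4` prints the full `mpiexec` command with four `-p` pairs; without `DRY_RUN` it runs it.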
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ***
>>>>>>> Sébastien Boisvert
>>>>>>> http://boisvert.info
>>>>>>> Sent from a PC (Linux panic 3.6.2-4.fc17.x86_64).
>>>>>>
>>>>>
>>>>
>>>
>>> _______________________________________________
>>> Denovoassembler-users mailing list
>>> Denovoassembler-users@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users