> [again, please CC the list]
Sorry, I'll make sure I do that.
>
> On 11/01/2012 02:48 PM, Christina Boucher wrote:
>> Hi Adrian and Sébastien,
>>
>> Nice to hear from McGill folk since that is my alma mater ;)
>>
>
> So you are Canadian then ?
I am ;)
>> However, SPAdes seems to choke on the Arabidopsis data and I just haven't
>> been able to get anywhere with it. Memory was the issue.
>
> The issue is that SPAdes is not distributed, and it is hard to get access to a
> machine with a lot of memory.
Right. That, and I think their data structures require a significantly larger
amount of memory, but I am not 100% positive on that. The accuracy of the
assemblies has been really impressive.
>
>> I tried ALLPATHS but found that it's too
>> difficult for my biological collaborators to run.
>
> According to the GAGE and Assemblathon papers, ALLPATHS-LG yields really nice
> assemblies, but it is known to be hard to use.
Exactly. ALLPATHS is a bit too involved for many users.
>
>> I started looking at alternatives last and came
>> across a couple blog posts about Ray. I have a decent assembly server (4
>> core, 512G RAM).
>
> Ray is really good on shared-memory systems, but unless you use
> '--mca maffinity libnuma' on these monstrous 512-GB machines, memory accesses
> won't be as good as on a distributed cluster.
>
> Do you have access to a cluster ?
I do. I can try it on the cluster or I can try the --mca command line option.
I'll try the second option first.
Christina
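
For the second option, a minimal sketch of what the command line might look like. The --mca, --bind-to-core and --bycore flags are copied from Adrian's latency runs quoted further down and assume the same Open MPI 1.5.x generation; the rank count, k-mer length, read file names and output directory are placeholders, not values anyone in this thread has confirmed for the 512G server. The function (its name is made up here) only prints the command so it can be inspected before it is run.

```shell
#!/bin/sh
# Build (but do not run) a Ray command line with NUMA-aware Open MPI flags.
# Arguments: number of MPI ranks, output directory (both placeholders).
ray_numa_command() {
    ranks=$1
    outdir=$2
    # echo joins its arguments with single spaces, giving one command line.
    echo "mpiexec -n $ranks --mca btl sm,self --bind-to-core --bycore" \
         "--mca maffinity libnuma ./Ray -k 31" \
         "-p 1.left.fastq 1.right.fastq -o $outdir"
}

# Print the command for 4 ranks; paste it into the shell once it looks right.
ray_numa_command 4 ArabidopsisTest
```

Printing first rather than executing keeps the sketch safe to try on a machine where Ray or the input files are not yet in place.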
>
> If not, you can look at
>
> http://www.nccs.gov/
> or
> http://www.ncsa.illinois.edu/
>
>> I usually use Opera for scaffolding and have been generally happy with it.
>>
>> Just trying Ray now… I will let you know how it goes.
>>
>
>
>
>> Christina
>>
>>
>> On 2012-11-01, at 11:20 AM, Adrian Platts <plat...@sbcglobal.net> wrote:
>>
>>> Hi Christina
>>>
>>> Here at McGill we have assembled, to varying degrees depending on the
>>> project, the selfing crucifers Sisymbrium irio, Leavenworthia alabamica
>>> (recent hexaploid) and Aethionema arabicum.
>>> We've also worked on assembly of self-incompatible species, which tend to be
>>> highly heterozygous, including Capsella bursa-pastoris, Capsella grandiflora
>>> and a Brassica rapa ecotype, and are somewhat involved in the assembly of a
>>> couple of other non-crucifers, including bean and Cleome.
>>>
>>> We only have a small assembly capacity here (three 80-core, 256 GB RAM boxes)
>>> but I'm keen to compare notes on where things are and are not working well
>>> in plant assemblies!
>>>
>>> We started by using a long k-mer (k > 61) approach in Velvet but found the
>>> chimerism rate around TEs was worryingly high. We then moved on to
>>> ALLPATHS-LG and Ray-SOAPdenovo (Ray for contigging, SOAPdenovo for
>>> scaffolding). We're also using the Meraculous assembler for heterozygotes.
>>> As I say, I'd be very interested in hearing your experiences.
>>>
>>> Adrian
>>> Adrian Platts
>>> VEGI Project
>>> McGill
>>>
>>> P.S. These are our latencies with various parameters. After some checking
>>> with Sébastien we're keeping away from TCP-based messaging.
>>>
>>> mpiexec -n 10 --mca btl sm,self --bind-to-core --bycore --mca maffinity libnuma ./Ray -o foo3 -test-network-only
>>> # AverageForAllRanks: 7
>>>
>>> mpiexec -n 40 --mca btl sm,self --bind-to-core --bycore --mca maffinity libnuma ./Ray -o foo4 -test-network-only
>>> # AverageForAllRanks: 18.15
>>>
>>> mpiexec -n 32 --mca btl sm,self --bind-to-core --bycore --mca maffinity libnuma ./Ray -o foo4 -route-messages -test-network-only
>>> # AverageForAllRanks: 46.875
>>> (round-robin)
>>>
>>> mpiexec -n 40 --mca btl sm,self --bind-to-core --bycore ./Ray -o foo3 -test-network-only
>>> # AverageForAllRanks: 17.9
>>>
>>> mpiexec -n 40 --mca btl sm,self ./Ray -o foo3 -test-network-only
>>> # AverageForAllRanks: 22.425
>>>
>>> mpiexec -n 41 --mca btl sm,self ./Ray -o foo3 -test-network-only
>>> # AverageForAllRanks: 45.6098
>>>
>>> mpiexec -n 70 --mca btl sm,self ./Ray -o foo3 -test-network-only
>>> # AverageForAllRanks: 82.105
>>>
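
To compare runs like the ones above without reading each report by hand, the average can be pulled out with a small helper. This is only a sketch: it assumes the report contains a line of the form "# AverageForAllRanks: 18.15", as in the output quoted above, and the function name average_latency is made up for illustration.

```shell
#!/bin/sh
# Print the average latency recorded in a Ray network test report.
# Assumes a line such as "# AverageForAllRanks: 18.15" exists in the file;
# the value is taken to be the last whitespace-separated field on that line.
average_latency() {
    awk '/AverageForAllRanks:/ { print $NF; exit }' "$1"
}
```

A call such as "average_latency foo3/NetworkTest.txt" would then print the number, assuming the report is written to the output directory under the NetworkTest.txt name that Sébastien asks about.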
>>>
>>>
>>> On Nov 1, 2012, at 1:07 PM, Sébastien Boisvert
>>> <sebastien.boisver...@ulaval.ca> wrote:
>>>
>>>> Hello,
>>>>
>>>> You should CC the mailing list as I am sure that numerous people in
>>>> the genomics community would be interested in plant genome de novo
>>>> assembly!
>>>>
>>>> People at McGill University did some work on plant genomes with Ray too.
>>>> They posted their results on the list I think.
>>>>
>>>> So you have something like 1 500 000 000 sequences, right ?
>>>>
>>>> On what kind of hardware are you running ?
>>>>
>>>> What's the latency reported in NetworkTest.txt ?
>>>>
>>>>
>>>> --
>>>> Sent from my IBM Blue Gene/Q
>>>>
>>>> Sébastien
>>>>
>>>>
>>>>
>>>> On 11/01/2012 12:57 PM, Christina Boucher wrote:
>>>>> Thanks. I ended up attaching the .openmpi-setup file in my top-level home
>>>>> directory and then adding the following line to my .bashrc: source
>>>>> ~/.openmpi-setup
>>>>>
>>>>> After recompiling it seems to be running on my Arabidopsis data. I am
>>>>> trying it with all 4 lanes and hoping that it works. I don't necessarily
>>>>> care if I get the *best* assembly but an assembly would be nice. Other
>>>>> assemblers have been bailing on memory with my 512G server but I am
>>>>> hopeful about your program.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Best,
>>>>> Christina
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 2012-10-31, at 3:28 PM, Sébastien Boisvert
>>>>> <sebastien.boisver...@ulaval.ca> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> On 10/31/2012 03:48 PM, Christina Boucher wrote:
>>>>>>> MPI is already installed on my server… see:
>>>>>>> oak # rpm -qa | grep openmpi
>>>>>>> openmpi-devel-1.5.4-5.fc17.1.x86_64
>>>>>>> openmpi-1.5.4-5.fc17.1.x86_64
>>>>>>>
>>>>>>
>>>>>> This is something specific to Fedora 17 (which I happen to be using on
>>>>>> my laptop).
>>>>>> My answer below is not really related to Ray, but more related to Fedora
>>>>>> 17.
>>>>>>
>>>>>> $ repoquery --list openmpi-1.5.4-5.fc17.1.x86_64 | grep mpiexec$ | grep bin
>>>>>> /usr/lib64/openmpi/bin/mpiexec
>>>>>>
>>>>>> $ repoquery --list openmpi-devel-1.5.4-5.fc17.1.x86_64 | grep mpicxx$ | grep bin
>>>>>> /usr/lib64/openmpi/bin/mpicxx
>>>>>>
>>>>>>
>>>>>> However, the default PATH for a user on Fedora 17 is:
>>>>>>
>>>>>> [test@panic ~]$ echo $PATH
>>>>>> /usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/home/test/.local/bin:/home/test/bin
>>>>>>
>>>>>>
>>>>>> You can fix this in Fedora 17 by adding the following 2 lines to your
>>>>>> $HOME/.bashrc:
>>>>>>
>>>>>> export PATH=/usr/lib64/openmpi/bin:$PATH
>>>>>> export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib/:$LD_LIBRARY_PATH
>>>>>>
>>>>>>
>>>>>> Let me know if that works for you.
>>>>>>
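
A quick way to check whether the PATH change took effect, as a sketch. The directory is the one reported by repoquery above; the helper name add_to_path is hypothetical, and it skips the directory if it is missing or already listed, so sourcing .bashrc repeatedly does not keep growing PATH.

```shell
#!/bin/sh
# Prepend a directory to PATH only if it exists and is not already listed.
add_to_path() {
    dir=$1
    case ":$PATH:" in
        *":$dir:"*) ;;                          # already present, nothing to do
        *) [ -d "$dir" ] && PATH="$dir:$PATH" ;; # add only existing directories
    esac
    export PATH
}

add_to_path /usr/lib64/openmpi/bin

# Report where (or whether) the MPI compiler wrapper and launcher are found.
for tool in mpicxx mpiexec; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: $(command -v "$tool")"
    else
        echo "$tool: not found"
    fi
done
```

If both tools are reported with a path, "make" should no longer stop with "mpicxx: Command not found".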
>>>>>>
>>>>>> Sébastien
>>>>>>
>>>>>>> Still the installation problems persist….
>>>>>>>
>>>>>>> Christina
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2012-10-31, at 11:26 AM, Sébastien Boisvert
>>>>>>> <sebastien.boisver...@ulaval.ca> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>>> make[1]: mpicxx: Command not found
>>>>>>>>
>>>>>>>> To install Ray, you need an MPI library. You don't have one installed.
>>>>>>>>
>>>>>>>> For example, on Fedora, the packages are openmpi, openmpi-devel,
>>>>>>>> gcc-c++.
>>>>>>>>
>>>>>>>>> In addition, is the max k-mer length 32? Most people are using
>>>>>>>>> upwards of 55…?
>>>>>>>>
>>>>>>>> The maximum k-mer length is set at compilation. The default is
>>>>>>>> MAXKMERLENGTH=32.
>>>>>>>> To change that:
>>>>>>>>
>>>>>>>> make MAXKMERLENGTH=64
>>>>>>>>
>>>>>>>>
>>>>>>>> Sébastien
>>>>>>>>
>>>>>>>> On 10/31/2012 12:50 PM, Christina Boucher wrote:
>>>>>>>>>>>
>>>>>>>>>>> I am trying to use your Ray assembler. I've been using SPAdes
>>>>>>>>>>> (mainly because I am formerly from Pavel Pevzner's lab) but I am
>>>>>>>>>>> running out of memory on a large dataset.
>>>>>>>>>>
>>>>>>>>>> Maybe your large dataset is more amenable to a
>>>>>>>>>> distributed assembler.
>>>>>>>>>
>>>>>>>>> The SPAdes group released a new version yesterday that's supposed to use
>>>>>>>>> less memory. I am trying that and the Ray assembler.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> To get it and install it:
>>>>>>>>>>
>>>>>>>>>> $ wget http://downloads.sourceforge.net/project/denovoassembler/Ray-v2.1.0.tar.bz2
>>>>>>>>>> $ sha1sum Ray-v2.1.0.tar.bz2
>>>>>>>>>> 4c09f2731445852857af53b65aa47e444792eeb0 Ray-v2.1.0.tar.bz2
>>>>>>>>>>
>>>>>>>>>> $ tar xjf Ray-v2.1.0.tar.bz2
>>>>>>>>>> $ cd Ray-v2.1.0/
>>>>>>>>>> $ make
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> After those steps I get the following compilation error:
>>>>>>>>>
>>>>>>>>> eggs:~/Ray-v2.1.0$ make
>>>>>>>>>
>>>>>>>>> Compilation options (you can change them of course)
>>>>>>>>>
>>>>>>>>> PREFIX = install-prefix
>>>>>>>>> MAXKMERLENGTH = 32
>>>>>>>>> FORCE_PACKING = n
>>>>>>>>> ASSERT = n
>>>>>>>>> HAVE_LIBZ = n
>>>>>>>>> HAVE_LIBBZ2 = n
>>>>>>>>> INTEL_COMPILER = n
>>>>>>>>> MPICXX = mpicxx
>>>>>>>>> GPROF = n
>>>>>>>>> OPTIMIZE = y
>>>>>>>>> DEBUG = n
>>>>>>>>>
>>>>>>>>> Compilation and linking flags (generated automatically)
>>>>>>>>>
>>>>>>>>> CXXFLAGS = -Wall -std=c++98 -O3 -D MAXKMERLENGTH=32 -D RAY_VERSION=\"2.1.0\"
>>>>>>>>> LDFLAGS =
>>>>>>>>>
>>>>>>>>> make[1]: Entering directory `/s/parsons/f/fac/cboucher/Ray-v2.1.0/RayPlatform'
>>>>>>>>> mpicxx -Wall -std=c++98 -O3 -D MAXKMERLENGTH=32 -D RAY_VERSION=\"2.1.0\" -D RAYPLATFORM_VERSION=\"1.1.0\" -I. -c -o memory/ReusableMemoryStore.o memory/ReusableMemoryStore.cpp
>>>>>>>>> make[1]: mpicxx: Command not found
>>>>>>>>> make[1]: *** [memory/ReusableMemoryStore.o] Error 127
>>>>>>>>> make[1]: Leaving directory `/s/parsons/f/fac/cboucher/Ray-v2.1.0/RayPlatform'
>>>>>>>>> make[1]: Entering directory `/s/parsons/f/fac/cboucher/Ray-v2.1.0/code'
>>>>>>>>> mpicxx -Wall -std=c++98 -O3 -D MAXKMERLENGTH=32 -D RAY_VERSION=\"2.1.0\" -I ../RayPlatform -I. -c -o application_core/ray_main.o application_core/ray_main.cpp
>>>>>>>>> make[1]: mpicxx: Command not found
>>>>>>>>> make[1]: *** [application_core/ray_main.o] Error 127
>>>>>>>>> make[1]: Leaving directory `/s/parsons/f/fac/cboucher/Ray-v2.1.0/code'
>>>>>>>>> mpicxx code/TheRayGenomeAssembler.a RayPlatform/libRayPlatform.a -o Ray
>>>>>>>>> make: mpicxx: Command not found
>>>>>>>>> make: *** [Ray] Error 127
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Any thoughts?
>>>>>>>>>
>>>>>>>>> In addition, is the max k-mer length 32? Most people are using
>>>>>>>>> upwards of 55…?
>>>>>>>>>
>>>>>>>>> Christina
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> $ mpiexec -n 1 ./Ray -version
>>>>>>>>>> $ mpiexec -n 999 ./Ray -k 31 -p 1.left.fastq 1.right.fastq -p 2.left.fastq 2.right.fastq -o Test
>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Christina
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------------------------
>>>>>>>>>>> Christina Boucher
>>>>>>>>>>> Department of Computer Science
>>>>>>>>>>> Colorado State University
>>>>>>>>>>> Fort Collins, CO 80523
>>>>>>>>>>> +1.970.491.8063
>>>>>>>>>>> cbouc...@cs.colostate.edu
>>>>>>>>>>> www.christinaboucher.com
>>>>>>>>>>> ------------------------------------------------
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> ***
>>>>>>>> Sébastien Boisvert
>>>>>>>> http://boisvert.info
>>>>>>>> Sent from a PC (Linux panic 3.6.2-4.fc17.x86_64).
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Everyone hates slow websites. So do we.
>>>> Make your web apps faster with AppDynamics
>>>> Download AppDynamics Lite for free today:
>>>> http://p.sf.net/sfu/appdyn_sfd2d_oct
>>>> _______________________________________________
>>>> Denovoassembler-users mailing list
>>>> Denovoassembler-users@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
>>>
>>
>
>
> --
> Sent from my IBM Blue Gene/Q
------------------------------------------------
Christina Boucher
Department of Computer Science
Colorado State University
Fort Collins, CO 80523
+1.970.491.8063
cbouc...@cs.colostate.edu
www.christinaboucher.com
------------------------------------------------