[again, please CC the list]

On 11/01/2012 02:48 PM, Christina Boucher wrote:
> Hi Adrian and Sebastien,
>
> Nice to hear from McGill folk since that is my alma mater ;)
>

So you are Canadian then ?

> I mainly work on plant pathogen data and Arabidopsis.  In the future, we plan
> on looking at Brassica rapa.  I mainly use Spades for assembly, and for the
> pathogen data it worked amazingly well (much, much better than Velvet).

I know that the team behind Spades has actually created a tool to validate
their assemblies, though I don't remember the name.

For Ray, we also have something similar, which uses MUMmer as the backend.

> However, Spades seems to choke on the Arabidopsis data and I just haven't
> been able to get anywhere with it.  Memory was the issue.

The issue is that Spades is not distributed, so it needs a single machine with
a lot of memory, and access to such machines is hard to get.

> I tried Allpaths but find that it's too difficult for my biological
> collaborators to run.

According to the GAGE and Assemblathon papers, ALLPATHS-LG yields really nice
assemblies, but it is known to be hard to use.

> I started looking at alternatives last and came across a couple blog posts
> about Ray.   I have a decent assembly server (4 core, 512G RAM).

Ray is really good on shared memory systems, but unless you use
'--mca maffinity libnuma' on these monstrous 512-GB machines, memory accesses
won't be as good as on a distributed cluster.
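
As a rough sketch of what I mean (not a recipe): the flags below are the ones from Adrian's commands quoted further down, which use Open MPI 1.5-era option names. The rank count and output directory are placeholders, and the two helper function names are made up for illustration. The first helper only prints the command line; the second pulls the number out of lines of the form "# AverageForAllRanks: 18.15".

```shell
#!/bin/sh
# Sketch only. Flags are copied from the commands quoted later in this thread;
# the function names, rank count and output directory are illustrative.

# Print (do not run) an mpiexec command line for Ray's network test
# with shared-memory transport, core binding and NUMA memory affinity.
ray_network_test_cmd() {
    ranks="$1"
    outdir="$2"
    printf 'mpiexec -n %s --mca btl sm,self --bind-to-core --bycore --mca maffinity libnuma ./Ray -o %s -test-network-only\n' "$ranks" "$outdir"
}

# Print the last field of every line containing "AverageForAllRanks:".
# Reads the files given as arguments, or standard input if there are none.
average_latency() {
    awk '/AverageForAllRanks:/ { print $NF }' "$@"
}

ray_network_test_cmd 40 foo4
```

Run the printed command once with and once without the affinity flags, and compare the averages on your own machine.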

Do you have access to a cluster ?

If not, you can look at

  http://www.nccs.gov/
  or
  http://www.ncsa.illinois.edu/
  
> I usually use Opera for scaffolding and have been generally happy with it.
>
> Just trying Ray now… I will let you know how it goes..
>



> Christina
>
>
> On 2012-11-01, at 11:20 AM, Adrian Platts <plat...@sbcglobal.net> wrote:
>
>> Hi Christina
>>
>> Here at McGill we have assembled, to varying degrees depending on the
>> project, the selfing crucifers Sisymbrium irio, Leavenworthia alabamica
>> (recent hexaploid) and Aethionema arabicum.
>> We've also worked on assembly of self-incompatible species, which tend to be
>> highly heterozygous, including Capsella bursa-pastoris, Capsella grandiflora
>> and a Brassica rapa ecotype ... and are kind of involved in the assembly of
>> a couple of other non-crucifers including bean and Cleome.
>>
>> We only have a small assembly capacity here (three 80-core, 256 GB RAM
>> boxes), but I'm keen to compare notes on where things are and are not
>> working well in plant assemblies!
>>
>> We started by using a long k-mer (k > 61) approach in Velvet but found the
>> chimerism rate around TEs was worryingly high.  We then moved on to
>> AllpathsLG and Ray-SOAPdeNovo (Ray for contiging, Soap for scaffolding).
>> We're also using the Meraculous assembler for heterozygotes.  As I say, I'd
>> be very interested in hearing your experiences.
>>
>> Adrian
>> Adrian Platts
>> VEGI Project
>> McGill
>>
>> ps. these are our latencies with various params... after some checking with 
>> Sebastien we're keeping away from tcp based messaging.
>>
>> mpiexec -n 10 --mca btl sm,self --bind-to-core --bycore --mca maffinity libnuma ./Ray -o foo3 -test-network-only
>> # AverageForAllRanks: 7
>>
>> mpiexec -n 40 --mca btl sm,self --bind-to-core --bycore --mca maffinity libnuma ./Ray -o foo4 -test-network-only
>> # AverageForAllRanks: 18.15
>>
>> mpiexec -n 32 --mca btl sm,self --bind-to-core --bycore --mca maffinity libnuma ./Ray -o foo4 -route-messages -test-network-only
>> # AverageForAllRanks: 46.875
>> (round-robin)
>>
>> mpiexec -n 40 --mca btl sm,self --bind-to-core --bycore ./Ray -o foo3 -test-network-only
>> # AverageForAllRanks: 17.9
>>
>> mpiexec -n 40 --mca btl sm,self ./Ray -o foo3 -test-network-only
>> # AverageForAllRanks: 22.425
>>
>> mpiexec -n 41 --mca btl sm,self ./Ray -o foo3 -test-network-only
>> # AverageForAllRanks: 45.6098
>>
>> mpiexec -n 70 --mca btl sm,self ./Ray -o foo3 -test-network-only
>> # AverageForAllRanks: 82.105
>>
>>
>>
>> On Nov 1, 2012, at 1:07 PM, Sébastien Boisvert <sebastien.boisver...@ulaval.ca> wrote:
>>
>>> Hello,
>>>
>>> You should CC the mailing list as I am sure that numerous people in
>>> the genomics community would be interested in plant genome de novo assembly !
>>>
>>> People at McGill University did some work on plant genomes with Ray too.
>>> They posted their results on the list I think.
>>>
>>> So you have something like 1 500 000 000 sequences, right ?
>>>
>>> On what kind of hardware are you running ?
>>>
>>> What's the latency reported in NetworkTest.txt ?
>>>
>>>
>>> --
>>> Sent from my IBM Blue Gene/Q
>>>
>>>            Sébastien
>>>
>>>
>>>
>>> On 11/01/2012 12:57 PM, Christina Boucher wrote:
>>>> Thanks. I ended up attaching the .openmpi-setup file in my top-level home 
>>>> directory and then adding the following line to my .bashrc:  source  
>>>> ~/.openmpi-setup
>>>>
>>>> After recompiling it seems to be running on my Arabidopsis data.  I am 
>>>> trying it with all 4 lanes and hoping that it works.  I don't necessarily 
>>>> care if I get the *best* assembly but an assembly would be nice.  Other 
>>>> assemblers have been bailing on memory with my 512G server but I am 
>>>> hopeful about your program.
>>>>
>>>> Thanks.
>>>>
>>>> Best,
>>>> Christina
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 2012-10-31, at 3:28 PM, Sébastien Boisvert <sebastien.boisver...@ulaval.ca> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> On 10/31/2012 03:48 PM, Christina Boucher wrote:
>>>>>> MPI is already installed on my server…  see:
>>>>>> oak # rpm -qa | grep openmpi
>>>>>> openmpi-devel-1.5.4-5.fc17.1.x86_64
>>>>>> openmpi-1.5.4-5.fc17.1.x86_64
>>>>>>
>>>>>
>>>>> This is something specific to Fedora 17 (which I happen to be using on my 
>>>>> laptop).
>>>>> My answer below is not really related to Ray, but more related to Fedora 
>>>>> 17.
>>>>>
>>>>> $ repoquery --list openmpi-1.5.4-5.fc17.1.x86_64 | grep mpiexec$ |grep bin
>>>>> /usr/lib64/openmpi/bin/mpiexec
>>>>>
>>>>> $ repoquery --list openmpi-devel-1.5.4-5.fc17.1.x86_64 | grep mpicxx$ | grep bin
>>>>> /usr/lib64/openmpi/bin/mpicxx
>>>>>
>>>>>
>>>>> However, the default PATH for a user on Fedora 17 is:
>>>>>
>>>>> [test@panic ~]$ echo $PATH
>>>>> /usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/home/test/.local/bin:/home/test/bin
>>>>>
>>>>>
>>>>> You can fix this in Fedora 17 by adding the following 2 lines to your 
>>>>> $HOME/.bashrc:
>>>>>
>>>>> export PATH=/usr/lib64/openmpi/bin:$PATH
>>>>> export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib/:$LD_LIBRARY_PATH
>>>>>
>>>>>
>>>>> Let me know if that works for you.
>>>>>
>>>>>
>>>>>                   Sébastien
>>>>>
>>>>>>
>>>>>> Still the installation problems persist….
>>>>>>
>>>>>> Christina
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2012-10-31, at 11:26 AM, Sébastien Boisvert <sebastien.boisver...@ulaval.ca> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>>> make[1]: mpicxx: Command not found
>>>>>>>
>>>>>>> To install Ray, you need an MPI library. You don't have one installed.
>>>>>>>
>>>>>>> For example, on Fedora, the packages are openmpi, openmpi-devel, 
>>>>>>> gcc-c++.
>>>>>>>
>>>>>>>> In addition, is the max k-mer length 32? Most people are using
>>>>>>>> upwards of 55….?
>>>>>>>
>>>>>>> The maximum k-mer length is set at compilation. The default is 
>>>>>>> MAXKMERLENGTH=32.
>>>>>>> To change that:
>>>>>>>
>>>>>>> make MAXKMERLENGTH=64
>>>>>>>
>>>>>>>
>>>>>>>                Sébastien
>>>>>>>
>>>>>>> On 10/31/2012 12:50 PM, Christina Boucher wrote:
>>>>>>>>>>
>>>>>>>>>> I am trying to use your Ray assembler.  I've been using Spades 
>>>>>>>>>> (mainly because I am formerly
>>>>>>>>>> from Pavel Pevzner's lab) but running out of memory on a large 
>>>>>>>>>> dataset.
>>>>>>>>>
>>>>>>>>> Maybe processing your large dataset is more amenable with a 
>>>>>>>>> distributed assembler.
>>>>>>>>
>>>>>>>> Spades group released a new version yesterday that's supposed to use 
>>>>>>>> less memory.  I am trying that and the Ray assembler.
>>>>>>>>
>>>>>>>>
>>>>>>>>> To get it and install it:
>>>>>>>>>
>>>>>>>>> $ wget http://downloads.sourceforge.net/project/denovoassembler/Ray-v2.1.0.tar.bz2
>>>>>>>>> $ sha1sum Ray-v2.1.0.tar.bz2
>>>>>>>>> 4c09f2731445852857af53b65aa47e444792eeb0  Ray-v2.1.0.tar.bz2
>>>>>>>>>
>>>>>>>>> $ tar xjf Ray-v2.1.0.tar.bz2
>>>>>>>>> $ cd Ray-v2.1.0/
>>>>>>>>> $ make
>>>>>>>>
>>>>>>>>
>>>>>>>> The problem is a compilation error.  After those steps I get the
>>>>>>>> following:
>>>>>>>>
>>>>>>>> eggs:~/Ray-v2.1.0$ make
>>>>>>>>
>>>>>>>> Compilation options (you can change them of course)
>>>>>>>>
>>>>>>>> PREFIX = install-prefix
>>>>>>>> MAXKMERLENGTH = 32
>>>>>>>> FORCE_PACKING = n
>>>>>>>> ASSERT = n
>>>>>>>> HAVE_LIBZ = n
>>>>>>>> HAVE_LIBBZ2 = n
>>>>>>>> INTEL_COMPILER = n
>>>>>>>> MPICXX = mpicxx
>>>>>>>> GPROF = n
>>>>>>>> OPTIMIZE = y
>>>>>>>> DEBUG = n
>>>>>>>>
>>>>>>>> Compilation and linking flags (generated automatically)
>>>>>>>>
>>>>>>>> CXXFLAGS = -Wall -std=c++98 -O3 -D MAXKMERLENGTH=32 -D RAY_VERSION=\"2.1.0\"
>>>>>>>> LDFLAGS =
>>>>>>>>
>>>>>>>> make[1]: Entering directory `/s/parsons/f/fac/cboucher/Ray-v2.1.0/RayPlatform'
>>>>>>>> mpicxx -Wall -std=c++98 -O3 -D MAXKMERLENGTH=32  -D RAY_VERSION=\"2.1.0\" -D RAYPLATFORM_VERSION=\"1.1.0\" -I. -c -o memory/ReusableMemoryStore.o memory/ReusableMemoryStore.cpp
>>>>>>>> make[1]: mpicxx: Command not found
>>>>>>>> make[1]: *** [memory/ReusableMemoryStore.o] Error 127
>>>>>>>> make[1]: Leaving directory `/s/parsons/f/fac/cboucher/Ray-v2.1.0/RayPlatform'
>>>>>>>> make[1]: Entering directory `/s/parsons/f/fac/cboucher/Ray-v2.1.0/code'
>>>>>>>> mpicxx -Wall -std=c++98 -O3 -D MAXKMERLENGTH=32  -D RAY_VERSION=\"2.1.0\" -I ../RayPlatform -I. -c -o application_core/ray_main.o application_core/ray_main.cpp
>>>>>>>> make[1]: mpicxx: Command not found
>>>>>>>> make[1]: *** [application_core/ray_main.o] Error 127
>>>>>>>> make[1]: Leaving directory `/s/parsons/f/fac/cboucher/Ray-v2.1.0/code'
>>>>>>>> mpicxx   code/TheRayGenomeAssembler.a RayPlatform/libRayPlatform.a -o Ray
>>>>>>>> make: mpicxx: Command not found
>>>>>>>> make: *** [Ray] Error 127
>>>>>>>>
>>>>>>>>
>>>>>>>> Any thoughts?
>>>>>>>>
>>>>>>>> In addition, is the max k-mer length 32? Most people are using
>>>>>>>> upwards of 55….?
>>>>>>>>
>>>>>>>> Christina
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> $ mpiexec -n 1 ./Ray -version
>>>>>>>>> $ mpiexec -n 999 ./Ray -k 31 -p 1.left.fastq 1.right.fastq -p 2.left.fastq 2.right.fastq -o Test
>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Christina
>>>>>>>>>>
>>>>>>>>>> *
>>>>>>>>>> ------------------------------------------------*
>>>>>>>>>> *Christina Boucher*
>>>>>>>>>> **Department of Computer Science
>>>>>>>>>> Colorado State University
>>>>>>>>>> Fort Collins, CO 80523
>>>>>>>>>> +1.970.491.8063
>>>>>>>>>> cbouc...@cs.colostate.edu
>>>>>>>>>> www.christinaboucher.com
>>>>>>>>>> *------------------------------------------------*
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ***
>>>>>>> Sébastien Boisvert
>>>>>>> http://boisvert.info
>>>>>>> Sent from a PC (Linux panic 3.6.2-4.fc17.x86_64).
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Everyone hates slow websites. So do we.
>>> Make your web apps faster with AppDynamics
>>> Download AppDynamics Lite for free today:
>>> http://p.sf.net/sfu/appdyn_sfd2d_oct
>>> _______________________________________________
>>> Denovoassembler-users mailing list
>>> Denovoassembler-users@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
>>
>
>


-- 
Sent from my IBM Blue Gene/Q
