Hi Mike,

(I CC'ed this to the mailing list.)

Ray can classify k-mers in a taxonomy. To do so, Ray needs a taxonomy.
You can use anything for the taxonomy. At our center, we are using
Greengenes and NCBI.

See these documents for general documentation about graph coloring and
taxonomic profiling features (called Ray Communities):

- Documentation/Taxonomy.txt
- Documentation/BiologicalAbundances.txt

To download the NCBI taxonomy and generate the required files:

Get a copy of ray:

    git clone git://github.com/sebhtml/ray.git

Add this to your PATH (this assumes the clone lives in ~/git-clones):

    export PATH=~/git-clones/ray/scripts/NCBI-Taxonomy/:$PATH

Then, run this:

    CreateRayInputStructures.sh

This will generate these files:

- NCBI-taxonomy/NCBI-Finished-Bacterial-Genomes
- NCBI-taxonomy/Genome-to-Taxon.tsv
- NCBI-taxonomy/TreeOfLife-Edges.tsv
- NCBI-taxonomy/Taxon-Names.tsv

Now, you can run Ray as usual (including Ray Méta plugins), but with
additional options to run Ray Communities plugins as well:

    mpiexec -n 96 \
     Ray \
     -k 31 -o Ray-Communities \
     -p SeqA_1.fastq SeqA_2.fastq \
     -p SeqB_1.fastq SeqB_2.fastq \
     -search NCBI-taxonomy/NCBI-Finished-Bacterial-Genomes \
     -with-taxonomy NCBI-taxonomy/Genome-to-Taxon.tsv \
     NCBI-taxonomy/TreeOfLife-Edges.tsv NCBI-taxonomy/Taxon-Names.tsv

As usual, you can also put all the arguments in a configuration file
like this:

    mpiexec -n 96 Ray Ray.conf

where Ray.conf contains

    -k 31
    -o Ray-Communities
    -p SeqA_1.fastq SeqA_2.fastq
    -p SeqB_1.fastq SeqB_2.fastq
    -search NCBI-taxonomy/NCBI-Finished-Bacterial-Genomes
    -with-taxonomy NCBI-taxonomy/Genome-to-Taxon.tsv
     NCBI-taxonomy/TreeOfLife-Edges.tsv NCBI-taxonomy/Taxon-Names.tsv

So basically, the whole thing builds a distributed de Bruijn graph really
fast (plugins for the distributed storage engine), assembles the data
de novo by distributed graph traversals (Ray Méta; plugin SeedExtender),
colors the graph with the reference genomes provided with the -search
option (Ray Communities, plugin Searcher), and computes taxonomic profiles
using the provided taxonomy (Ray Communities, -with-taxonomy, plugin
PhylogenyViewer).

All that stuff is heavily distributed -- each Ray process has 32768
user-space threads (workers) and you can throw as many Ray processes
at it as you want.

If you are running Ray on a buggy network (we had problems with Mellanox
Infiniband MT26428, revision a0), you can turn on virtual communications
too.

Cheers, Sébastien

On 19/09/12 08:23 PM, Mike Peabody wrote:
> Thanks Sébastien!
>
> -Mike
>
> ----- Original Message -----
> From: "Sébastien Boisvert" <sebastien.boisver...@ulaval.ca>
> To: "Mike Peabody" <m...@sfu.ca>
> Sent: Wednesday, September 19, 2012 6:46:19 AM
> Subject: Re: RE : MetaRay inquiry
>
> Hi,
>
> I should be done today I guess.
>
> On Monday, we had a deadline for the Genome Canada bioinformatics competition.
>
> Basically, the script will fetch all the finished bacterial genomes
> and all the draft bacterial genomes and create a bunch of symbolic links.
>
> Each of these fasta files will already contain a >gi|something to classify
> it in the NCBI taxonomy.
>
> For the NCBI taxonomy, there will be 3 files:
>
> -with-taxonomy Genome-to-Taxon.tsv TreeOfLife-Edges.tsv Taxon-Names.tsv
>
>
> I added the script in
> https://github.com/sebhtml/ray/tree/master/scripts/NCBI-Taxonomy
>
> You can get it with "git clone git://github.com/sebhtml/ray.git"
>
> The documentation is in Documentation/NCBI-Taxonomy.txt
>
> It is not complete yet though. I need to add some code to format the tree and
> taxon names.
>
> I will let you know once I have finished and tested everything.
>
>
> On 19/09/12 01:50 AM, Mike Peabody wrote:
>> Hi Sébastien,
>>
>> Just wanted to see how the script was going.
>>
>> Cheers,
>> Mike
>>
>> ----- Original Message -----
>> From: "Sébastien Boisvert" <sebastien.boisver...@ulaval.ca>
>> To: "Mike Peabody" <m...@sfu.ca>
>> Sent: Thursday, September 13, 2012 6:27:28 PM
>> Subject: Re: RE : MetaRay inquiry
>>
>> I will write you a script that downloads the required files and that
>> converts them.
>>
>> I should get back to you by next Tuesday.
>>
>>
>> On 12/09/12 09:23 AM, Mike Peabody wrote:
>>> Hi Sébastien,
>>>
>>> Maybe you can upload the files to filedropper or another similar website?
>>> http://www.filedropper.com/
>>>
>>> Thanks!
>>> Mike
>>>
>>> ----- Original Message -----
>>> From: "Sébastien Boisvert" <sebastien.boisver...@ulaval.ca>
>>> To: "Mike Peabody" <m...@sfu.ca>
>>> Sent: Wednesday, September 12, 2012 4:51:46 AM
>>> Subject: Re: RE : MetaRay inquiry
>>>
>>> Hi Mike,
>>>
>>> The 3 required files for taxonomy profiling are (+ reference genomes)
>>>
>>> -with-taxonomy \
>>>  Genome-to-Taxon.tsv \
>>>  TreeOfLife-Edges.tsv \
>>>  Taxons.tsv
>>>
>>>
>>> There is the documentation at Documentation/Taxonomy.txt, but
>>> it seems that since I wrote the initial version, NCBI has changed
>>> (once again!) the file formats on their FTP.
>>>
>>>
>>> The file ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdmp.zip used to contain
>>> these:
>>>
>>> ncbi.info ncbi.lvl ncbi.map ncbi.tre
>>>
>>>
>>> Now it contains:
>>>
>>> citations.dmp delnodes.dmp division.dmp gc.prt gencode.dmp merged.dmp
>>> names.dmp nodes.dmp readme.txt
>>>
>>>
>>> Maybe I can upload the 3 files (built from NCBI taxonomy as of
>>> "Sat Nov 5 12:57:17 CET 2011") if you provide me a place to do so.
>>>
>>>
>>> Otherwise, I can write documentation (and a converter) for the new format
>>> of the NCBI dumps. This should be quite straightforward.
>>>
>>>
>>>
>>> Anyway, the format of the 3 files is general and not specific to NCBI.
>>> In our lab, we use the Greengenes taxonomy (3 files derived from the
>>> Greengenes taxonomy) modified to include human sequences too.
>>>
>>>
>>> Cheers, Sébastien
>>>
>>>
>>>
>>> On 11/09/12 08:41 PM, Mike Peabody wrote:
>>>> Hi Sébastien,
>>>>
>>>> I've been busy with other things, but have got around to looking at
>>>> MetaRay again. I am currently unsure of which files from
>>>> ftp://ftp.ncbi.nih.gov/pub/taxonomy/ I should be using with
>>>> NewickToEdges.py to generate the files Genome-to-Taxon.tsv,
>>>> TreeOfLife-Edges.tsv, and Taxon-Names.tsv
>>>>
>>>> Would you be able to show me how you generated these three files?
>>>>
>>>> Thanks,
>>>> Mike
>>>>
>>>> ----- Original Message -----
>>>> From: "Sébastien Boisvert" <sebastien.boisver...@ulaval.ca>
>>>> To: "Mike Peabody" <m...@sfu.ca>
>>>> Sent: Monday, July 30, 2012 7:12:33 AM
>>>> Subject: RE : MetaRay inquiry
>>>>
>>>> Hi,
>>>>
>>>> Sorry for the delay, I am in Utah, U.S.A. in a workshop.
>>>>
>>>> You need to add a few options.
>>>>
>>>> First, you must tell Ray Communities to color the assembled
>>>> graph with known references.
>>>>
>>>>
>>>> -search searchDirectory
>>>>        Provides a directory containing fasta files to be searched
>>>>        in the de Bruijn graph.
>>>>        Biological abundances will be written to
>>>>        RayOutput/BiologicalAbundances
>>>>        See Documentation/BiologicalAbundances.txt
>>>>
>>>>
>>>> Then you must also provide a taxonomy.
>>>>
>>>> -with-taxonomy Genome-to-Taxon.tsv TreeOfLife-Edges.tsv Taxon-Names.tsv
>>>>        Provides a taxonomy.
>>>>        Computes and writes detailed taxonomic profiles.
>>>>        See Documentation/Taxonomy.txt for details.
>>>>
>>>>
>>>> Here is an example of a complete command:
>>>>
>>>> mpiexec -n 64 Ray \
>>>>  -o \
>>>>  Assembly \
>>>>  -k \
>>>>  31 \
>>>>  -p \
>>>>  Sample/SRR060139_1.fastq.gz \
>>>>  Sample/SRR060139_2.fastq.gz \
>>>>  -p \
>>>>  Sample/SRR060140_1.fastq.gz \
>>>>  Sample/SRR060140_2.fastq.gz \
>>>>  -search \
>>>>  /rap/nne-790-ab/genomes/RayKmerSearchStuff/last-build/ARDB \
>>>>  -search \
>>>>  /rap/nne-790-ab/genomes/RayKmerSearchStuff/last-build/Bacteria-Genomes \
>>>>  -search \
>>>>  /rap/nne-790-ab/genomes/RayKmerSearchStuff/last-build/HumanChromosomes \
>>>>  -search \
>>>>  /rap/nne-790-ab/genomes/RayKmerSearchStuff/last-build/NCBI-Bacteria_DRAFT \
>>>>  -search \
>>>>  /rap/nne-790-ab/genomes/RayKmerSearchStuff/last-build/Viruses-Genomes \
>>>>  -with-taxonomy \
>>>>  /rap/nne-790-ab/genomes/taxonomy/last-build/Genome-to-Taxon.tsv \
>>>>  /rap/nne-790-ab/genomes/taxonomy/last-build/TreeOfLife-Edges.tsv \
>>>>  /rap/nne-790-ab/genomes/taxonomy/last-build/Taxon-Names.tsv
>>>>
>>>>
>>>> I can deposit a tar.bz2 containing these extra required files if
>>>> you provide a place.
>>>>
>>>>
>>>> Sébastien
>>>>
>>>>> ________________________________________
>>>>> From: Mike Peabody [m...@sfu.ca]
>>>>> Sent: July 26, 2012 19:29
>>>>> To: Sébastien Boisvert
>>>>> Subject: Re: MetaRay inquiry
>>>>>
>>>>> Hi Sébastien,
>>>>>
>>>>> We are interested in taxonomic profiling. Can you give me an example of
>>>>> how to do taxonomic profiling?
>>>>>
>>>>> Thanks,
>>>>> Mike
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Sébastien Boisvert" <sebastien.boisver...@ulaval.ca>
>>>>> To: "Mike Peabody" <m...@sfu.ca>
>>>>> Sent: Wednesday, July 25, 2012 6:30:21 PM
>>>>> Subject: Re: MetaRay inquiry
>>>>>
>>>>> Hello,
>>>>>
>>>>> Our recent work adds Ray Méta and Ray Communities.
>>>>>
>>>>> Plugins for Ray Méta and Ray Communities analyses are already in Ray.
>>>>>
>>>>> http://denovoassembler.sourceforge.net/
>>>>>
>>>>>
>>>>> Do you wish to perform de novo assembly, taxonomic profiling, or another
>>>>> metagenomic analysis?
>>>>>
>>>>>
>>>>> You can readily perform de novo assemblies, while taxonomic profiling
>>>>> requires files (reference sequences, taxonomic tree, taxon names).
>>>>>
>>>>>
>>>>> Sébastien Boisvert
>>>>>
>>>>> JACQUES CORBEIL wrote:
>>>>>> Mike,
>>>>>>
>>>>>> My talented grad student will send you our new paper recently submitted
>>>>>> on Metaray. Please include it in your comparison. You should receive
>>>>>> it, including the URL, by Thursday.
>>>>>>
>>>>>> Jacques
>>>>>> ___________________________
>>>>>>
>>>>>> Jacques Corbeil Ph. D
>>>>>> Professor
>>>>>> Chaire de Recherche du Canada en Génomique Médicale
>>>>>> Faculté de Médecine
>>>>>> Département de Médecine Moléculaire
>>>>>> Université Laval
>>>>>> Centre Hospitalier Universitaire de Québec (Pavillon CHUL)
>>>>>> Bureau T3-67, 2705 boul. Laurier
>>>>>> Québec, G1V 4G2, QC, Canada
>>>>>> Tel. 418-656-4141 ext. 46423
>>>>>> Fax 418-654-2743
>>>>>> http://genome.ulaval.ca/corbeillab/info
>>>>>>
>>>>>>
>>>>>> *Confidentiality notice*
>>>>>> This message contains information that may be confidential or
>>>>>> privileged. It is meant for the intended recipient or a person authorized
>>>>>> to receive it on their behalf. If you received it in error, please
>>>>>> inform the author as soon as possible, do not disclose its contents,
>>>>>> and delete it from your system.
>>>>>>
>>>>>> On 2012-07-25, at 6:29 PM, Mike Peabody <m...@sfu.ca> wrote:
>>>>>>
>>>>>>> Dear Dr. Corbeil,
>>>>>>>
>>>>>>> I am a graduate student in Fiona Brinkman's lab and she mentioned that
>>>>>>> you have developed MetaRay for metagenomics analysis. We are doing a
>>>>>>> comparison of methods and wanted to include it, but I was unable to
>>>>>>> find any information on it. Would you be able to send me or point me
>>>>>>> to more information on MetaRay?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Mike Peabody
>>>>>>> Bioinformatics Graduate Student
>>>>>>> Brinkman Laboratory, MBB Department
>>>>>>> Simon Fraser University, Burnaby, B.C., Canada, V5A 1S6
>>>>>>> http://www.pathogenomics.sfu.ca/brinkman/
>>>>>>> 778-782-2061
>>>>>>
>>>>>
>>>>>
>>>
>>
>

_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users