Hi,

On 02/14/2013 11:52 AM, James Vincent wrote:
> Hi Sébastien,
>
> Thanks very much for your quick and detailed reply.
>
> I understand the details of proportion calculations and what they
> are, but that does not square with the output files.
>
> The sum of proportions in the file Terms.tsv, for example, is 55. It
> is not slightly off from 1. In other GO output files the sum of
> proportions is a similarly large number, 50, 60 or more.

The file Terms.tsv contains all levels of depth in the directed acyclic graph
of Gene Ontology. Proportions are computed separately for each depth, so they
are not meant to be summed over the whole file. If you take a particular
depth, you should see something near 100%.

Relevant files:

$ ls|grep GeneO
0.Profile.GeneOntologyDomain=biological_process.tsv
0.Profile.GeneOntologyDomain=cellular_component.tsv
0.Profile.GeneOntologyDomain=molecular_function.tsv
_GeneOntology

$ ls _GeneOntology/
biological_process.Depth=0.tsv  cellular_component.Depth=4.tsv  molecular_function.Depth=1.tsv  molecular_function.Depth=8.tsv
biological_process.Depth=1.tsv  cellular_component.Depth=5.tsv  molecular_function.Depth=2.tsv  molecular_function.Depth=9.tsv
biological_process.Depth=2.tsv  cellular_component.Depth=6.tsv  molecular_function.Depth=3.tsv  Terms.tsv
cellular_component.Depth=0.tsv  cellular_component.Depth=7.tsv  molecular_function.Depth=4.tsv  Terms.xml
cellular_component.Depth=1.tsv  cellular_component.Depth=8.tsv  molecular_function.Depth=5.tsv
cellular_component.Depth=2.tsv  cellular_component.Depth=9.tsv  molecular_function.Depth=6.tsv
cellular_component.Depth=3.tsv  molecular_function.Depth=0.tsv  molecular_function.Depth=7.tsv

> The obvious example is that the first few largest proportion numbers
> add up to more than 2. They are all fractions like 0.5, 0.6 and so on.
> Is there an error in my run or perhaps my interpretation?

Do you see this behavior if you look at a given depth and not at all the
depths at once?

> Thanks,
> Jim
>
>>> I expected the
>>> sum to be 1.0.
>>
>> Sometimes, it's a little bit more than 1.000 (like 1.00562), sometimes it's
>> a little bit less. This is because the demultiplexing process is not 100%
>> accurate, but in general it is really good.
>
> On Thu, Feb 14, 2013 at 10:59 AM, Sébastien Boisvert
> <[email protected]> wrote:
>> Hello,
>>
>> On 02/14/2013 10:13 AM, jjv5 wrote:
>>> Hello,
>>>
>>> I have used ray-meta with -gene-ontology enabled after downloading GO
>>> data using the Main.sh script in the git repo.
>>> Everything completed fine and produced expected output.
>>>
>>> The result file Terms.tsv under BiologicalAbundances/_GeneOntology
>>> contains proportions for the GO terms encountered. What is this
>>> proportion number based on?
>>
>> For plain genomes (via the -search command), proportions are computed by
>> demultiplexing the signal based on uniquely colored k-mers.
>>
>> For taxonomy, the provided taxonomy tree is used to classify each observed
>> k-mer at the vertex in the tree where the earliest common ancestor is found.
>>
>> For gene ontology, k-mer observations are gathered for each ontology term,
>> and proportions are computed for each depth in the gene ontology directed
>> acyclic graph.
>>
>>> Proportion of what?
>>
>> Of k-mers found in the de Bruijn subgraph that was built from the sequence
>> reads provided to Ray.
>>
>> For example, if you want a number of bacterial cells, you need to further
>> normalize by genome length, and so on.
>>
>>> The sum of the
>>> proportion values in this file is some large integer.
>>
>> In directories in BiologicalAbundances, a file called SequenceAbundances.xml
>> contains numerous counts.
>>
>> These large integers are either a number of k-mers, or a number of k-mer
>> observations. A k-mer observation corresponds to a k-mer occurring 1 time.
>>
>> So for a life form X, its k-mer observations are computed as follows:
>>
>> 1. Gather the k-mers that are unique (specific) to this life form X;
>> 2. Compute the average number of observations (depth) for these k-mers;
>> 3. For life form X, compute the number of matched k-mers in the graph,
>>    regardless of whether they are unique (breadth);
>> 4. With the number of matched k-mers (#3) and the average depth (#2), the
>>    demultiplexed number of k-mer observations is calculated.
>>
>>> I expected the
>>> sum to be 1.0.
>>
>> Sometimes, it's a little bit more than 1.000 (like 1.00562), sometimes it's
>> a little bit less.
>> This is because the demultiplexing process is not 100% accurate, but in
>> general it is really good.
>>
>> See http://genomebiology.com/2012/13/12/R122/abstract
>>
>>> Is there further documentation somewhere?
>>
>> The documentation lives mainly in
>> https://github.com/sebhtml/ray/tree/master/Documentation
>>
>> For what you are doing, these are relevant:
>>
>> * https://github.com/sebhtml/ray/blob/master/Documentation/BiologicalAbundances.txt
>> * https://github.com/sebhtml/ray/blob/master/Documentation/NCBI-Taxonomy.txt
>> * https://github.com/sebhtml/ray/blob/master/Documentation/GeneOntology.txt
>> * https://github.com/sebhtml/ray/blob/master/Documentation/Taxonomy.txt
>>
>>> Thanks,
>>> Jim
>>>
>>> P.S. Thanks for making ray available. We like it a great deal.
>>
>> Thanks!
>>
>> It's nice to hear what our end users like (and what they don't like too!).
>>
>> There is a ticket in progress to further increase the accuracy of Ray
>> Communities (the solution that tells you what's in your sample) using
>> topology.
>>
>> https://github.com/sebhtml/ray/issues/133
>>
>>> ------------------------------------------------------------------------------
>>> Free Next-Gen Firewall Hardware Offer
>>> Buy your Sophos next-gen firewall before the end March 2013
>>> and get the hardware for free! Learn more.
>>> http://p.sf.net/sfu/sophos-d2d-feb
>>> _______________________________________________
>>> Denovoassembler-users mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
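
A minimal sketch for checking the per-depth sums discussed in this thread. It
assumes each per-depth file in _GeneOntology is tab-separated, has a header
row, and has a numeric column named "Proportion". The column name, the helper
names and the directory path are assumptions for illustration and are not
taken from the Ray documentation, so adjust them to the actual file header.

```python
import csv
import glob
import os
import re


def sum_column(path, column):
    """Sum one numeric column of a tab-separated file that has a header row."""
    total = 0.0
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            total += float(row[column])
    return total


def sums_by_depth(directory, column="Proportion"):
    """Return {(domain, depth): column sum} for files named <domain>.Depth=<n>.tsv.

    The column name "Proportion" is an assumption, not a documented header.
    Terms.tsv is skipped on purpose because it mixes all depths.
    """
    name_pattern = re.compile(r"^(?P<domain>.+)\.Depth=(?P<depth>\d+)\.tsv$")
    result = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.Depth=*.tsv"))):
        match = name_pattern.match(os.path.basename(path))
        if match:
            key = (match.group("domain"), int(match.group("depth")))
            result[key] = sum_column(path, column)
    return result


if __name__ == "__main__":
    # Hypothetical location of the output directory; prints nothing if absent.
    sums = sums_by_depth("BiologicalAbundances/_GeneOntology")
    for (domain, depth), total in sorted(sums.items()):
        print(domain, depth, round(total, 4))
```

If the reply above holds, each printed sum should be close to 1, while the
sum over Terms.tsv can be much larger because it covers every depth at once.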

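
The four demultiplexing steps quoted in this thread can be illustrated with a
small sketch. It assumes that step 4 multiplies the breadth (matched k-mers)
by the mean depth of the unique k-mers, which the thread does not state
explicitly. The function names and the numbers are hypothetical, and this is
not Ray's actual code.

```python
def demultiplexed_observations(unique_kmer_depths, matched_kmers):
    """Estimate the k-mer observations attributed to one life form.

    unique_kmer_depths: observation counts of the k-mers unique to the
        life form (steps 1 and 2).
    matched_kmers: number of its k-mers found in the graph, unique or
        not (step 3, the breadth).
    """
    if not unique_kmer_depths:
        return 0.0
    mean_depth = sum(unique_kmer_depths) / len(unique_kmer_depths)
    # Assumption: step 4 combines breadth and mean depth as a product.
    return matched_kmers * mean_depth


def proportions(observations, total_observations):
    """Divide each estimate by the total k-mer observations in the graph."""
    return {name: value / total_observations
            for name, value in observations.items()}


if __name__ == "__main__":
    # Made-up numbers for two hypothetical genomes.
    estimates = {
        "genome_a": demultiplexed_observations([10, 12, 8], 1000),
        "genome_b": demultiplexed_observations([2, 2], 500),
    }
    print(proportions(estimates, total_observations=11000.0))
```

Because the mean depth is estimated from unique k-mers only, the estimates
need not add up exactly to the total, which is consistent with sums such as
1.00562 mentioned in the thread.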