Hello,

On 02/14/2013 10:13 AM, jjv5 wrote:
> Hello,
>
> I have used ray-meta with -gene-ontology enabled after downloading GO
> data using the Main.sh script in the git repo. Everything completed
> fine and produced expected output.
>
> The result file Terms.tsv under BiologicalAbundances/_GeneOntology
> contains proportions for the GO terms encountered. What is this
> proportion number based on?

For plain genomes (via the -search command), proportions are computed by
demultiplexing the signal based on uniquely colored k-mers. For taxonomy,
the provided taxonomy tree is used to classify each observed k-mer at the
vertex in the tree where the earliest common ancestor is found. For gene
ontology, k-mer observations are gathered for each ontology term, and
proportions are computed for each depth in the gene ontology directed
acyclic graph.

> Proportion of what?

Of k-mers found in the de Bruijn subgraph that was built from the sequence
reads provided to Ray. For example, if you want a number of bacterial
cells, you need to further normalize by genome length, and so on.

> The sum of the
> proportion values in this file is some large integer.

In the directories under BiologicalAbundances, a file called
SequenceAbundances.xml contains numerous counts. These large integers are
either a number of k-mers or a number of k-mer observations. A k-mer
observation corresponds to a k-mer occurring 1 time.

So for a life form X, its k-mer observations are computed as follows:

1. Gather the k-mers that are unique (specific) to this life form X;
2. Compute the average number of observations (depth) for these k-mers;
3. For life form X, compute the number of matched k-mers in the graph,
   regardless of whether they are unique (breadth);
4. With the number of matched k-mers (#3) and the average depth (#2),
   calculate the demultiplexed number of k-mer observations.

> I expected the
> sum to be 1.0.

Sometimes it's a little bit more than 1.000 (like 1.00562), sometimes it's
a little bit less. This is because the demultiplexing process is not 100%
accurate, but in general it is really good.

See http://genomebiology.com/2012/13/12/R122/abstract

> Is there further documentation somewhere?

The documentation lives mainly in
https://github.com/sebhtml/ray/tree/master/Documentation

For what you are doing, these are relevant:

* https://github.com/sebhtml/ray/blob/master/Documentation/BiologicalAbundances.txt
* https://github.com/sebhtml/ray/blob/master/Documentation/NCBI-Taxonomy.txt
* https://github.com/sebhtml/ray/blob/master/Documentation/GeneOntology.txt
* https://github.com/sebhtml/ray/blob/master/Documentation/Taxonomy.txt

>
> Thanks,
> Jim
>
> P.S. Thanks for making ray available. We like it a great deal.
>

Thanks! It's nice to hear what our end users like (and what they don't
like too!).

There is a ticket in progress to further increase the accuracy of Ray
Communities (the solution that tells you what's in your sample) using
topology.

https://github.com/sebhtml/ray/issues/133

> _______________________________________________
> Denovoassembler-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
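
P.S. In case it helps to see the four demultiplexing steps for a life form
X written out, here is a minimal Python sketch. It is only an illustration,
not Ray's actual implementation. The two input dictionaries and the function
names are hypothetical. Two things are assumptions on my part: that step 4
combines breadth and depth by multiplying them, and that the proportion's
denominator is the total number of k-mer observations in the sample graph.

```python
from collections import Counter


def demultiplex(kmer_depth, kmer_owners):
    """Estimate k-mer observations per life form (illustrative sketch).

    kmer_depth:  dict mapping a k-mer to how many times it was observed
                 in the sample graph (hypothetical structure).
    kmer_owners: dict mapping a k-mer to the set of life forms (colors)
                 whose reference contains it (hypothetical structure).
    """
    unique_depth_sum = Counter()
    unique_count = Counter()
    matched = Counter()

    for kmer, owners in kmer_owners.items():
        if kmer not in kmer_depth:
            continue  # reference k-mer not present in the sample graph
        for form in owners:
            matched[form] += 1  # step 3: breadth, unique or not
        if len(owners) == 1:  # step 1: k-mer specific to one life form
            (form,) = owners
            unique_depth_sum[form] += kmer_depth[kmer]
            unique_count[form] += 1

    observations = {}
    for form, breadth in matched.items():
        if unique_count[form] == 0:
            continue  # no unique k-mer, so no depth estimate is possible
        mean_depth = unique_depth_sum[form] / unique_count[form]  # step 2
        # Step 4 (assumption: breadth times average depth).
        observations[form] = breadth * mean_depth
    return observations


def proportions(observations, total_observations):
    """Divide by the total k-mer observations in the graph (assumption)."""
    return {form: n / total_observations for form, n in observations.items()}


# Toy data: "AAC" is shared by X and Y, "TTT" matches no reference.
kmer_depth = {"AAA": 10, "AAC": 12, "CCG": 4, "GGT": 6, "TTT": 8}
kmer_owners = {"AAA": {"X"}, "AAC": {"X", "Y"}, "CCG": {"Y"}, "GGT": {"Y"}}

obs = demultiplex(kmer_depth, kmer_owners)
props = proportions(obs, sum(kmer_depth.values()))
print(obs, props)
```

Note that the shared k-mer "AAC" is credited to each life form at that life
form's own average depth rather than at its measured depth. That estimate is
why the proportions can add up to slightly more or slightly less than 1.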
