Hello,

On 02/14/2013 10:13 AM, jjv5 wrote:
> Hello,
>
> I have used ray-meta with -gene-ontology enabled after downloading GO
> data using the Main.sh script in the git repo. Everything completed
> fine and produced
> expected output.
>
> The result file Terms.tsv under BiologicalAbundances/_GeneOntology
> contains proportions for the GO terms encountered. What is this
> proportion number based on?

For plain genomes (via the -search command), proportions are computed by
demultiplexing the signal based on uniquely colored k-mers.

For taxonomy, the provided taxonomy tree is used to classify each observed kmer
at the vertex in the tree where the earliest common ancestor is found.
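
To illustrate the taxonomy case, here is a minimal Python sketch. It is not
Ray's actual code. It reads "earliest common ancestor" as the deepest vertex
shared by every taxon a k-mer was seen in. The parent map, the taxon names and
the function names are all hypothetical.

```python
def ancestors(parent, node):
    """Return the path from node up to the root, node included."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def common_ancestor(parent, taxa):
    """Deepest vertex that is an ancestor of (or equal to) every taxon."""
    taxa = list(taxa)
    shared = set(ancestors(parent, taxa[0]))
    for taxon in taxa[1:]:
        shared &= set(ancestors(parent, taxon))
    # Walk upward from the first taxon; the first shared vertex is the deepest.
    for node in ancestors(parent, taxa[0]):
        if node in shared:
            return node
    return None


# Hypothetical toy tree: child -> parent (the root has no entry).
parent = {
    "species_a": "genus_1",
    "species_b": "genus_1",
    "species_c": "genus_2",
    "genus_1": "family_1",
    "genus_2": "family_1",
}

same_genus = common_ancestor(parent, ["species_a", "species_b"])
across_genera = common_ancestor(parent, ["species_a", "species_c"])
single = common_ancestor(parent, ["species_a"])
```

A k-mer seen in only one species stays at that species; a k-mer shared across
genera is pushed up to the family.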

For gene ontology, kmer observations are gathered for each ontology term, and
proportions are computed for each depth in the gene ontology directed acyclic
graph.
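
As a rough Python sketch of "proportions per depth" (an assumption about the
arithmetic, not Ray's actual code): each term's k-mer observations are divided
by the total for all terms at the same depth. The term names, the counts and
the one-depth-per-term simplification are hypothetical.

```python
from collections import defaultdict


def proportions_per_depth(observations, depth_of):
    """Normalize each term's count by the total count at its depth."""
    totals = defaultdict(int)
    for term, count in observations.items():
        totals[depth_of[term]] += count
    return {
        term: count / totals[depth_of[term]]
        for term, count in observations.items()
    }


# Hypothetical k-mer observations per ontology term, and each term's depth.
observations = {"term_a": 60, "term_b": 40, "term_c": 10, "term_d": 30}
depth_of = {"term_a": 1, "term_b": 1, "term_c": 2, "term_d": 2}

go_proportions = proportions_per_depth(observations, depth_of)
```

Under this assumption the proportions sum to 1 within each depth, not over
the whole file.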

> Proportion of what?

Of k-mers found in the de Bruijn subgraph that was built from the sequence reads
provided to Ray.

For example, if you want a number of bacterial cells, you need to further
normalize by genome length, and so on.
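
A small Python sketch of that extra normalization, assuming you already have
k-mer observations per genome and know each genome's length. The names and
numbers are made up for illustration and are not Ray output.

```python
def relative_cell_abundance(kmer_observations, genome_lengths):
    """Divide observations by genome length, then rescale to sum to 1."""
    per_base = {
        name: kmer_observations[name] / genome_lengths[name]
        for name in kmer_observations
    }
    total = sum(per_base.values())
    return {name: value / total for name, value in per_base.items()}


# Hypothetical input: genome_a has more observations but a 4x longer genome.
kmer_observations = {"genome_a": 4000, "genome_b": 2000}
genome_lengths = {"genome_a": 4_000_000, "genome_b": 1_000_000}

cells = relative_cell_abundance(kmer_observations, genome_lengths)
```

Here genome_a holds two thirds of the k-mer observations but only about one
third of the estimated cells, because its genome is four times longer.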

> The sum of the
> proportion values in this file is some large integer.

In the directories under BiologicalAbundances, a file called
SequenceAbundances.xml contains numerous counts.

These large integers are either a number of k-mers, or a number of k-mer
observations. A k-mer observation corresponds to a k-mer occurring 1 time.

So for a life form X, its k-mer observations are computed as follows:

1. Gather the k-mers that are unique (specific) to this life form X;
2. Compute an average number of observations (depth) for these k-mers;
3. For life form X, compute the number of matched k-mers in the graph,
   regardless of whether they are unique (breadth);
4. With the number of matched k-mers (#3.) and the average depth (#2.), the
   demultiplexed number of k-mer observations is calculated.
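
The four steps can be sketched in Python as follows. This is an illustration
and not Ray's actual code: it assumes step 4 is simply breadth times average
depth, and the two input dictionaries and their contents are hypothetical.

```python
def demultiplexed_observations(kmer_depths, kmer_owners, life_form):
    """Estimate k-mer observations for one life form.

    kmer_depths: k-mer -> number of observations in the graph
    kmer_owners: k-mer -> set of life forms whose reference contains it
    """
    # Step 3: k-mers of this life form that are present in the graph.
    matched = [
        kmer for kmer, owners in kmer_owners.items()
        if life_form in owners and kmer_depths.get(kmer, 0) > 0
    ]
    # Step 1: among those, k-mers specific to this life form.
    unique = [kmer for kmer in matched if len(kmer_owners[kmer]) == 1]
    if not unique:
        return 0.0
    # Step 2: average depth of the unique k-mers.
    mean_depth = sum(kmer_depths[kmer] for kmer in unique) / len(unique)
    # Step 4 (assumed): breadth multiplied by average depth.
    return len(matched) * mean_depth


# Hypothetical toy data: "ACG" is shared, so its depth of 30 is a mixed signal.
kmer_depths = {"AAA": 10, "AAC": 12, "ACG": 30, "CGT": 8}
kmer_owners = {"AAA": {"X"}, "AAC": {"X"}, "ACG": {"X", "Y"}, "CGT": {"Y"}}

obs_x = demultiplexed_observations(kmer_depths, kmer_owners, "X")
obs_y = demultiplexed_observations(kmer_depths, kmer_owners, "Y")
```

In this toy case the shared k-mer is counted once for each life form, at that
life form's own average depth, instead of at its mixed depth of 30.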

> I expected the
> sum to be 1.0.

Sometimes, it's a little bit more than 1.000 (like 1.00562), sometimes it's a
little bit less. This is because the demultiplexing process is not 100%
accurate, but in general it is really good.

   see http://genomebiology.com/2012/13/12/R122/abstract

> Is there further documentation somewhere?

The documentation lives mainly in 
https://github.com/sebhtml/ray/tree/master/Documentation

For what you are doing, these are relevant:

* https://github.com/sebhtml/ray/blob/master/Documentation/BiologicalAbundances.txt
* https://github.com/sebhtml/ray/blob/master/Documentation/NCBI-Taxonomy.txt
* https://github.com/sebhtml/ray/blob/master/Documentation/GeneOntology.txt
* https://github.com/sebhtml/ray/blob/master/Documentation/Taxonomy.txt

>
> Thanks,
> Jim
>
> P.S. Thanks for making ray available. We like it a great deal.
>

Thanks !

It's nice to hear what our end users like (and what they don't like too !).


There is a ticket in progress to further increase the accuracy of Ray
Communities (the solution that tells you what's in your sample) using topology.

     https://github.com/sebhtml/ray/issues/133

> ------------------------------------------------------------------------------
> Free Next-Gen Firewall Hardware Offer
> Buy your Sophos next-gen firewall before the end March 2013
> and get the hardware for free! Learn more.
> http://p.sf.net/sfu/sophos-d2d-feb
> _______________________________________________
> Denovoassembler-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
>

