Hi,

On 02/14/2013 11:52 AM, James Vincent wrote:
> Hi Sébastien,
>
> Thanks very much for your quick and detailed reply.
>
> I understand the details of proportion calculations and what they
> are, but that does not square with the output files.
>
> The sum of proportions in the file Terms.tsv, for example, is 55. It
> is not slightly off from 1. In other GO output files the sum of
> proportions is a similarly large number, 50, 60 or more.

The file Terms.tsv contains all levels of depth in the directed acyclic graph
of Gene Ontology.

If you take a particular depth, you should see something near 100%.
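
To illustrate why the total over Terms.tsv grows with the number of depths, here is a minimal sketch. It assumes the rows have already been read as (depth, proportion) pairs; that layout and the numbers are made up for the example and are not the actual column format of Terms.tsv.

```python
from collections import defaultdict

def sum_by_depth(rows):
    """Sum proportions separately for each depth.

    rows: iterable of (depth, proportion) pairs. This layout is an
    assumption for illustration, not the real Terms.tsv columns.
    """
    totals = defaultdict(float)
    for depth, proportion in rows:
        totals[depth] += proportion
    return dict(totals)

# Made-up numbers: three depths, each summing to about 1.
rows = [
    (0, 1.0),
    (1, 0.6), (1, 0.4),
    (2, 0.5), (2, 0.3), (2, 0.2),
]

per_depth = sum_by_depth(rows)
grand_total = sum(per_depth.values())

print(per_depth)    # one total per depth, each close to 1
print(grand_total)  # close to the number of depths, not to 1
```

With every depth summing to roughly 1, the sum over all depths at once is roughly the number of depths, which is the kind of large total you would get by adding up a file that mixes all depths.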

Relevant files:

$ ls|grep GeneO
0.Profile.GeneOntologyDomain=biological_process.tsv
0.Profile.GeneOntologyDomain=cellular_component.tsv
0.Profile.GeneOntologyDomain=molecular_function.tsv
_GeneOntology

$ ls _GeneOntology/
biological_process.Depth=0.tsv  cellular_component.Depth=4.tsv  molecular_function.Depth=1.tsv  molecular_function.Depth=8.tsv
biological_process.Depth=1.tsv  cellular_component.Depth=5.tsv  molecular_function.Depth=2.tsv  molecular_function.Depth=9.tsv
biological_process.Depth=2.tsv  cellular_component.Depth=6.tsv  molecular_function.Depth=3.tsv  Terms.tsv
cellular_component.Depth=0.tsv  cellular_component.Depth=7.tsv  molecular_function.Depth=4.tsv  Terms.xml
cellular_component.Depth=1.tsv  cellular_component.Depth=8.tsv  molecular_function.Depth=5.tsv
cellular_component.Depth=2.tsv  cellular_component.Depth=9.tsv  molecular_function.Depth=6.tsv
cellular_component.Depth=3.tsv  molecular_function.Depth=0.tsv  molecular_function.Depth=7.tsv

>
> The obvious example is that the first few largest proportion numbers
> add up to more than 2. They are all fractions like 0.5, 0.6 and so on.
> Is there an error in my run or perhaps my interpretation?
>

Do you see this behavior if you look at a given depth rather than at all the
depths at once?

> Merci,
> Jim
>
> On Thu, Feb 14, 2013 at 10:59 AM, Sébastien Boisvert
> <[email protected]> wrote:
>> Hello,
>>
>> On 02/14/2013 10:13 AM, jjv5 wrote:
>>> Hello,
>>>
>>> I have used ray-meta with -gene-ontology enabled after downloading GO
>>> data using the Main.sh script in the git repo. Everything completed
>>> fine and produced
>>> expected output.
>>>
>>> The result file Terms.tsv under BiologicalAbundances/_GeneOntology
>>> contains proportions for the GO terms encountered. What is this
>>> proportion number based on?
>>
>> For plain genomes (via the -search command), proportions are computed by
>> demultiplexing the signal based on uniquely colored k-mers.
>>
>> For taxonomy, the provided taxonomy tree is used to classify each observed
>> k-mer at the vertex in the tree where the earliest common ancestor is found.
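
As a rough illustration of that classification step, the sketch below assigns a k-mer to the deepest vertex that is an ancestor of every taxon it was seen in. The tree, the vertex names and the parent-map representation are hypothetical; this is not Ray's code.

```python
def path_to_root(node, parent):
    """Return the list of vertices from node up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def common_ancestor(nodes, parent):
    """Deepest vertex that is an ancestor of (or equal to) every node."""
    nodes = list(nodes)
    shared = set(path_to_root(nodes[0], parent))
    for node in nodes[1:]:
        shared &= set(path_to_root(node, parent))
    # Walking up from the first node, the first shared vertex is the deepest.
    for vertex in path_to_root(nodes[0], parent):
        if vertex in shared:
            return vertex
    return None

# Hypothetical toy tree, child -> parent.
parent = {
    "species_A": "genus_1",
    "species_B": "genus_1",
    "species_C": "genus_2",
    "genus_1": "family_1",
    "genus_2": "family_1",
}

print(common_ancestor(["species_A"], parent))               # species_A
print(common_ancestor(["species_A", "species_B"], parent))  # genus_1
print(common_ancestor(["species_A", "species_C"], parent))  # family_1
```

A k-mer seen in a single species stays at that species, while a k-mer shared across genera moves up to the vertex they have in common.
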
>>
>> For gene ontology, k-mer observations are gathered for each ontology term,
>> and proportions are computed for each depth in the gene ontology directed
>> acyclic graph.
>>
>>> Proportion of what?
>>
>> Of k-mers found in the de Bruijn subgraph that was built from the sequence
>> reads provided to Ray.
>>
>> For example, if you want a number of bacterial cells, you need to further
>> normalize by genome length, and so on.
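
A minimal sketch of that normalization, assuming k-mer observations and genome lengths are available per genome. The function name, the genome names and the numbers are invented for the example, and genome length is only one of the corrections hinted at by "and so on".

```python
def length_normalized(observations, genome_lengths):
    """Divide k-mer observations by genome length, then rescale to sum to 1."""
    per_base = {
        name: observations[name] / genome_lengths[name]
        for name in observations
    }
    total = sum(per_base.values())
    return {name: value / total for name, value in per_base.items()}

# Invented numbers: genome_A has more k-mer observations only because it is longer.
observations = {"genome_A": 6_000_000, "genome_B": 2_000_000}
genome_lengths = {"genome_A": 6_000_000, "genome_B": 1_000_000}

total_observations = sum(observations.values())
kmer_proportions = {name: n / total_observations for name, n in observations.items()}
cell_proportions = length_normalized(observations, genome_lengths)

print(kmer_proportions)  # genome_A dominates in k-mer terms
print(cell_proportions)  # genome_B dominates once length is accounted for
```

In this toy case genome_A accounts for three quarters of the k-mer observations but only about one third of the length-normalized signal.
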
>>
>>> The sum of the
>>> proportion values in this file is some large integer.
>>
>> In the directories under BiologicalAbundances, a file called
>> SequenceAbundances.xml contains numerous counts.
>>
>> These large integers are either a number of k-mers, or a number of k-mer
>> observations. A k-mer observation corresponds to a k-mer occurring 1 time.
>>
>> So for a life form X, its k-mer observations are computed as follows:
>>
>> 1. Gather the k-mers that are unique (specific) to this life form X;
>> 2. Compute the average number of observations (depth) for these k-mers;
>> 3. For life form X, compute the number of matched k-mers in the graph,
>>    regardless of whether they are unique (breadth);
>> 4. With the number of matched k-mers (#3) and the average depth (#2), the
>>    demultiplexed number of k-mer observations is calculated.
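
The four steps can be sketched roughly as follows. This is an illustration only: it assumes step 4 combines breadth and average depth as a simple product, which is not spelled out above, and the function name, the dictionary layout and the k-mers are hypothetical.

```python
def demultiplexed_observations(coverage, unique_kmers, matched_kmers):
    """Estimate the k-mer observations attributable to one life form X.

    coverage:      dict mapping each k-mer in the graph to its observations
    unique_kmers:  k-mers specific to X (step 1)
    matched_kmers: all k-mers of X, unique or not (used in step 3)
    """
    unique_in_graph = [k for k in unique_kmers if k in coverage]
    if not unique_in_graph:
        return 0.0
    # Step 2: average depth over the k-mers that are unique to X.
    average_depth = sum(coverage[k] for k in unique_in_graph) / len(unique_in_graph)
    # Step 3: breadth, the number of X's k-mers present in the graph.
    breadth = sum(1 for k in matched_kmers if k in coverage)
    # Step 4: assumed here to be breadth times average depth.
    return breadth * average_depth

# Made-up graph: "CGT" is shared with other genomes and has inflated coverage.
coverage = {"AAC": 10, "ACG": 12, "CGT": 40, "GTT": 8}
unique_kmers = {"AAC", "ACG", "GTT"}
matched_kmers = {"AAC", "ACG", "CGT", "GTT"}

print(demultiplexed_observations(coverage, unique_kmers, matched_kmers))  # 40.0
print(sum(coverage[k] for k in matched_kmers))                            # 70
```

The shared k-mer contributes the average depth of the unique k-mers (10) rather than its raw coverage (40), so the estimate for X is 40 observations instead of the raw sum of 70.
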
>>
>>> I expected the
>>> sum to be 1.0.
>>
>> Sometimes it's a little more than 1.000 (like 1.00562), sometimes it's a
>> little less. This is because the demultiplexing process is not 100%
>> accurate, but in general it is really good.
>>
>>     see http://genomebiology.com/2012/13/12/R122/abstract
>>
>>> Is there further documentation somewhere?
>>
>> The documentation lives mainly in 
>> https://github.com/sebhtml/ray/tree/master/Documentation
>>
>> For what you are doing, these are relevant:
>>
>> * https://github.com/sebhtml/ray/blob/master/Documentation/BiologicalAbundances.txt
>> * https://github.com/sebhtml/ray/blob/master/Documentation/NCBI-Taxonomy.txt
>> * https://github.com/sebhtml/ray/blob/master/Documentation/GeneOntology.txt
>> * https://github.com/sebhtml/ray/blob/master/Documentation/Taxonomy.txt
>>
>>>
>>> Thanks,
>>> Jim
>>>
>>> P.S. Thanks for making ray available. We like it a great deal.
>>>
>>
>> Thanks!
>>
>> It's nice to hear what our end users like (and what they don't like, too!).
>>
>>
>> There is a ticket in progress to further increase the accuracy of Ray
>> Communities (the solution that tells you what's in your sample) using
>> topology.
>>
>>       https://github.com/sebhtml/ray/issues/133
>>
>>> ------------------------------------------------------------------------------
>>> Free Next-Gen Firewall Hardware Offer
>>> Buy your Sophos next-gen firewall before the end March 2013
>>> and get the hardware for free! Learn more.
>>> http://p.sf.net/sfu/sophos-d2d-feb
>>> _______________________________________________
>>> Denovoassembler-users mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
>>>

