Hi group,

We are developing an algorithm to detect AS (alternative splicing) events
on HTAv2 arrays using the aroma.affymetrix framework. The algorithm
identifies AS events, guesses their classes (exon cassette, alternative 5',
etc.) and runs statistics (limma based) to find the significant ones.
After running and testing the algorithm, we saw that it is fooled if one of
the potential paths is not expressed at all (say, in an exon cassette, the
exon is not skipped in any of the samples) and the other path changes in
expression. I can go into the details of why the algorithm does not work in
these cases, but I think that is not necessary for now.
A potential fix to this problem would be to set a filter based on whether a 
probeset is being expressed or not in the set of samples.
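
To make the idea concrete, here is a minimal sketch of such a filter
(Python is used only for illustration, this is not aroma.affymetrix code).
It assumes that a detection p.value is already available for every probeset
in every sample. The names `is_expressed` and `filter_probesets`, the
probeset IDs, and the thresholds `alpha` and `min_fraction` are all
hypothetical choices, not something we have settled on.

```python
def is_expressed(pvalues, alpha=0.05, min_fraction=0.5):
    """True if the probeset is detected (p <= alpha) in enough samples.

    alpha and min_fraction are illustrative defaults, not tuned values.
    """
    if not pvalues:
        return False
    detected = sum(1 for p in pvalues if p <= alpha)
    return detected >= min_fraction * len(pvalues)


def filter_probesets(detection, alpha=0.05, min_fraction=0.5):
    """Return the IDs of the probesets that pass the expression filter."""
    return [
        probeset
        for probeset, pvalues in detection.items()
        if is_expressed(pvalues, alpha, min_fraction)
    ]


# Toy detection p.values (made up), one value per sample.
detection = {
    "PSR_included_exon": [0.001, 0.020, 0.004, 0.010],
    "JUC_skipping_junction": [0.600, 0.450, 0.810, 0.300],
}
kept = filter_probesets(detection)
print(kept)  # ['PSR_included_exon']
```

In this toy case the skipping junction is never detected, so the event
would be dropped before the limma step instead of being reported as AS.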

The HTAv2 array includes around 17,000 antigenomic probes that can be used
to measure the background for different probe sequences. For example, using
these probes it is apparent that the GC content (especially if it is above
15 of the 25 bases) increases the signal of these probes (which should not
have any signal at all).
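
A minimal sketch of how the antigenomic probes could be grouped by GC
content (Python for illustration only). It assumes the probe sequences and
intensities are available as plain (sequence, intensity) pairs; the function
names and the toy intensities are made up.

```python
from collections import defaultdict
from statistics import median


def gc_count(sequence):
    """Number of G or C bases in a probe sequence."""
    seq = sequence.upper()
    return seq.count("G") + seq.count("C")


def background_by_gc(antigenomic):
    """Group antigenomic intensities by GC count.

    Returns {gc_count: sorted list of intensities}.
    """
    bins = defaultdict(list)
    for sequence, intensity in antigenomic:
        bins[gc_count(sequence)].append(intensity)
    return {gc: sorted(values) for gc, values in bins.items()}


# Toy antigenomic probes (made-up sequences and intensities).
low_gc = "AT" * 12 + "A"             # 25-mer with 0 G/C bases
high_gc = "GC" * 8 + "AT" * 4 + "A"  # 25-mer with 16 G/C bases
antigenomic = [
    (low_gc, 28.0), (low_gc, 35.0), (low_gc, 31.0),
    (high_gc, 90.0), (high_gc, 120.0), (high_gc, 105.0),
]

background = background_by_gc(antigenomic)
for gc in sorted(background):
    print(gc, median(background[gc]))
```

With the real antigenomic probes, each GC bin would hold the background
distribution for probes of that composition.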

I was planning to implement a sort of DABG (detection above background)
using these probes: depending on the sequence of the probe, we can find a
number of antigenomic probes similar to it (for example with the same GC
content) and provide an empirical p.value to test that the probe is above
the background (for example, the returned p.value would be 0.01 if the
signal of the probe is above 99% of the antigenomic probes with the same GC
content). The p.values of the individual probes can then be summarized for
the whole probeset by combining them.
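
A minimal sketch of the two steps (Python for illustration only). The
per-probe p.value is the fraction of GC-matched background intensities at or
above the probe's intensity; the pseudo-count of one is my assumption, used
so that the p.value is never exactly zero. For the combination I use
Fisher's method purely as an example: it assumes independent p.values, which
probes of the same probeset probably are not, so another rule may be more
appropriate. The function names are hypothetical.

```python
import math
from bisect import bisect_left


def empirical_pvalue(intensity, background_sorted):
    """Empirical p.value of one probe against its GC-matched background.

    background_sorted must be sorted in ascending order. The +1 terms are a
    pseudo-count (an assumption of this sketch) that keeps p above zero.
    """
    n = len(background_sorted)
    n_at_or_above = n - bisect_left(background_sorted, intensity)
    return (n_at_or_above + 1) / (n + 1)


def fisher_combine(pvalues):
    """Combine p.values with Fisher's method (assumes independence).

    X = -2 * sum(log p) follows a chi-square with 2k degrees of freedom,
    whose upper tail has the closed form
    exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p.value")
    half = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Toy background: 99 made-up antigenomic intensities, 1.0 ... 99.0.
background = [float(v) for v in range(1, 100)]

p_high = empirical_pvalue(150.0, background)  # above every background probe
p_mid = empirical_pvalue(50.0, background)
print(p_high, p_mid)  # 0.01 0.51

print(fisher_combine([p_high, p_high, p_mid]))
```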

The pipeline that I would like to implement is the following:

Background removal --> Quantile normalization --> DABG for each probeset
(for all the probes, including antigenomic)

I was thinking about the proper way to implement this in the
aroma.affymetrix framework. Probably, I can implement it as if it were a
summarization algorithm (median polish, for example). The median polish
works in chunks to avoid using too much memory. In this case, it is much
easier than that since the algorithm that I want to implement is
single-sample: for each probe, compute the p.value, and then summarize the
p.values for the whole probeset.
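
A minimal sketch of the single-sample structure (Python for illustration
only, not the aroma.affymetrix API). Because each array is summarized
independently, only one array needs to be in memory at a time. All names are
hypothetical; `probe_pvalue` and `combine` are placeholders for the
per-probe detection test and the probeset-level summary, and the toy
versions below are deliberately trivial.

```python
def summarize_array(intensities, probesets, probe_pvalue, combine):
    """Summarize ONE array: {probeset_id: combined p.value}.

    intensities: {probe_id: intensity} for a single array.
    probesets:   {probeset_id: [probe_id, ...]}.
    """
    return {
        probeset: combine(
            [probe_pvalue(pid, intensities[pid]) for pid in probe_ids]
        )
        for probeset, probe_ids in probesets.items()
    }


def summarize_arrays(arrays, probesets, probe_pvalue, combine):
    """Lazily yield (array_name, summary), one array at a time."""
    for name, intensities in arrays:
        yield name, summarize_array(
            intensities, probesets, probe_pvalue, combine
        )


# Toy inputs (made up).
probesets = {"PS1": ["p1", "p2"], "PS2": ["p3"]}
arrays = [
    ("sample_A", {"p1": 200.0, "p2": 150.0, "p3": 40.0}),
    ("sample_B", {"p1": 30.0, "p2": 180.0, "p3": 45.0}),
]


def toy_pvalue(probe_id, intensity):
    """Placeholder detection test: a fixed cutoff instead of real DABG."""
    return 0.01 if intensity > 100 else 0.5


# max() is a conservative placeholder for the probeset-level summary.
results = dict(summarize_arrays(arrays, probesets, toy_pvalue, max))
print(results)
```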

Do you think this is a proper way to implement it?
Thanks in advance for the answer,

Angel
