Hi group,

We are developing an algorithm to detect AS (alternative splicing) events on HTAv2 arrays using the aroma.affymetrix framework. The algorithm identifies AS events, assigns each one to a class (exon cassette, alternative 5', etc.) and runs limma-based statistics to find the significant ones. After running and testing the algorithm, we found that it is misled when one of the potential paths is not expressed at all (e.g., in an exon cassette, the exon is not skipped in any of the samples) and the other path changes in expression. I can go into the details of why the algorithm fails in these cases, but I don't think that is necessary for now. A potential fix would be to add a filter based on whether or not a probeset is expressed in the set of samples.
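The expression filter proposed above could rest on the antigenomic DABG idea described in the next paragraph. Below is a minimal Python sketch of that idea, under several assumptions not stated in the post: probes are matched to antigenomic probes by GC count only (falling back to the nearest GC count when a bin is empty), intensities are already background-corrected and quantile-normalized, and per-probe p-values are combined with Fisher's method. Fisher's method is one common choice, but it assumes the probe p-values are independent, which does not strictly hold within a probeset. All function names and the `alpha`/`min_samples` thresholds are hypothetical; this is not an aroma.affymetrix API.

```python
import math
from bisect import bisect_left
from collections import defaultdict


def gc_count(seq):
    """Number of G/C bases in a probe sequence (0-25 for a 25-mer)."""
    return sum(base in "GCgc" for base in seq)


def build_background(antigenomic):
    """Group antigenomic intensities by GC count; each bin is sorted.

    `antigenomic` is an iterable of (sequence, intensity) pairs.
    """
    bins = defaultdict(list)
    for seq, intensity in antigenomic:
        bins[gc_count(seq)].append(intensity)
    return {gc: sorted(values) for gc, values in bins.items()}


def empirical_pvalue(seq, intensity, background):
    """Fraction of GC-matched antigenomic probes at least as bright.

    Uses (count + 1) / (n + 1) so the p-value is never exactly 0.
    Assumption: falls back to the nearest GC bin if the exact one is empty.
    """
    gc = gc_count(seq)
    nearest = min(background, key=lambda g: abs(g - gc))
    values = background[nearest]
    # bisect_left gives the index of the first value >= intensity
    n_at_least = len(values) - bisect_left(values, intensity)
    return (n_at_least + 1) / (len(values) + 1)


def fisher_combine(pvalues):
    """Combine p-values with Fisher's method (assumes independence).

    X = -2 * sum(ln p) follows a chi-square with 2k degrees of freedom;
    for even df its survival function is exp(-X/2) * sum_{j<k} (X/2)^j / j!.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total


def probeset_pvalue(probes, background):
    """DABG-style p-value for one probeset in one sample.

    `probes` is a list of (sequence, normalized_intensity) pairs.
    """
    return fisher_combine(
        [empirical_pvalue(s, x, background) for s, x in probes])


def is_expressed(pvalues_per_sample, alpha=0.01, min_samples=1):
    """Hypothetical filter: detected (p < alpha) in >= min_samples samples."""
    return sum(p < alpha for p in pvalues_per_sample) >= min_samples
```

For example, if the GC-matched antigenomic bin holds 99 probes with intensities 1 to 99, a probe at intensity 150 gets p = 1/100 = 0.01; the +1 correction caps the smallest attainable p-value at 1/(n+1). Each call uses only one sample's intensities plus the fixed antigenomic reference, so arrays can be processed one at a time. An AS event could then be kept only if every path's probesets pass `is_expressed`; how this maps onto aroma.affymetrix classes is left open here.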
The HTAv2 array includes around 17,000 antigenomic probes that can be used to measure the background for different probe sequences. For example, these probes make it apparent that GC content (especially above 15/25, i.e., more than 15 G/C bases in a 25-mer probe) increases their signal, even though they should not have any signal at all. I was planning to implement a sort of DABG (detection above background) using these probes: depending on the sequence of a probe, we can find a set of antigenomic probes similar to it (for example, with the same GC content) and compute an empirical p-value to test whether the probe is above background (for example, the returned p-value would be 0.01 if the signal of the probe is above 99% of the antigenomic probes with the same GC content). These per-probe p-values can then be combined into a single p-value for the whole probeset.

The pipeline that I would like to implement is the following:

Background removal --> Quantile normalization --> DABG for each probeset (for all the probes, including antigenomic)

I was thinking about the proper way to implement this in the aroma.affymetrix framework. Probably I could implement it as if it were a summarization algorithm (median polish, for example). The median polish summarization works in chunks to avoid using too much memory. My case is much simpler, since the algorithm I want to implement is single-sample: for each probe, compute the p-value, and then combine the p-values for the whole probeset. Do you think this is a proper way to implement it?

Thanks in advance for the answer,
Angel

--
When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe and other options, go to http://www.aroma-project.org/forum/