Hi Michael,

Recently I have submitted to BioC a library called rnaSeqMap, which includes the code we use locally
for problems like yours and similar.
It has the implementation of Aumann-Lindell algorithm for finding irreducible regions that perhaps may be used here. It is not a purely statistical solution, it is more of data mining, but I would look for the minimal level of coverage (mi), for which both your regions are irreducible - in order to compare
them on one sample.
That's the outline of the receipe:

1. Take one region, create SeqReads object on it. rs <- newSeqReads(chr, st, en, strand) 2. Fill the object with data using rs <- addDataToReadset() if you have the reads data in your workspace, or use addExperimentsToReadset() if you are connected to xmap database with the seq_reads table added. 3. Run nd <- getCoverageFromRS(rs, 1) to get the NucleotideDistr object with coverage. 4. Then - in some sort of adaptive procedure do: nd.reg <- findRegionsAsND(nd, mi, minsup) to get the minimal mi, for which distr(nd.reg) is filled with mi on all the positions.

Then repeat the procedure for both regions and compare mi.

Alternatively one can think of some ttest comparing "discretized" with A-L algorithm values from distributions for both regions to make the comparison more "statistically correct", eg:

 nd.rbc1 <- regionBasedCoverage(nd1, steps, max_mi)
 d1 <- distribs(nd.rbc1)
 nd.rbc2 <- regionBasedCoverage(nd2, steps, max_mi)
 d2 <- distribs(nd.rbc2)
t.test(d1, d2)

As a side note: irreducible region in the sense of Aumann and Lindell (2003) has to start and end with coverage > mi So if your region is not "filled on both sides" you may check a representative sub-region. And all the functions except addExperimentsToReadset() do not require the database connection.

Do not know if this is what you precisely asked for - just my approximate solution :)

Cheers,
Michal


On 25/10/2010 11:58, Michael Dondrup wrote:
Hi,

I need some statistical advise for the following problem. Given an RNA-seq 
experiment I would like to assess
statistical significance of differential read-counts>within<  a sample. Given a 
sample with read counts
for two (adjacent) regions out of all all regions of the genome I am interested 
in, say gene A and intron B.

I wish to detect if region B has a significantly lower read count than A, 
lengths of regions A and B are known to be different,
so I think fisher's-exact test does not apply here. Region length should be 
taken into account for this, as I think that
the more positions are different between regions, the more significant the 
result should be. I also have biol. replicates,
but these replicates have different numbers of reads.

Packages like DEseq and edgeR seem to be tailored to between samples 
comparison. So which method
would you recommend for within sample comparison?

Thank you very much
Michael

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to