Hi Sean,

Thanks for your suggestion on both the mailing lists. I am now reading the coverage values from a file, storing them as a data.frame, and then creating a new numeric vector for each lane. Each vector may have 15000-45000 entries. The values are numeric and span a very wide range: some fall between 0 and 1 (e.g. 0.45, 0.89), while others are in the thousands (e.g. 4000, 44000). These are just made-up examples to illustrate the skew in the data.
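A minimal sketch of this kind of setup, in case it helps to reproduce the problem. The file layout, the column names (contig, lane, coverage) and the values are all hypothetical, and a temporary file stands in for the real coverage file:

```r
# Hypothetical input: one row per contig, with columns "contig", "lane", "coverage".
# A temporary file stands in for the real coverage file; the values are made up.
cov_file <- tempfile(fileext = ".txt")
example <- data.frame(
  contig   = paste0("contig", 1:6),
  lane     = c(1, 1, 1, 2, 2, 2),
  coverage = c(0.45, 0.89, 4000, 12.5, 300, 44000)
)
write.table(example, cov_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the table back into a data.frame and build one numeric vector per lane.
cov_df  <- read.table(cov_file, header = TRUE, sep = "\t")
by_lane <- split(cov_df$coverage, cov_df$lane)

# split() names the list elements after the lane values, so lane 2 is "2".
lane2 <- by_lane[["2"]]
summary(lane2)   # quick look at the range before plotting
```

Calling summary() on each vector first is a cheap way to see how far apart the minimum, median and maximum are before choosing bins.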
When I plot a histogram I just see one big bar, so I think the bins are not being chosen well. I also tried a couple of different options in the R hist() function, but with the same result:

hist(lane2, freq=TRUE, breaks=10)
hist(lane2, freq=TRUE, include.lowest=TRUE)

Any suggestions on how to bin?

Thanks,
-Abhi

On Sun, Aug 16, 2009 at 7:45 AM, Sean Davis <[email protected]> wrote:
>
> On Sun, Aug 16, 2009 at 4:20 AM, Abhishek Pratap <[email protected]> wrote:
>>
>> Hi Michael
>>
>> Thanks for your reply. Basically, we have downloaded the human
>> reference RNA set from NCBI and are using it to assess coverage. It
>> is a rough estimate to help our collaborators decide how much
>> sequencing they need to do in order to reach the required coverage
>> for SNP calling. So far I have calculated coverage using the ELAND
>> alignment results. I am now looking for ways to plot it so that
>> biologists can interpret it easily.
>>
>> So I have many hashes (Perl), each holding "numerical" coverage data
>> obtained from next-generation sequencing data analysis. Each
>> hash/list may have a couple of hundred to thousands of entries of the
>> form "contig_name => coverage". What I want to do is plot a histogram
>> for each hash/dataset: "Coverage vs. count of contigs with coverage > #N"
>> (N has to be binned according to the data size).
>
> Abhi,
>
> It sounds like you already have the data that you want to plot, but in
> Perl? If so, you can simply write out the numeric data to a file and then
> read it into R. R has the hist() function, which will do the binning, etc.,
> and the read.table() function to read in the data.
>
> If I am missing something, you will probably need to clarify which details
> you still need in order to accomplish your task.
>
> Sean
>
>>
>> On Thu, Aug 13, 2009 at 4:30 AM, Michael Dondrup <[email protected]> wrote:
>> > Hi Abhi,
>> >
>> > just a short comment.
>> > To assess coverage, the crucial point is to know the
>> > length of your target sequence, thus the length of the
>> > human transcriptome. Then e.g. the Lander-Waterman statistic can be
>> > computed. So how could the length of the total mRNA
>> > be calculated? I think this is not possible, is it?
>> >
>> > Best
>> > Michael
>> >
>> > On 12.08.2009 at 23:59, Abhishek Pratap wrote:
>> >
>> >> Hi All
>> >>
>> >> Just wondering if a package/R function exists which can help us answer
>> >> the following question.
>> >>
>> >> We are trying to assess the right amount of sequencing we need to do
>> >> in order to cover the human transcriptome. For the runs we have
>> >> already done, we have the reads aligned to the human mRNA reference
>> >> using ELAND. We would like to plot graphs per lane to show the
>> >> percent coverage of the human transcriptome.
>> >>
>> >> Let me know if it is not clear; I can reframe or explain in detail.
>> >>
>> >> Thanks,
>> >> -Abhi
>> >>
>> >> _______________________________________________
>> >> Bioc-sig-sequencing mailing list
>> >> [email protected]
>> >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
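
On the binning question at the top of this message: when values span several orders of magnitude, equal-width bins tend to put nearly everything into the first bar. One option that may be worth trying is to bin on a log10 scale, and to tabulate the "count of contigs with coverage > N" directly. The sketch below uses simulated values as a stand-in for one lane (not real data), and the thresholds are arbitrary:

```r
# Simulated stand-in for one lane's coverage values (not real data):
# half the values lie below 1, the other half are spread over a wide range.
set.seed(1)
lane2 <- c(runif(500, 0.1, 1), rlnorm(500, meanlog = 5, sdlog = 2))

# Equal-width bins: most values land in the first bin ("one big bar").
h_lin <- hist(lane2, breaks = 10, plot = FALSE)

# Bin on a log10 scale instead. Zero coverage has no logarithm, so such
# values would have to be dropped or given a small offset first.
pos   <- lane2[lane2 > 0]
h_log <- hist(log10(pos), breaks = 20, plot = FALSE)

# Number of contigs with coverage above each (arbitrary) threshold N.
thresholds <- c(1, 10, 100, 1000, 10000)
n_above    <- sapply(thresholds, function(n) sum(lane2 > n))
names(n_above) <- thresholds
```

Calling plot(h_log, xlab = "log10(coverage)") draws the log-binned histogram, and barplot(n_above) gives the cumulative "coverage > N" view. Whether a log scale is easy for biologists to read is a judgment call; labelling the axis in original units may help.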
