Victor Ruotti wrote:

Hi Martin
I'm having trouble interpreting the scatter plot matrix shown in the ShortRead PDF file.
Can you comment on this?

Hi Victor --

I'm not sure the figure made it through the mailing list; it's on p. 13 of the 'Overview' vignette, just before section 2.2.

The figure was meant to be a 'teaser' to encourage people to explore and understand the way the data is collected. It shows the intensities recorded at an early stage in the eland pipeline, after image acquisition but before base calling. The axes are the intensities reported by Firecrest. Each panel represents a pairwise comparison between two bases. Each point in a panel represents a single cluster; all clusters are represented in each panel. The displayed figure is at cycle 2. The suggestion in the text is to compare this to a later cycle (e.g., 30); it's fun to make a little animation of this, e.g., by looping over cycles.
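
A scatter plot matrix over four intensity channels has six distinct pairwise panels. The Python sketch below is illustrative only: the data layout (one 4-tuple of intensities per cluster, keyed by cycle), the helper names `panel_data` and `frames`, and the numbers are hypothetical, not ShortRead's API. It shows how the clusters at one cycle map onto the panels, and how looping over cycles yields one set of panels per animation frame; the drawing itself is left to whatever plotting tool you use.

```python
from itertools import combinations

# Assumed channel labels, matching the A/C/G/T panel labels in the figure.
CHANNELS = ("A", "C", "G", "T")


def panel_data(intensities):
    """Map each pair of channels to the (x, y) points of one scatter-plot panel.

    `intensities` holds one 4-tuple per cluster for a single cycle, so every
    cluster contributes exactly one point to every panel.
    """
    panels = {}
    for i, j in combinations(range(len(CHANNELS)), 2):
        panels[(CHANNELS[i], CHANNELS[j])] = [(row[i], row[j]) for row in intensities]
    return panels


def frames(by_cycle, cycles):
    """Yield (cycle, panels) in order; each item could be drawn as one animation frame."""
    for cycle in cycles:
        yield cycle, panel_data(by_cycle[cycle])


# Made-up intensities for two clusters, purely for illustration.
by_cycle = {
    2: [(900, 40, 30, 20), (50, 800, 60, 10)],
    30: [(300, 120, 90, 80), (110, 260, 140, 70)],
}
for cycle, panels in frames(by_cycle, [2, 30]):
    print(cycle, panels[("A", "C")])
```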

As I understand it, there are two (not four, as one might think) fluorescent nucleotides ('dATP', 'dCTP'), and each is measured at two different wavelengths ('red', 'green'). This is based on

http://dx.doi.org/10.1016/S0003-2697(03)00291-4

Roughly, there are four different intensities ('dATP, red', 'dATP, green', 'dCTP, red', 'dCTP, green'). The mapping between intensity and underlying base is orthogonal along one dimension (the panels comparing A or C with G or T) but not along the other (the panels comparing A and C, and G and T).

Ideally, one would like to see discrete groups of points, as in any discriminant analysis. Even this early in the run, the groups are not clear 'by eye'. Points differ in amplitude (distance from the origin, perhaps related to the number of DNA strands per cluster?) and deviate from the horizontal, vertical, or diagonal. This deviation reflects 'phasing' or dye bleed: each cluster fluoresces in all four dimensions, resulting, for instance, from some DNA strands being sequenced at the wrong position or from residual fluorescent bases left over from earlier reactions.
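
The two quantities described above can be made concrete. The Python sketch below rests on assumptions: each cluster is a 4-vector of intensities labelled A, C, G, T; 'amplitude' is taken as the Euclidean distance from the origin; and 'purity' (a hypothetical name) is the brightest intensity divided by the sum of the two brightest, similar in spirit to the 'chastity' ratio the Illumina pipeline uses for filtering. The brightest-channel base call is deliberately naive and is not how the pipeline actually calls bases.

```python
import math

# Assumed order of the four intensity channels (the figure labels panels A, C, G, T).
BASES = ("A", "C", "G", "T")


def amplitude(x):
    """Euclidean distance of a cluster's intensity vector from the origin."""
    return math.sqrt(sum(v * v for v in x))


def naive_call(x):
    """Call the base whose channel is brightest; a deliberately simple rule."""
    return BASES[max(range(len(x)), key=lambda i: x[i])]


def purity(x):
    """Brightest / (brightest + second brightest) for non-negative intensities.

    Near 1.0: the point lies close to a single axis. Near 0.5: the two
    brightest channels are about equal and the call is ambiguous.
    """
    top, second = sorted(x, reverse=True)[:2]
    total = top + second
    return 0.5 if total == 0 else top / total


# Made-up clusters: one close to the A axis, one near the A/C diagonal.
for x in [(900, 40, 30, 20), (420, 380, 60, 50)]:
    print(naive_call(x), round(amplitude(x)), round(purity(x), 3))
```

Both clusters are called 'A', but the second sits near the diagonal of the A-versus-C panel, and its low purity flags it as the kind of point a base caller struggles with.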

A picture from a later cycle shows that points collapse toward the origin (lower intensity, e.g., because of depleted reagents or lost DNA strands) and become less orthogonal (strands within a cluster increasingly out of phase, dye build-up from previous reactions, etc.).
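
The collapse toward the origin and the loss of orthogonality can be mimicked with a toy model. The Python sketch below is an assumption for illustration only: it models lagging strands (phasing) with a single per-cycle probability `lag` and reagent depletion or strand loss with a constant per-cycle `decay`. It ignores pre-phasing, dye build-up, and noise, and the function name `simulate` and the parameter values are made up.

```python
# Assumed order of the four intensity channels.
BASES = "ACGT"


def simulate(template, n_cycles, lag=0.02, decay=0.97):
    """Toy model of one cluster's signal over cycles (an assumption, not real chemistry).

    - Each cycle a fraction `lag` of strands fails to incorporate a base and
      falls one position behind (phasing).
    - Overall signal is multiplied by `decay` each cycle (depleted reagents,
      lost strands).
    Returns one 4-element intensity list per cycle.
    """
    # frac[j] = fraction of strands whose next base to incorporate is template[j];
    # the extra last slot holds strands that have read the whole template.
    frac = [1.0] + [0.0] * len(template)
    scale = 1.0
    out = []
    for _ in range(n_cycles):
        signal = [0.0] * 4
        new = [0.0] * len(frac)
        for j, f in enumerate(frac):
            if j == len(template):  # finished strands stay put and emit nothing
                new[j] += f
                continue
            moved = f * (1.0 - lag)  # strands that incorporate template[j] now
            signal[BASES.index(template[j])] += moved * scale
            new[j + 1] += moved
            new[j] += f - moved  # lagging strands retry this position next cycle
        frac = new
        out.append(signal)
        scale *= decay
    return out


for cycle, s in enumerate(simulate("ACGTACGTAC", 10, lag=0.1, decay=0.9), start=1):
    print(cycle, [round(v, 3) for v in s], round(max(s) / sum(s), 3))
```

The last printed column (the brightest channel's share of the total) falls from cycle to cycle while the total itself shrinks; in the scatter plot this would appear as points drifting toward the origin and away from the axes.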

Mostly the figure shows the challenges faced by base calling algorithms, and the challenges for the technology (making the clusters fall out more discretely, for more cycles).

This is my understanding of the chemistry involved; perhaps others will contribute more authoritatively.

Martin

Thanks,
Victor


--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
