Re: [aroma.affymetrix] SNP_6 raw intensity Extraction

jonathan . crowther2 Tue, 30 Apr 2013 00:00:50 -0700

Hi Henrik,

That is exactly what I was looking for. I was missing the "Order by Cell 
Indices".
Thank you.


Jonathan

On Monday, April 29, 2013 11:17:57 PM UTC+2, Henrik Bengtsson wrote:
>
> Hi. 
>
> On Mon, Apr 29, 2013 at 4:16 AM,  <jonathan....@mail.dcu.ie <javascript:>> 
> wrote: 
> > Hi Henrik, 
> > 
> > Thank you for the reply. 
> > This is quite helpful however from using the following code I am able to 
> > produce a matrix of the intensities for each of the probes... 
> (replicates 
> > included) 
> > 
> > library(aroma.affymetrix) 
> > cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full") 
> > raw_sample <- AffymetrixCelSet$byName("Sample", cdf=cdf) 
> > ciS<- getCellIndices(cdf, unlist=TRUE, useNames=TRUE) 
> > s<- sample(ciS, 250) 
> > head(ciS) 
> > d<- extractMatrix(cs, cells=ciS, verbose=-50) 
> > write.table(file="all_probes.txt",df, quote=FALSE, sep="\t", 
> > row.names=FALSE) 
> > 
> > My issue now is that with this matrix I can see the 6+million probes 
> however 
> > the probe ID's are not present. Maybe I am missing something but If you 
> > could help me associate each probe intensity value with the probe ID I 
> would 
> > be very grateful. 
>
> What do you mean by 'probe ID'; what you expect it do be?  Note that 
> in Affymetrix terms, there are 'unit' and 'group' (probeset) 
> IDs/names, but probes don't really have IDs other that an (x,y) 
> location or an index (as you use it above). 
>
> However, you could pull out probe-specific CDF annotation from the CDF 
> file as follows: 
>
> library("aroma.affymetrix"); 
> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full"); 
> # Some example units 
> units <- c(1012:1013, 950123:950125); 
> # Read CDF info as data.frame 
> cdfData <- readDataFrame(cdf, units=units);  # Will take a very long 
> time if done for many units 
> # Order by cell indices 
> o <- order(cdfData$cell); 
> cdfData <- cdfData[o,]; 
> 'data.frame':   15 obs. of  16 variables: 
>  $ unit           : int  1012 1012 1013 1013 1012 1012 101.. 
>  $ unitType       : chr  "genotyping" "genotyping" "genot".. 
>  $ unitType       : chr  "genotyping" "genotyping" "genotyping" ... 
>  $ unitDirection  : chr  "sense" "sense" "sense" "sense" ... 
>  $ unitNbrOfAtoms : int  6 6 6 6 6 6 6 6 1 1 ... 
>  $ cell           : int  539387 539388 902651 902652 18384.. 
>  $ x              : int  706 707 2170 2171 2630 2631 998 9.. 
>  $ y              : int  201 201 336 336 685 685 1112 1112.. 
>  $ groupNbrOfAtoms: int  3 3 3 3 3 3 3 3 1 1 ... 
>  $ cell           : int  539387 539388 902651 902652 ... 
>  $ unit           : int  1012 1012 1013 1013 1012 1012 101.. 
>  $ y              : int  201 201 336 336 685 685 1112 1112 1195 ... 
>  $ unitName       : chr  "SNP_A-2001598" "SNP_A-2001598" ... 
>  $ unitType       : chr  "genotyping" "genotyping" "genot".. 
>  $ indexPos       : int  2 1 6 5 4 3 4 3 1 1 ... 
>  $ cell           : int  539387 539388 902651 902652 18384... 
>
> # Note that for some chip types some probes (cells) occur in multiple 
> # probe sets meaning you may have duplicates.  I don't think this is 
> # the case for SNP chips though.  Sanity check... 
> stopifnot(!anyDuplicated(cdfData$cell)); 
>
> # Extract the corresponding probe signals from 'csR' (AffymetrixCelSet) 
> Y <- extractMatrix(csR, cells=cdfData$cell); 
>
> # Merge CDF annotation data with signals 
> data <- cbind(cdfData, Y); 
>
> # Save 
> # [see help("writeDataFrame.data.frame")] 
> pathname <- writeDataFrame(data, file="all_probes.txt", 
> header=list(chipType=getChipType(cdf))); 
>
>
> Again, not sure what you're going to use this for/where to import it; 
> you may end up reinventing the wheel. 
>
> Hope this helps 
>
> /Henrik 
>
> > 
> > Thanks, 
> > Jonathan 
> > 
> > On Tuesday, April 23, 2013 2:40:56 AM UTC+2, Henrik Bengtsson wrote: 
> >> 
> >> Hi Jonathan. 
> >> 
> >> On Thu, Apr 11, 2013 at 6:38 AM,  <jonathan....@mail.dcu.ie> wrote: 
> >> > Hi all, 
> >> > 
> >> > I suppose this is a simple enough task even for a newbie like me, I 
> have 
> >> > found a similar related post but I have two questions: 
> >> > 
> >> > My First Question when I use the following commands in R: 
> >> > 
> >> > library(aroma.affymetrix) 
> >> > 
> >> > cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full") 
> >> > cs <- AffymetrixCelSet$byName("Arles", cdf=cdf) 
> >> > 
> >> > unit <- indexOf(cdf, "SNP_A-8656720") 
> >> > y <- readUnits(cs, units=unit) 
> >> > str(y) 
> >> > 
> >> > This allows me to gather the raw intensities for a SNP or CN probe. 
> as 
> >> > follows: 
> >> > $`SNP_A-8656720`$A$intensities 
> >> >      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] 
> >> > [,13] 
> >> > [,14] 
> >> > [1,]  786  807 1051  879 1971 1447 1826  236 1249  2335  1140   416 
> >> > 1147 
> >> > 2054 
> >> > [2,]  694  823 1027  835 1673 1167 1729  252 1068  2339   982   411 
> >> > 769 
> >> > 1786 
> >> > [3,]  752  665  913  820 1621 1356 1555  248 1344  2362  1417   339 
> >> > 991 
> >> > 1835 
> >> > 
> >> > 
> >> > $`SNP_A-8656720`$G 
> >> > $`SNP_A-8656720`$G$intensities 
> >> >      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] 
> >> > [,13] 
> >> > [,14] 
> >> > [1,]  273 1014  481 1012  402  383  421 1138  321   861   614  1859 
> >> > 687 
> >> > 549 
> >> > [2,]  222  528  476  825  602  719  460  912  417   796   650  1617 
> >> > 537 
> >> > 661 
> >> > [3,]  259  781  543  754  492  452  550  909  316   743   518  1847 
> >> > 529 
> >> > 651 
> >> > 
> >> > From the previous post this data is supposed to reference to the A 
> and B 
> >> > allele and for the forward and reverse strands. My question is, what 
> >> > refers 
> >> > the Allele A/B ($A/$G) and forward / reverse, also by that logic 
> there 
> >> > should be 4 sets of data? 
> >> 
> >> Starting with SNP chip GenomeWideSNP_5 (sic!), Affymetrix no longer 
> >> put down probes for both strands.  Instead, they pick(ed) the strand, 
> >> reverse *or* forward, that worked "the best" (the best genotyping 
> >> performance).  Try this: 
> >> 
> >> > counts <- nbrOfGroupsPerUnit(cdf);  # Slow first time 
> >> > table(counts); 
> >> counts 
> >>      1      2      4 
> >> 946447 906874   2748 
> >> 
> >> As you see, there are only 2,748 SNPs with both strands, whereas the 
> >> majority only have one. 
> >> 
> >> BTW, those "sets" of probes are formally referred to as "unit groups" 
> >> (it's ok to also call them "probe sets").  To find out which strands 
> >> the different unit groups are querying, you can do: 
> >> 
> >> > unit <- indexOf(cdf, "SNP_A-8656720"); 
> >> > cdfList <- readUnits(cdf, units=unit); 
> >> > str(cdfList) 
> >> List of 1 
> >>  $ SNP_A-8656720:List of 3 
> >>   ..$ type     : int 2 
> >>   ..$ direction: int 2 
> >>   ..$ groups   :List of 2 
> >>   .. ..$ A:List of 6 
> >>   .. .. ..$ x        : int [1:3] 1565 193 1629 
> >>   .. .. ..$ y        : int [1:3] 83 1466 2038 
> >>   .. .. ..$ pbase    : chr [1:3] "a" "a" "a" 
> >>   .. .. ..$ tbase    : chr [1:3] "t" "t" "t" 
> >>   .. .. ..$ expos    : int [1:3] 13 13 13 
> >>   .. .. ..$ direction: int 2 
> >>   .. ..$ G:List of 6 
> >>   .. .. ..$ x        : int [1:3] 1564 192 1628 
> >>   .. .. ..$ y        : int [1:3] 83 1466 2038 
> >>   .. .. ..$ pbase    : chr [1:3] "g" "g" "g" 
> >>   .. .. ..$ tbase    : chr [1:3] "c" "c" "c" 
> >>   .. .. ..$ expos    : int [1:3] 13 13 13 
> >>   .. .. ..$ direction: int 2 
> >> 
> >> See that 'direction'?  It specifies the strand, cf. 
> >> help("readCdfUnits", package="affxparser") [the function used 
> >> internally by readUnits()]. 
> >> 
> >> 
> >> > As for CN probes this is much more obvious as there is only one probe 
> >> > interrogating the position. 
> >> 
> >> Correct. 
> >> 
> >> > 
> >> > My Second Question is in relation to expanding this code in order to 
> >> > generate the raw intensities of all the SNP_ probes and then to run 
> the 
> >> > Sam 
> >> > e for CN_ probes so that I have the raw intensities for each 
> seperately? 
> >> 
> >> > types <- getUnitTypes(cdf);  # Slow first time 
> >> > table(types); 
> >> types 
> >>      1      2      5 
> >>    621 909622 945826 
> >> 
> >> Here type "2" refers to a "SNP" and "5" a "CN unit".   For this chip 
> >> type you could also infer which the SNPs/CN probes by their unit names 
> >> ("SNP_", and "CN_"), but that assumes that the names follow that 
> >> convention which is *not* true for other chip types.  So, use 
> >> getUnitTypes(): 
> >> 
> >> # SNPs 
> >> > snpUnits <- which(types == 2); 
> >> > str(snpUnits) 
> >>  int [1:909622] 622 623 624 625 626 627 628 ... 
> >> 
> >> # CN probes (aka non-polymorphic loci). 
> >> > npUnits <- which(types == 5); 
> >> > str(npUnits) 
> >>  int [1:945826] 910244 910245 910246 910247 ... 
> >> 
> >> Hope this helps 
> >> 
> >> /Henrik 
> >> 
> >> > 
> >> > 
> >> > Cheers, 
> >> > Jonathan 
> >> > 
> >> > -- 
> >> > -- 
> >> > When reporting problems on aroma.affymetrix, make sure 1) to run the 
> >> > latest 
> >> > version of the package, 2) to report the output of sessionInfo() and 
> >> > traceback(), and 3) to post a complete code example. 
> >> > 
> >> > 
> >> > You received this message because you are subscribed to the Google 
> >> > Groups 
> >> > "aroma.affymetrix" group with website http://www.aroma-project.org/. 
> >> > To post to this group, send email to aroma-af...@googlegroups.com 
> >> > To unsubscribe and other options, go to 
> >> > http://www.aroma-project.org/forum/ 
> >> > 
> >> > --- 
> >> > You received this message because you are subscribed to the Google 
> >> > Groups 
> >> > "aroma.affymetrix" group. 
> >> > To unsubscribe from this group and stop receiving emails from it, 
> send 
> >> > an 
> >> > email to aroma-affymetr...@googlegroups.com. 
> >> > For more options, visit https://groups.google.com/groups/opt_out. 
> >> > 
> >> > 
>

-- 
-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/

--- 
You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to aroma-affymetrix+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [aroma.affymetrix] SNP_6 raw intensity Extraction

Reply via email to