Hi Henrik, That is exactly what I was looking for. I was missing the "Order by Cell Indices". Thank you.
Jonathan On Monday, April 29, 2013 11:17:57 PM UTC+2, Henrik Bengtsson wrote: > > Hi. > > On Mon, Apr 29, 2013 at 4:16 AM, <jonathan....@mail.dcu.ie <javascript:>> > wrote: > > Hi Henrik, > > > > Thank you for the reply. > > This is quite helpful however from using the following code I am able to > > produce a matrix of the intensities for each of the probes... > (replicates > > included) > > > > library(aroma.affymetrix) > > cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full") > > raw_sample <- AffymetrixCelSet$byName("Sample", cdf=cdf) > > ciS<- getCellIndices(cdf, unlist=TRUE, useNames=TRUE) > > s<- sample(ciS, 250) > > head(ciS) > > d<- extractMatrix(cs, cells=ciS, verbose=-50) > > write.table(file="all_probes.txt",df, quote=FALSE, sep="\t", > > row.names=FALSE) > > > > My issue now is that with this matrix I can see the 6+million probes > however > > the probe ID's are not present. Maybe I am missing something but If you > > could help me associate each probe intensity value with the probe ID I > would > > be very grateful. > > What do you mean by 'probe ID'; what you expect it do be? Note that > in Affymetrix terms, there are 'unit' and 'group' (probeset) > IDs/names, but probes don't really have IDs other that an (x,y) > location or an index (as you use it above). > > However, you could pull out probe-specific CDF annotation from the CDF > file as follows: > > library("aroma.affymetrix"); > cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full"); > # Some example units > units <- c(1012:1013, 950123:950125); > # Read CDF info as data.frame > cdfData <- readDataFrame(cdf, units=units); # Will take a very long > time if done for many units > # Order by cell indices > o <- order(cdfData$cell); > cdfData <- cdfData[o,]; > 'data.frame': 15 obs. of 16 variables: > $ unit : int 1012 1012 1013 1013 1012 1012 101.. > $ unitType : chr "genotyping" "genotyping" "genot".. > $ unitType : chr "genotyping" "genotyping" "genotyping" ... > $ unitDirection : chr "sense" "sense" "sense" "sense" ... > $ unitNbrOfAtoms : int 6 6 6 6 6 6 6 6 1 1 ... > $ cell : int 539387 539388 902651 902652 18384.. > $ x : int 706 707 2170 2171 2630 2631 998 9.. > $ y : int 201 201 336 336 685 685 1112 1112.. > $ groupNbrOfAtoms: int 3 3 3 3 3 3 3 3 1 1 ... > $ cell : int 539387 539388 902651 902652 ... > $ unit : int 1012 1012 1013 1013 1012 1012 101.. > $ y : int 201 201 336 336 685 685 1112 1112 1195 ... > $ unitName : chr "SNP_A-2001598" "SNP_A-2001598" ... > $ unitType : chr "genotyping" "genotyping" "genot".. > $ indexPos : int 2 1 6 5 4 3 4 3 1 1 ... > $ cell : int 539387 539388 902651 902652 18384... > > # Note that for some chip types some probes (cells) occur in multiple > # probe sets meaning you may have duplicates. I don't think this is > # the case for SNP chips though. Sanity check... > stopifnot(!anyDuplicated(cdfData$cell)); > > # Extract the corresponding probe signals from 'csR' (AffymetrixCelSet) > Y <- extractMatrix(csR, cells=cdfData$cell); > > # Merge CDF annotation data with signals > data <- cbind(cdfData, Y); > > # Save > # [see help("writeDataFrame.data.frame")] > pathname <- writeDataFrame(data, file="all_probes.txt", > header=list(chipType=getChipType(cdf))); > > > Again, not sure what you're going to use this for/where to import it; > you may end up reinventing the wheel. > > Hope this helps > > /Henrik > > > > > Thanks, > > Jonathan > > > > On Tuesday, April 23, 2013 2:40:56 AM UTC+2, Henrik Bengtsson wrote: > >> > >> Hi Jonathan. > >> > >> On Thu, Apr 11, 2013 at 6:38 AM, <jonathan....@mail.dcu.ie> wrote: > >> > Hi all, > >> > > >> > I suppose this is a simple enough task even for a newbie like me, I > have > >> > found a similar related post but I have two questions: > >> > > >> > My First Question when I use the following commands in R: > >> > > >> > library(aroma.affymetrix) > >> > > >> > cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full") > >> > cs <- AffymetrixCelSet$byName("Arles", cdf=cdf) > >> > > >> > unit <- indexOf(cdf, "SNP_A-8656720") > >> > y <- readUnits(cs, units=unit) > >> > str(y) > >> > > >> > This allows me to gather the raw intensities for a SNP or CN probe. > as > >> > follows: > >> > $`SNP_A-8656720`$A$intensities > >> > [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] > >> > [,13] > >> > [,14] > >> > [1,] 786 807 1051 879 1971 1447 1826 236 1249 2335 1140 416 > >> > 1147 > >> > 2054 > >> > [2,] 694 823 1027 835 1673 1167 1729 252 1068 2339 982 411 > >> > 769 > >> > 1786 > >> > [3,] 752 665 913 820 1621 1356 1555 248 1344 2362 1417 339 > >> > 991 > >> > 1835 > >> > > >> > > >> > $`SNP_A-8656720`$G > >> > $`SNP_A-8656720`$G$intensities > >> > [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] > >> > [,13] > >> > [,14] > >> > [1,] 273 1014 481 1012 402 383 421 1138 321 861 614 1859 > >> > 687 > >> > 549 > >> > [2,] 222 528 476 825 602 719 460 912 417 796 650 1617 > >> > 537 > >> > 661 > >> > [3,] 259 781 543 754 492 452 550 909 316 743 518 1847 > >> > 529 > >> > 651 > >> > > >> > From the previous post this data is supposed to reference to the A > and B > >> > allele and for the forward and reverse strands. My question is, what > >> > refers > >> > the Allele A/B ($A/$G) and forward / reverse, also by that logic > there > >> > should be 4 sets of data? > >> > >> Starting with SNP chip GenomeWideSNP_5 (sic!), Affymetrix no longer > >> put down probes for both strands. Instead, they pick(ed) the strand, > >> reverse *or* forward, that worked "the best" (the best genotyping > >> performance). Try this: > >> > >> > counts <- nbrOfGroupsPerUnit(cdf); # Slow first time > >> > table(counts); > >> counts > >> 1 2 4 > >> 946447 906874 2748 > >> > >> As you see, there are only 2,748 SNPs with both strands, whereas the > >> majority only have one. > >> > >> BTW, those "sets" of probes are formally referred to as "unit groups" > >> (it's ok to also call them "probe sets"). To find out which strands > >> the different unit groups are querying, you can do: > >> > >> > unit <- indexOf(cdf, "SNP_A-8656720"); > >> > cdfList <- readUnits(cdf, units=unit); > >> > str(cdfList) > >> List of 1 > >> $ SNP_A-8656720:List of 3 > >> ..$ type : int 2 > >> ..$ direction: int 2 > >> ..$ groups :List of 2 > >> .. ..$ A:List of 6 > >> .. .. ..$ x : int [1:3] 1565 193 1629 > >> .. .. ..$ y : int [1:3] 83 1466 2038 > >> .. .. ..$ pbase : chr [1:3] "a" "a" "a" > >> .. .. ..$ tbase : chr [1:3] "t" "t" "t" > >> .. .. ..$ expos : int [1:3] 13 13 13 > >> .. .. ..$ direction: int 2 > >> .. ..$ G:List of 6 > >> .. .. ..$ x : int [1:3] 1564 192 1628 > >> .. .. ..$ y : int [1:3] 83 1466 2038 > >> .. .. ..$ pbase : chr [1:3] "g" "g" "g" > >> .. .. ..$ tbase : chr [1:3] "c" "c" "c" > >> .. .. ..$ expos : int [1:3] 13 13 13 > >> .. .. ..$ direction: int 2 > >> > >> See that 'direction'? It specifies the strand, cf. > >> help("readCdfUnits", package="affxparser") [the function used > >> internally by readUnits()]. > >> > >> > >> > As for CN probes this is much more obvious as there is only one probe > >> > interrogating the position. > >> > >> Correct. > >> > >> > > >> > My Second Question is in relation to expanding this code in order to > >> > generate the raw intensities of all the SNP_ probes and then to run > the > >> > Sam > >> > e for CN_ probes so that I have the raw intensities for each > seperately? > >> > >> > types <- getUnitTypes(cdf); # Slow first time > >> > table(types); > >> types > >> 1 2 5 > >> 621 909622 945826 > >> > >> Here type "2" refers to a "SNP" and "5" a "CN unit". For this chip > >> type you could also infer which the SNPs/CN probes by their unit names > >> ("SNP_", and "CN_"), but that assumes that the names follow that > >> convention which is *not* true for other chip types. So, use > >> getUnitTypes(): > >> > >> # SNPs > >> > snpUnits <- which(types == 2); > >> > str(snpUnits) > >> int [1:909622] 622 623 624 625 626 627 628 ... > >> > >> # CN probes (aka non-polymorphic loci). > >> > npUnits <- which(types == 5); > >> > str(npUnits) > >> int [1:945826] 910244 910245 910246 910247 ... > >> > >> Hope this helps > >> > >> /Henrik > >> > >> > > >> > > >> > Cheers, > >> > Jonathan > >> > > >> > -- > >> > -- > >> > When reporting problems on aroma.affymetrix, make sure 1) to run the > >> > latest > >> > version of the package, 2) to report the output of sessionInfo() and > >> > traceback(), and 3) to post a complete code example. > >> > > >> > > >> > You received this message because you are subscribed to the Google > >> > Groups > >> > "aroma.affymetrix" group with website http://www.aroma-project.org/. > >> > To post to this group, send email to aroma-af...@googlegroups.com > >> > To unsubscribe and other options, go to > >> > http://www.aroma-project.org/forum/ > >> > > >> > --- > >> > You received this message because you are subscribed to the Google > >> > Groups > >> > "aroma.affymetrix" group. > >> > To unsubscribe from this group and stop receiving emails from it, > send > >> > an > >> > email to aroma-affymetr...@googlegroups.com. > >> > For more options, visit https://groups.google.com/groups/opt_out. > >> > > >> > > -- -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with website http://www.aroma-project.org/. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe and other options, go to http://www.aroma-project.org/forum/ --- You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To unsubscribe from this group and stop receiving emails from it, send an email to aroma-affymetrix+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.