Hi. On Mon, Apr 29, 2013 at 4:16 AM, <jonathan.crowth...@mail.dcu.ie> wrote: > Hi Henrik, > > Thank you for the reply. > This is quite helpful however from using the following code I am able to > produce a matrix of the intensities for each of the probes... (replicates > included) > > library(aroma.affymetrix) > cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full") > raw_sample <- AffymetrixCelSet$byName("Sample", cdf=cdf) > ciS<- getCellIndices(cdf, unlist=TRUE, useNames=TRUE) > s<- sample(ciS, 250) > head(ciS) > d<- extractMatrix(cs, cells=ciS, verbose=-50) > write.table(file="all_probes.txt",df, quote=FALSE, sep="\t", > row.names=FALSE) > > My issue now is that with this matrix I can see the 6+million probes however > the probe ID's are not present. Maybe I am missing something but If you > could help me associate each probe intensity value with the probe ID I would > be very grateful.
What do you mean by 'probe ID'; what you expect it do be? Note that in Affymetrix terms, there are 'unit' and 'group' (probeset) IDs/names, but probes don't really have IDs other that an (x,y) location or an index (as you use it above). However, you could pull out probe-specific CDF annotation from the CDF file as follows: library("aroma.affymetrix"); cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full"); # Some example units units <- c(1012:1013, 950123:950125); # Read CDF info as data.frame cdfData <- readDataFrame(cdf, units=units); # Will take a very long time if done for many units # Order by cell indices o <- order(cdfData$cell); cdfData <- cdfData[o,]; 'data.frame': 15 obs. of 16 variables: $ unit : int 1012 1012 1013 1013 1012 1012 101.. $ unitType : chr "genotyping" "genotyping" "genot".. $ unitType : chr "genotyping" "genotyping" "genotyping" ... $ unitDirection : chr "sense" "sense" "sense" "sense" ... $ unitNbrOfAtoms : int 6 6 6 6 6 6 6 6 1 1 ... $ cell : int 539387 539388 902651 902652 18384.. $ x : int 706 707 2170 2171 2630 2631 998 9.. $ y : int 201 201 336 336 685 685 1112 1112.. $ groupNbrOfAtoms: int 3 3 3 3 3 3 3 3 1 1 ... $ cell : int 539387 539388 902651 902652 ... $ unit : int 1012 1012 1013 1013 1012 1012 101.. $ y : int 201 201 336 336 685 685 1112 1112 1195 ... $ unitName : chr "SNP_A-2001598" "SNP_A-2001598" ... $ unitType : chr "genotyping" "genotyping" "genot".. $ indexPos : int 2 1 6 5 4 3 4 3 1 1 ... $ cell : int 539387 539388 902651 902652 18384... # Note that for some chip types some probes (cells) occur in multiple # probe sets meaning you may have duplicates. I don't think this is # the case for SNP chips though. Sanity check... stopifnot(!anyDuplicated(cdfData$cell)); # Extract the corresponding probe signals from 'csR' (AffymetrixCelSet) Y <- extractMatrix(csR, cells=cdfData$cell); # Merge CDF annotation data with signals data <- cbind(cdfData, Y); # Save # [see help("writeDataFrame.data.frame")] pathname <- writeDataFrame(data, file="all_probes.txt", header=list(chipType=getChipType(cdf))); Again, not sure what you're going to use this for/where to import it; you may end up reinventing the wheel. Hope this helps /Henrik > > Thanks, > Jonathan > > On Tuesday, April 23, 2013 2:40:56 AM UTC+2, Henrik Bengtsson wrote: >> >> Hi Jonathan. >> >> On Thu, Apr 11, 2013 at 6:38 AM, <jonathan....@mail.dcu.ie> wrote: >> > Hi all, >> > >> > I suppose this is a simple enough task even for a newbie like me, I have >> > found a similar related post but I have two questions: >> > >> > My First Question when I use the following commands in R: >> > >> > library(aroma.affymetrix) >> > >> > cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full") >> > cs <- AffymetrixCelSet$byName("Arles", cdf=cdf) >> > >> > unit <- indexOf(cdf, "SNP_A-8656720") >> > y <- readUnits(cs, units=unit) >> > str(y) >> > >> > This allows me to gather the raw intensities for a SNP or CN probe. as >> > follows: >> > $`SNP_A-8656720`$A$intensities >> > [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] >> > [,13] >> > [,14] >> > [1,] 786 807 1051 879 1971 1447 1826 236 1249 2335 1140 416 >> > 1147 >> > 2054 >> > [2,] 694 823 1027 835 1673 1167 1729 252 1068 2339 982 411 >> > 769 >> > 1786 >> > [3,] 752 665 913 820 1621 1356 1555 248 1344 2362 1417 339 >> > 991 >> > 1835 >> > >> > >> > $`SNP_A-8656720`$G >> > $`SNP_A-8656720`$G$intensities >> > [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] >> > [,13] >> > [,14] >> > [1,] 273 1014 481 1012 402 383 421 1138 321 861 614 1859 >> > 687 >> > 549 >> > [2,] 222 528 476 825 602 719 460 912 417 796 650 1617 >> > 537 >> > 661 >> > [3,] 259 781 543 754 492 452 550 909 316 743 518 1847 >> > 529 >> > 651 >> > >> > From the previous post this data is supposed to reference to the A and B >> > allele and for the forward and reverse strands. My question is, what >> > refers >> > the Allele A/B ($A/$G) and forward / reverse, also by that logic there >> > should be 4 sets of data? >> >> Starting with SNP chip GenomeWideSNP_5 (sic!), Affymetrix no longer >> put down probes for both strands. Instead, they pick(ed) the strand, >> reverse *or* forward, that worked "the best" (the best genotyping >> performance). Try this: >> >> > counts <- nbrOfGroupsPerUnit(cdf); # Slow first time >> > table(counts); >> counts >> 1 2 4 >> 946447 906874 2748 >> >> As you see, there are only 2,748 SNPs with both strands, whereas the >> majority only have one. >> >> BTW, those "sets" of probes are formally referred to as "unit groups" >> (it's ok to also call them "probe sets"). To find out which strands >> the different unit groups are querying, you can do: >> >> > unit <- indexOf(cdf, "SNP_A-8656720"); >> > cdfList <- readUnits(cdf, units=unit); >> > str(cdfList) >> List of 1 >> $ SNP_A-8656720:List of 3 >> ..$ type : int 2 >> ..$ direction: int 2 >> ..$ groups :List of 2 >> .. ..$ A:List of 6 >> .. .. ..$ x : int [1:3] 1565 193 1629 >> .. .. ..$ y : int [1:3] 83 1466 2038 >> .. .. ..$ pbase : chr [1:3] "a" "a" "a" >> .. .. ..$ tbase : chr [1:3] "t" "t" "t" >> .. .. ..$ expos : int [1:3] 13 13 13 >> .. .. ..$ direction: int 2 >> .. ..$ G:List of 6 >> .. .. ..$ x : int [1:3] 1564 192 1628 >> .. .. ..$ y : int [1:3] 83 1466 2038 >> .. .. ..$ pbase : chr [1:3] "g" "g" "g" >> .. .. ..$ tbase : chr [1:3] "c" "c" "c" >> .. .. ..$ expos : int [1:3] 13 13 13 >> .. .. ..$ direction: int 2 >> >> See that 'direction'? It specifies the strand, cf. >> help("readCdfUnits", package="affxparser") [the function used >> internally by readUnits()]. >> >> >> > As for CN probes this is much more obvious as there is only one probe >> > interrogating the position. >> >> Correct. >> >> > >> > My Second Question is in relation to expanding this code in order to >> > generate the raw intensities of all the SNP_ probes and then to run the >> > Sam >> > e for CN_ probes so that I have the raw intensities for each seperately? >> >> > types <- getUnitTypes(cdf); # Slow first time >> > table(types); >> types >> 1 2 5 >> 621 909622 945826 >> >> Here type "2" refers to a "SNP" and "5" a "CN unit". For this chip >> type you could also infer which the SNPs/CN probes by their unit names >> ("SNP_", and "CN_"), but that assumes that the names follow that >> convention which is *not* true for other chip types. So, use >> getUnitTypes(): >> >> # SNPs >> > snpUnits <- which(types == 2); >> > str(snpUnits) >> int [1:909622] 622 623 624 625 626 627 628 ... >> >> # CN probes (aka non-polymorphic loci). >> > npUnits <- which(types == 5); >> > str(npUnits) >> int [1:945826] 910244 910245 910246 910247 ... >> >> Hope this helps >> >> /Henrik >> >> > >> > >> > Cheers, >> > Jonathan >> > >> > -- >> > -- >> > When reporting problems on aroma.affymetrix, make sure 1) to run the >> > latest >> > version of the package, 2) to report the output of sessionInfo() and >> > traceback(), and 3) to post a complete code example. >> > >> > >> > You received this message because you are subscribed to the Google >> > Groups >> > "aroma.affymetrix" group with website http://www.aroma-project.org/. >> > To post to this group, send email to aroma-af...@googlegroups.com >> > To unsubscribe and other options, go to >> > http://www.aroma-project.org/forum/ >> > >> > --- >> > You received this message because you are subscribed to the Google >> > Groups >> > "aroma.affymetrix" group. >> > To unsubscribe from this group and stop receiving emails from it, send >> > an >> > email to aroma-affymetr...@googlegroups.com. >> > For more options, visit https://groups.google.com/groups/opt_out. >> > >> > -- -- When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example. You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with website http://www.aroma-project.org/. To post to this group, send email to aroma-affymetrix@googlegroups.com To unsubscribe and other options, go to http://www.aroma-project.org/forum/ --- You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group. To unsubscribe from this group and stop receiving emails from it, send an email to aroma-affymetrix+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.