Hi.

On Mon, Apr 29, 2013 at 4:16 AM,  <jonathan.crowth...@mail.dcu.ie> wrote:
> Hi Henrik,
>
> Thank you for the reply.
> This is quite helpful however from using the following code I am able to
> produce a matrix of the intensities for each of the probes... (replicates
> included)
>
> library(aroma.affymetrix)
> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full")
> raw_sample <- AffymetrixCelSet$byName("Sample", cdf=cdf)
> ciS<- getCellIndices(cdf, unlist=TRUE, useNames=TRUE)
> s<- sample(ciS, 250)
> head(ciS)
> d<- extractMatrix(cs, cells=ciS, verbose=-50)
> write.table(file="all_probes.txt",df, quote=FALSE, sep="\t",
> row.names=FALSE)
>
> My issue now is that with this matrix I can see the 6+million probes however
> the probe ID's are not present. Maybe I am missing something but If you
> could help me associate each probe intensity value with the probe ID I would
> be very grateful.

What do you mean by 'probe ID'; what you expect it do be?  Note that
in Affymetrix terms, there are 'unit' and 'group' (probeset)
IDs/names, but probes don't really have IDs other that an (x,y)
location or an index (as you use it above).

However, you could pull out probe-specific CDF annotation from the CDF
file as follows:

library("aroma.affymetrix");
cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full");
# Some example units
units <- c(1012:1013, 950123:950125);
# Read CDF info as data.frame
cdfData <- readDataFrame(cdf, units=units);  # Will take a very long
time if done for many units
# Order by cell indices
o <- order(cdfData$cell);
cdfData <- cdfData[o,];
'data.frame':   15 obs. of  16 variables:
 $ unit           : int  1012 1012 1013 1013 1012 1012 101..
 $ unitType       : chr  "genotyping" "genotyping" "genot"..
 $ unitType       : chr  "genotyping" "genotyping" "genotyping" ...
 $ unitDirection  : chr  "sense" "sense" "sense" "sense" ...
 $ unitNbrOfAtoms : int  6 6 6 6 6 6 6 6 1 1 ...
 $ cell           : int  539387 539388 902651 902652 18384..
 $ x              : int  706 707 2170 2171 2630 2631 998 9..
 $ y              : int  201 201 336 336 685 685 1112 1112..
 $ groupNbrOfAtoms: int  3 3 3 3 3 3 3 3 1 1 ...
 $ cell           : int  539387 539388 902651 902652 ...
 $ unit           : int  1012 1012 1013 1013 1012 1012 101..
 $ y              : int  201 201 336 336 685 685 1112 1112 1195 ...
 $ unitName       : chr  "SNP_A-2001598" "SNP_A-2001598" ...
 $ unitType       : chr  "genotyping" "genotyping" "genot"..
 $ indexPos       : int  2 1 6 5 4 3 4 3 1 1 ...
 $ cell           : int  539387 539388 902651 902652 18384...

# Note that for some chip types some probes (cells) occur in multiple
# probe sets meaning you may have duplicates.  I don't think this is
# the case for SNP chips though.  Sanity check...
stopifnot(!anyDuplicated(cdfData$cell));

# Extract the corresponding probe signals from 'csR' (AffymetrixCelSet)
Y <- extractMatrix(csR, cells=cdfData$cell);

# Merge CDF annotation data with signals
data <- cbind(cdfData, Y);

# Save
# [see help("writeDataFrame.data.frame")]
pathname <- writeDataFrame(data, file="all_probes.txt",
header=list(chipType=getChipType(cdf)));


Again, not sure what you're going to use this for/where to import it;
you may end up reinventing the wheel.

Hope this helps

/Henrik

>
> Thanks,
> Jonathan
>
> On Tuesday, April 23, 2013 2:40:56 AM UTC+2, Henrik Bengtsson wrote:
>>
>> Hi Jonathan.
>>
>> On Thu, Apr 11, 2013 at 6:38 AM,  <jonathan....@mail.dcu.ie> wrote:
>> > Hi all,
>> >
>> > I suppose this is a simple enough task even for a newbie like me, I have
>> > found a similar related post but I have two questions:
>> >
>> > My First Question when I use the following commands in R:
>> >
>> > library(aroma.affymetrix)
>> >
>> > cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full")
>> > cs <- AffymetrixCelSet$byName("Arles", cdf=cdf)
>> >
>> > unit <- indexOf(cdf, "SNP_A-8656720")
>> > y <- readUnits(cs, units=unit)
>> > str(y)
>> >
>> > This allows me to gather the raw intensities for a SNP or CN probe. as
>> > follows:
>> > $`SNP_A-8656720`$A$intensities
>> >      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
>> > [,13]
>> > [,14]
>> > [1,]  786  807 1051  879 1971 1447 1826  236 1249  2335  1140   416
>> > 1147
>> > 2054
>> > [2,]  694  823 1027  835 1673 1167 1729  252 1068  2339   982   411
>> > 769
>> > 1786
>> > [3,]  752  665  913  820 1621 1356 1555  248 1344  2362  1417   339
>> > 991
>> > 1835
>> >
>> >
>> > $`SNP_A-8656720`$G
>> > $`SNP_A-8656720`$G$intensities
>> >      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
>> > [,13]
>> > [,14]
>> > [1,]  273 1014  481 1012  402  383  421 1138  321   861   614  1859
>> > 687
>> > 549
>> > [2,]  222  528  476  825  602  719  460  912  417   796   650  1617
>> > 537
>> > 661
>> > [3,]  259  781  543  754  492  452  550  909  316   743   518  1847
>> > 529
>> > 651
>> >
>> > From the previous post this data is supposed to reference to the A and B
>> > allele and for the forward and reverse strands. My question is, what
>> > refers
>> > the Allele A/B ($A/$G) and forward / reverse, also by that logic there
>> > should be 4 sets of data?
>>
>> Starting with SNP chip GenomeWideSNP_5 (sic!), Affymetrix no longer
>> put down probes for both strands.  Instead, they pick(ed) the strand,
>> reverse *or* forward, that worked "the best" (the best genotyping
>> performance).  Try this:
>>
>> > counts <- nbrOfGroupsPerUnit(cdf);  # Slow first time
>> > table(counts);
>> counts
>>      1      2      4
>> 946447 906874   2748
>>
>> As you see, there are only 2,748 SNPs with both strands, whereas the
>> majority only have one.
>>
>> BTW, those "sets" of probes are formally referred to as "unit groups"
>> (it's ok to also call them "probe sets").  To find out which strands
>> the different unit groups are querying, you can do:
>>
>> > unit <- indexOf(cdf, "SNP_A-8656720");
>> > cdfList <- readUnits(cdf, units=unit);
>> > str(cdfList)
>> List of 1
>>  $ SNP_A-8656720:List of 3
>>   ..$ type     : int 2
>>   ..$ direction: int 2
>>   ..$ groups   :List of 2
>>   .. ..$ A:List of 6
>>   .. .. ..$ x        : int [1:3] 1565 193 1629
>>   .. .. ..$ y        : int [1:3] 83 1466 2038
>>   .. .. ..$ pbase    : chr [1:3] "a" "a" "a"
>>   .. .. ..$ tbase    : chr [1:3] "t" "t" "t"
>>   .. .. ..$ expos    : int [1:3] 13 13 13
>>   .. .. ..$ direction: int 2
>>   .. ..$ G:List of 6
>>   .. .. ..$ x        : int [1:3] 1564 192 1628
>>   .. .. ..$ y        : int [1:3] 83 1466 2038
>>   .. .. ..$ pbase    : chr [1:3] "g" "g" "g"
>>   .. .. ..$ tbase    : chr [1:3] "c" "c" "c"
>>   .. .. ..$ expos    : int [1:3] 13 13 13
>>   .. .. ..$ direction: int 2
>>
>> See that 'direction'?  It specifies the strand, cf.
>> help("readCdfUnits", package="affxparser") [the function used
>> internally by readUnits()].
>>
>>
>> > As for CN probes this is much more obvious as there is only one probe
>> > interrogating the position.
>>
>> Correct.
>>
>> >
>> > My Second Question is in relation to expanding this code in order to
>> > generate the raw intensities of all the SNP_ probes and then to run the
>> > Sam
>> > e for CN_ probes so that I have the raw intensities for each seperately?
>>
>> > types <- getUnitTypes(cdf);  # Slow first time
>> > table(types);
>> types
>>      1      2      5
>>    621 909622 945826
>>
>> Here type "2" refers to a "SNP" and "5" a "CN unit".   For this chip
>> type you could also infer which the SNPs/CN probes by their unit names
>> ("SNP_", and "CN_"), but that assumes that the names follow that
>> convention which is *not* true for other chip types.  So, use
>> getUnitTypes():
>>
>> # SNPs
>> > snpUnits <- which(types == 2);
>> > str(snpUnits)
>>  int [1:909622] 622 623 624 625 626 627 628 ...
>>
>> # CN probes (aka non-polymorphic loci).
>> > npUnits <- which(types == 5);
>> > str(npUnits)
>>  int [1:945826] 910244 910245 910246 910247 ...
>>
>> Hope this helps
>>
>> /Henrik
>>
>> >
>> >
>> > Cheers,
>> > Jonathan
>> >
>> > --
>> > --
>> > When reporting problems on aroma.affymetrix, make sure 1) to run the
>> > latest
>> > version of the package, 2) to report the output of sessionInfo() and
>> > traceback(), and 3) to post a complete code example.
>> >
>> >
>> > You received this message because you are subscribed to the Google
>> > Groups
>> > "aroma.affymetrix" group with website http://www.aroma-project.org/.
>> > To post to this group, send email to aroma-af...@googlegroups.com
>> > To unsubscribe and other options, go to
>> > http://www.aroma-project.org/forum/
>> >
>> > ---
>> > You received this message because you are subscribed to the Google
>> > Groups
>> > "aroma.affymetrix" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> > an
>> > email to aroma-affymetr...@googlegroups.com.
>> > For more options, visit https://groups.google.com/groups/opt_out.
>> >
>> >

-- 
-- 
When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
version of the package, 2) to report the output of sessionInfo() and 
traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group with website http://www.aroma-project.org/.
To post to this group, send email to aroma-affymetrix@googlegroups.com
To unsubscribe and other options, go to http://www.aroma-project.org/forum/

--- 
You received this message because you are subscribed to the Google Groups 
"aroma.affymetrix" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to aroma-affymetrix+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Reply via email to