Re: [aroma.affymetrix] Why are there duplicate positions for chromosome 14 on the Affy 6.0 array?

Henrik Bengtsson Fri, 24 Sep 2010 13:13:18 -0700

Hi,

I assume you are referring to the following UGP annotation file:


> ugp
AromaUgpFile:
Name: GenomeWideSNP_6
Tags: Full,na30,hg18,HB20100215
Full name: GenomeWideSNP_6,Full,na30,hg18,HB20100215
Pathname: ../../../Documents/My Data/annotationData/chipTypes/GenomeWideSNP_6/GenomeWideSNP_6,Full,na30,hg18,HB20100215.ugp
File size: 8.97 MB (9407867 bytes)
RAM: 0.00 MB
Number of data rows: 1881415
File format: v1
Dimensions: 1881415x2
Column classes: integer, integer
Number of bytes per column: 1, 4
Footer: <createdOn>20100215 19:24:13 CET</createdOn><platform>Affymetrix</platform><chipType>GenomeWideSNP_6,Full</chipType><createdBy><fullname>Henrik Bengtsson</fullname><email>h...@aroma-project.org</email></createdBy><srcFiles><srcFile1><filename>GenomeWideSNP_6,Full.cdf</filename><filesize>493291745</filesize><checksum>3fbe0f6e7c8a346105238a3f3d10d4ec</checksum></srcFile1><srcFile2><filename>GenomeWideSNP_6.na30.annot.csv</filename><filesize>1418300755</filesize><checksum>892892065e8b27f83874bafa58f64403</checksum></srcFile2><srcFile3><filename>GenomeWideSNP_6.cn.na30.annot.csv</filename><filesize>814504485</filesize><checksum>176369a81250f46aed90a3e9a5c968d5</checksum></srcFile3></srcFiles>
Chip type: GenomeWideSNP_6,Full
Platform: Affymetrix

The file footer shows where the data came from.  It is easier to read
if we do:

> str(readFooter(ugp));
List of 5
 $ createdOn: chr "20100215 19:24:13 CET"
 $ platform : chr "Affymetrix"
 $ chipType : chr "GenomeWideSNP_6,Full"
 $ createdBy:List of 2
  ..$ fullname: chr "Henrik Bengtsson"
  ..$ email   : chr "h...@aroma-project.org"
 $ srcFiles :List of 3
  ..$ srcFile1:List of 3
  .. ..$ filename: chr "GenomeWideSNP_6,Full.cdf"
  .. ..$ filesize: chr "493291745"
  .. ..$ checksum: chr "3fbe0f6e7c8a346105238a3f3d10d4ec"
  ..$ srcFile2:List of 3
  .. ..$ filename: chr "GenomeWideSNP_6.na30.annot.csv"
  .. ..$ filesize: chr "1418300755"
  .. ..$ checksum: chr "892892065e8b27f83874bafa58f64403"
  ..$ srcFile3:List of 3
  .. ..$ filename: chr "GenomeWideSNP_6.cn.na30.annot.csv"
  .. ..$ filesize: chr "814504485"
  .. ..$ checksum: chr "176369a81250f46aed90a3e9a5c968d5"

This UGP file contains data imported from the two Affymetrix NetAffx
('na' version '30') files 'GenomeWideSNP_6.na30.annot.csv' and
'GenomeWideSNP_6.cn.na30.annot.csv'.

I am fairly sure that there is no bug in the import code that causes
duplicated genomic locations, so I think you need to seek your answer
over at Affymetrix.  It is quite possible that Affymetrix designed the
chip type so that all probes had unique locations, but that some
probes ended up at the same location when the human genome annotation
was updated.  That could be one reason.  You can sign up at Affymetrix
and then query their NetAffx database online to narrow this down:

  http://www.affymetrix.com/analysis/index.affx
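
Before querying NetAffx, it may help to list exactly which units share
a position.  Here is a minimal base-R sketch; the 'pos' and 'unitNames'
vectors below are made-up stand-ins (hypothetical values, not real
annotation data) for the positions and unit names you would pull from
the annotation:

```r
# Toy stand-ins (hypothetical values, not real annotation data)
pos <- c(100L, 250L, 250L, 400L, 730L, 730L, 730L)
unitNames <- c("u1", "u2", "u3", "u4", "u5", "u6", "u7")

# Flag every unit whose position occurs more than once
# (scan from both ends so the first occurrence is flagged too)
isDup <- duplicated(pos) | duplicated(pos, fromLast=TRUE)

# Group the names of the affected units by their shared position
dupGroups <- split(unitNames[isDup], pos[isDup])
str(dupGroups)
```

Each element of 'dupGroups' then holds the units that sit on one shared
position, which gives you the unit names to look up.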

If you'd like to pull out the probe sequences, you can access them via
the ACS file.  Say you've located two units with the same genomic
location:

> unitNames <- c("SNP_A-4229333", "CN_655406");
> units <- indexOf(cdf, names=unitNames);
> cells <- getCellIndices(cdf, units=units);
> str(cells);
List of 2
 $ SNP_A-4229333:List of 1
  ..$ groups:List of 2
  .. ..$ C:List of 1
  .. .. ..$ indices: int [1:3] 5815522 244324 283672
  .. ..$ T:List of 1
  .. .. ..$ indices: int [1:3] 5815521 244323 283671
 $ CN_655406    :List of 1
  ..$ groups:List of 1
  .. ..$ CN_655406:List of 1
  .. .. ..$ indices: int 5959075

> cellsA <- unlist(cells[[1]], use.names=FALSE);
> cellsB <- unlist(cells[[2]], use.names=FALSE);
> cellsA
[1] 5815522  244324  283672 5815521  244323  283671
> cellsB
[1] 5959075

# Look only at one pair of probes
> cells <- c(cellsA[1], cellsB[1]);
> cells
[1] 5815522 5959075

# Read probe sequences and strandedness
> acs <- AromaCellSequenceFile$byChipType("GenomeWideSNP_6");
> strands <- readTargetStrands(acs, cells=cells);
> strands
[1] "+" "-"          # <= Different strands
attr(,"map")
<NA>    +    -
  00   01   02

> seqs <- readSequences(acs, cells=cells);
> cat(seqs, sep="\n")
TATCGAGGTTTGTAGCTTCCTTGCA
TGCAAGGAAGCTACAAACCTCGATA

# Reverse the negative-strand sequence (poor man's version)
> seqs[2] <- paste(rev(strsplit(seqs[2], split="")[[1]]), collapse="")

> cat(seqs, sep="\n")
TATCGAGGTTTGTAGCTTCCTTGCA
ATAGCTCCAAACATCGAAGGAACGT

So, as you can see, the two sequences are perfectly complementary: the
probe from one unit matches one strand, and the probe from the other
unit matches the same piece of DNA on the reverse strand.
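
The same check can be done without comparing the strings by eye.  This
is a small base-R sketch; 'revComp' is a helper name I made up for
illustration (it is not part of aroma.affymetrix), and it assumes the
sequences contain only the letters A, C, G and T:

```r
# Reverse-complement a single DNA string (base R only)
revComp <- function(s) {
  # Reverse the order of the bases ...
  reversed <- paste(rev(strsplit(s, split="")[[1]]), collapse="")
  # ... and swap each base for its complement
  chartr("ACGT", "TGCA", reversed)
}

# The two probe sequences read from the ACS file
seqA <- "TATCGAGGTTTGTAGCTTCCTTGCA"
seqB <- "TGCAAGGAAGCTACAAACCTCGATA"

# TRUE if the two probes target the same DNA on opposite strands
identical(revComp(seqB), seqA)
```

If this returns TRUE for a pair of co-located units, the two probes
interrogate the same stretch of DNA, one on each strand.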

I'll let you take it from there.
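
If your HMM needs exactly one observation per position, one possible
workaround is to summarize the co-located units before fitting.  This
is only a sketch with made-up numbers: 'pos' and 'theta' are
hypothetical stand-ins for your positions and raw CN estimates, and
whether the median is a suitable summary for your model is an
assumption you would have to check:

```r
# Toy stand-ins (hypothetical values, not real data)
pos   <- c(100L, 250L, 250L, 400L)
theta <- c(2.1, 1.8, 2.2, 3.0)   # raw CN estimates, one per unit

# One summary value per unique position
# (median across the units that share that position)
thetaU <- tapply(theta, INDEX=pos, FUN=median)

# The unique positions, in the same (increasing) order as thetaU
posU <- as.integer(names(thetaU))
```

After this, 'posU' has no repeated values and 'thetaU' has one
estimate per position.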

Hope this helps

/Henrik






On Fri, Sep 24, 2010 at 12:28 PM, Matt <matt.kowg...@gmail.com> wrote:
> Dear Henrik,
>
> In analyzing some copy number data from chromosome 14, I noticed that
> there are 57,103 loci for the Affy 6.0 array, however, only 56,980 of
> these loci have unique positions. Here is the proof from R:
>
>> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full")
>> gi <- getGenomeInformation(cdf)
>> units <- getUnitsOnChromosome(gi, chromosome=14)
>> pos <- getPositions(gi, units=units)
>> length(pos)
> [1] 57103
>> length(unique(pos))
> [1] 56980
>> sessionInfo()
> R version 2.11.1 (2010-05-31)
> x86_64-unknown-linux-gnu
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>  [1] aroma.affymetrix_1.7.0 aroma.apd_0.1.7        affxparser_1.20.0
>  [4] R.huge_0.2.0           aroma.core_1.7.0       aroma.light_1.16.1
>  [7] matrixStats_0.2.1      R.rsp_0.3.6            R.cache_0.3.0
> [10] R.filesets_0.8.3       digest_0.4.2           R.utils_1.5.0
> [13] R.oo_1.7.3             R.methodsS3_1.2.0
>
>
> Shouldn't all the loci have different positions in bps? Note, the
> duplicate positions seem to correspond to different raw CN
> estimates. This is a problem for me because my hidden Markov model
> doesn't work when I have multiple observations with the same position.
>
> Thanks for your assistance.
>
> Matt
>
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
> version of the package, 2) to report the output of sessionInfo() and 
> traceback(), and 3) to post a complete code example.
>
>
> You received this message because you are subscribed to the Google Groups 
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-affymetrix@googlegroups.com
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/
>
