Hi, I assume you are referring to the following UGP annotation file:
> ugp
AromaUgpFile:
Name: GenomeWideSNP_6
Tags: Full,na30,hg18,HB20100215
Full name: GenomeWideSNP_6,Full,na30,hg18,HB20100215
Pathname: ../../../Documents/My Data/annotationData/chipTypes/GenomeWideSNP_6/GenomeWideSNP_6,Full,na30,hg18,HB20100215.ugp
File size: 8.97 MB (9407867 bytes)
RAM: 0.00 MB
Number of data rows: 1881415
File format: v1
Dimensions: 1881415x2
Column classes: integer, integer
Number of bytes per column: 1, 4
Footer: <createdOn>20100215 19:24:13 CET</createdOn><platform>Affymetrix</platform><chipType>GenomeWideSNP_6,Full</chipType><createdBy><fullname>Henrik Bengtsson</fullname><email>h...@aroma-project.org</email></createdBy><srcFiles><srcFile1><filename>GenomeWideSNP_6,Full.cdf</filename><filesize>493291745</filesize><checksum>3fbe0f6e7c8a346105238a3f3d10d4ec</checksum></srcFile1><srcFile2><filename>GenomeWideSNP_6.na30.annot.csv</filename><filesize>1418300755</filesize><checksum>892892065e8b27f83874bafa58f64403</checksum></srcFile2><srcFile3><filename>GenomeWideSNP_6.cn.na30.annot.csv</filename><filesize>814504485</filesize><checksum>176369a81250f46aed90a3e9a5c968d5</checksum></srcFile3></srcFiles>
Chip type: GenomeWideSNP_6,Full
Platform: Affymetrix

As you can see from the file footer (which is easier to read if we do the following):

> str(readFooter(ugp))
List of 5
 $ createdOn: chr "20100215 19:24:13 CET"
 $ platform : chr "Affymetrix"
 $ chipType : chr "GenomeWideSNP_6,Full"
 $ createdBy:List of 2
  ..$ fullname: chr "Henrik Bengtsson"
  ..$ email   : chr "h...@aroma-project.org"
 $ srcFiles :List of 3
  ..$ srcFile1:List of 3
  .. ..$ filename: chr "GenomeWideSNP_6,Full.cdf"
  .. ..$ filesize: chr "493291745"
  .. ..$ checksum: chr "3fbe0f6e7c8a346105238a3f3d10d4ec"
  ..$ srcFile2:List of 3
  .. ..$ filename: chr "GenomeWideSNP_6.na30.annot.csv"
  .. ..$ filesize: chr "1418300755"
  .. ..$ checksum: chr "892892065e8b27f83874bafa58f64403"
  ..$ srcFile3:List of 3
  .. ..$ filename: chr "GenomeWideSNP_6.cn.na30.annot.csv"
  .. ..$ filesize: chr "814504485"
  .. ..$ checksum: chr "176369a81250f46aed90a3e9a5c968d5"

This UGP file contains data imported from the two Affymetrix NetAffx (release 30, 'na30') annotation files 'GenomeWideSNP_6.na30.annot.csv' and 'GenomeWideSNP_6.cn.na30.annot.csv'. I am fairly sure that there is no bug in the import code that causes duplicated genomic locations, so I think you need to look for the answer at Affymetrix. It is quite possible that Affymetrix designed the chip type so that all probes had unique locations, but that, as the human genome annotation was updated, some of the probes ended up at the same location. That could be one reason. You can sign up at Affymetrix and then query their NetAffx database online to narrow this down:

http://www.affymetrix.com/analysis/index.affx

If you'd like to pull out the probe sequences, you can access them via the ACS file. Say you've located two units with the same genomic location:

> unitNames <- c("SNP_A-4229333", "CN_655406")
> units <- indexOf(cdf, names=unitNames)
> cells <- getCellIndices(cdf, units=units)
> str(cells)
List of 2
 $ SNP_A-4229333:List of 1
  ..$ groups:List of 2
  .. ..$ C:List of 1
  .. .. ..$ indices: int [1:3] 5815522 244324 283672
  .. ..$ T:List of 1
  .. .. ..$ indices: int [1:3] 5815521 244323 283671
 $ CN_655406    :List of 1
  ..$ groups:List of 1
  .. ..$ CN_655406:List of 1
  .. .. ..$ indices: int 5959075
> cellsA <- unlist(cells[[1]], use.names=FALSE)
> cellsB <- unlist(cells[[2]], use.names=FALSE)
> cellsA
[1] 5815522  244324  283672 5815521  244323  283671
> cellsB
[1] 5959075

# Look only at one pair of probes
> cells <- c(cellsA[1], cellsB[1])
> cells
[1] 5815522 5959075

# Read target strands and probe sequences
> acs <- AromaCellSequenceFile$byChipType("GenomeWideSNP_6")
> strands <- readTargetStrands(acs, cells=cells)
> strands
[1] "+" "-"            # <= Different strands
attr(,"map")
<NA>    +    -
  00   01   02
> seqs <- readSequences(acs, cells=cells)
> cat(seqs, sep="\n")
TATCGAGGTTTGTAGCTTCCTTGCA
TGCAAGGAAGCTACAAACCTCGATA

# Reverse the negative-strand sequence (poor man's version)
> seqs[2] <- paste(rev(strsplit(seqs[2], split="")[[1]]), collapse="")
> cat(seqs, sep="\n")
TATCGAGGTTTGTAGCTTCCTTGCA
ATAGCTCCAAACATCGAAGGAACGT

So, as you can see, the two probes are perfectly complementary: the probe from one unit matches one strand, and the probe from the other unit matches the same piece of DNA on the opposite strand. In other words, the second sequence is the reverse complement of the first.

I'll let you take it from there.

Hope this helps

/Henrik

On Fri, Sep 24, 2010 at 12:28 PM, Matt <matt.kowg...@gmail.com> wrote:
> Dear Henrik,
>
> In analyzing some copy number data from chromosome 14, I noticed that
> there are 57,103 loci for the Affy 6.0 array; however, only 56,980 of
> these loci have unique positions.
> Here is the proof from R:
>
>> cdf <- AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags="Full")
>> gi <- getGenomeInformation(cdf)
>> units <- getUnitsOnChromosome(gi, chromosome=14)
>> pos <- getPositions(gi, units=units)
>> length(pos)
> [1] 57103
>> length(unique(pos))
> [1] 56980
>> sessionInfo()
> R version 2.11.1 (2010-05-31)
> x86_64-unknown-linux-gnu
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>  [1] aroma.affymetrix_1.7.0 aroma.apd_0.1.7        affxparser_1.20.0
>  [4] R.huge_0.2.0           aroma.core_1.7.0       aroma.light_1.16.1
>  [7] matrixStats_0.2.1      R.rsp_0.3.6            R.cache_0.3.0
> [10] R.filesets_0.8.3       digest_0.4.2           R.utils_1.5.0
> [13] R.oo_1.7.3             R.methodsS3_1.2.0
>
> Shouldn't all the loci have different positions (in bp)? Note that the
> duplicate positions seem to correspond to different raw CN
> estimates. This is a problem for me because my hidden Markov model
> doesn't work when I have multiple observations at the same position.
>
> Thanks for your assistance.
>
> Matt
>
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest
> version of the package, 2) to report the output of sessionInfo() and
> traceback(), and 3) to post a complete code example.
>
> You received this message because you are subscribed to the Google Groups
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-affymetrix@googlegroups.com
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/
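Henrik's "poor man's version" only reverses the second sequence, which leaves the complementarity to be checked by eye. A slightly more explicit check, sketched here in base R, computes the reverse complement directly and compares it to the other probe. `revComp()` is a hypothetical helper written for this example, not an aroma.affymetrix function. It assumes plain upper-case A/C/G/T sequences like the two 25-mers read from the ACS file in the thread.

```r
# Sketch: reverse complement of a DNA probe sequence, base R only.
# revComp() is a hypothetical helper, not part of aroma.affymetrix.
revComp <- function(x) {
  # Split the string into single bases and reverse their order
  bases <- rev(strsplit(x, split = "")[[1]])
  # Complement each base (A<->T, C<->G) and join back into one string
  chartr("ACGT", "TGCA", paste(bases, collapse = ""))
}

# The two probe sequences returned by readSequences() in the thread
seqs <- c("TATCGAGGTTTGTAGCTTCCTTGCA", "TGCAAGGAAGCTACAAACCTCGATA")

# TRUE if the second probe is the exact reverse complement of the first
isRevComp <- identical(revComp(seqs[1]), seqs[2])
print(isRevComp)
```

For the pair in the thread, this comparison is TRUE, which agrees with the visual check: the two units' probes target the same stretch of DNA on opposite strands.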
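Before looking up probe sequences as Henrik does, you first need to know which units share a position. For that, base R's `duplicated()` and `split()` may be enough. This is a sketch under stated assumptions: `units` and `pos` below are made-up toy values standing in for the vectors returned by `getUnitsOnChromosome()` and `getPositions()` in Matt's example. With real data, you would drop the two toy assignments. Note that `length(pos) - length(unique(pos))` (57103 - 56980 = 123 in the thread) counts surplus copies. That number is smaller than the number of units involved, because each shared position contributes one unit that is not a surplus copy.

```r
# Sketch: group unit indices that share the same genomic position.
# 'units' and 'pos' are hypothetical toy values standing in for the output of
# getUnitsOnChromosome() and getPositions(); they are not real GenomeWideSNP_6 data.
units <- c(101L, 102L, 103L, 104L, 105L, 106L)
pos   <- c(20000100L, 20000150L, 20000150L, 20000300L, 20000100L, 20000400L)

# Flag every unit whose position occurs more than once (all copies, not only later ones)
isDup <- pos %in% pos[duplicated(pos)]

# One list element per shared position, holding the units located there
dupGroups <- split(units[isDup], pos[isDup])
print(dupGroups)

# Number of units sitting at a non-unique position
cat("Units at shared positions:", sum(isDup), "\n")
```

Each element of `dupGroups` is a candidate pair (or larger group) that can then be inspected the way Henrik inspects "SNP_A-4229333" and "CN_655406".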