Hi.

On Fri, Dec 7, 2012 at 7:04 PM, zhouzaiwei <zhouzai...@163.com> wrote:
> Hi, everyone, I want to use FIRMAGene to analyze differential splicing on the
> hugene-1.0-st array, but there is no overlap between the Ensembl identifiers from
> getUnitNames() and the Affymetrix identifiers in 'hgnetaffx'. My code follows:
>
>> library(aroma.affymetrix)
>> library(FIRMAGene)
>
>> hgnetaffx <- read.csv("HuGene-1_0-st-v1.na25.hg18.transcript.csv",sep=",",skip=19,header=TRUE,comment.char="",stringsAsFactors=FALSE)

It's easier to access/view the content of that Affymetrix CSV file if you do:

> db <- AffymetrixNetAffxCsvFile("HuGene-1_0-st-v1.na25.hg18.transcript.csv");
> db
AffymetrixNetAffxCsvFile:
Name: HuGene-1_0-st-v1.na25.hg18.transcript.csv
Tags:
Full name: HuGene-1_0-st-v1.na25.hg18.transcript.csv
Pathname: HuGene-1_0-st-v1.na25.hg18.transcript.csv
File size: 88.32 MB (92614329 bytes)
RAM: 0.03 MB
Number of data rows: NA
Columns [18]: 'transcriptClusterId', 'probesetId', 'seqname', 'strand', 'start', 'stop', 'totalProbes', 'geneAssignment', 'mrnaAssignment', 'swissprot', 'unigene', 'goBiologicalProcess', 'goCellularComponent', 'goMolecularFunction', 'pathway', 'proteinDomains', 'crosshybType', 'category'
Number of text lines: NA
> data <- readDataFrame(db)
> str(data[1:2,])
'data.frame': 2 obs. of 18 variables:
 $ transcriptClusterId: int 7896738 7896740
 $ probesetId         : int 7896738 7896740
 $ seqname            : chr "chr1" "chr1"
 $ strand             : chr "+" "+"
 $ start              : int 52878 58954
 $ stop               : int 53750 59871
 $ totalProbes        : int 31 24
 $ geneAssignment     : chr NA "NM_001005240 // OR4F17 // olfactory receptor, family 4, subfamily F, member 17 // 19p13.3 // 81099 /// NM_001004195 // OR4F4 //"| __truncated__
 $ mrnaAssignment     : chr "ENST00000328113 // ENSEMBL Transcript // cdna:known chromosome:NCBI36:15:100284711:100285367:-1 gene:ENSG00000183909 // chr1 /"| __truncated__ "NM_001005240 // RefSeq // Homo sapiens olfactory receptor, family 4, subfamily F, member 17 (OR4F17), mRNA. // chr1 // 100 // 1"| __truncated__
 $ swissprot          : chr NA "AY972817 // Q52R94 /// AY972817 // Q52R93 /// AY972817 // Q52R92"
 $ unigene            : chr NA "NM_001005240 // Hs.572591 // --- /// NM_001004195 // Hs.554420 // --- /// NM_001005484 // Hs.554500 // --- /// AY972817 // Hs.5"| __truncated__
 $ goBiologicalProcess: chr NA "NM_001005240 // GO:0007165 // signal transduction // inferred from electronic annotation /// NM_001005240 // GO:0007186 // G-p"| __truncated__
 $ goCellularComponent: chr NA "NM_001005240 // GO:0005886 // plasma membrane // inferred from electronic annotation /// NM_001005240 // GO:0016021 // integra"| __truncated__
 $ goMolecularFunction: chr NA "NM_001005240 // GO:0004872 // receptor activity // inferred from electronic annotation /// NM_001005240 // GO:0004984 // olfac"| __truncated__
 $ pathway            : logi NA NA
 $ proteinDomains     : logi NA NA
 $ crosshybType       : int 3 3
 $ category           : chr "main" "main"

>
> cdf <- AffymetrixCdfFile$byChipType('HuGene-1_0-st-v1', tags="Ensembl,exon")
>> cdf
> AffymetrixCdfFile:
> Path: annotationData/chipTypes/HuGene-1_0-st-v1
> Filename: HuGene-1_0-st-v1,Ensembl,exon.cdf
> File size: 28.51 MB (29891482 bytes)
> Chip type: HuGene-1_0-st-v1,Ensembl,exon
> RAM: 0.00MB
> File format: v4 (binary; XDA)
> Dimension: 1050x1050
> Number of cells: 1102500
> Number of units: 27901
> Cells per unit: 39.51
> Number of QC units: 0

> unitNames <- getUnitNames(cdf);
> str(unitNames)
 chr [1:27901] "ENSG00000196735" "ENSG00000179344" ...

There you see that the unit names in the CDF seem to match what's embedded in the 'data$mrnaAssignment' column.

>
>> u <- which(getUnitNames(cdf) %in% hgnetaffx$probeset_id[hgnetaffx$category
>> == "main" & hgnetaffx$total_probes > 7 & hgnetaffx$total_probes < 200])

In other words, the 'hgnetaffx$probeset_id' column is not the correct column, because it just contains integers/indices.
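
To make the mismatch concrete, here is a minimal base-R sketch that needs neither the CDF nor the CSV file. The two unit names are the ones shown by str(unitNames) above and the two integers are the probesetId values shown by str(data[1:2,]); the vectors themselves are toy stand-ins, not the real annotation data.

```r
# Toy stand-ins: two unit names from the CDF and two probeset IDs
# from the NetAffx CSV, as printed earlier in this message.
unitNames <- c("ENSG00000196735", "ENSG00000179344")
probesetId <- c(7896738L, 7896740L)

# An Ensembl gene ID never equals an integer probeset ID,
# so %in% is FALSE everywhere and which() returns an empty vector.
u <- which(unitNames %in% probesetId)
print(length(u))

# Indexing an empty vector beyond its length yields NA, which is
# consistent with u[1:200] printing as a run of NAs.
print(u[1:5])
```

So the NAs in u[1:200] most likely just mean that 'u' is empty, not that the lookup returned missing values.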

You may be able to pull out the ENSG* components from 'data' as:

> geneNames <- gsub(".*(ENSG[0-9]*).*", "\\1", data$mrnaAssignment);

But that does not appear to be the full story, because then you only get 18,715 hits:

> summary(unitNames %in% geneNames)
   Mode   FALSE    TRUE    NA's
logical    9186   18715       0

At least that should get you going on how you can map CDF unit names to whatever annotation data you have. Working with raw/original annotation data files is sometimes an "art"; you really need to look into the files, figure out how to parse them, etc.

/Henrik

>
>> u[1:200]
> [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> [42] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> [83] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> [124] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> [165] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
>
>> sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: i386-w64-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=Chinese_People's Republic of China.936 LC_CTYPE=Chinese_People's Republic of China.936
> [3] LC_MONETARY=Chinese_People's Republic of China.936 LC_NUMERIC=C
> [5] LC_TIME=Chinese_People's Republic of China.936
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] aroma.affymetrix_2.7.0 affxparser_1.28.1 aroma.apd_0.2.3 R.huge_0.4.1 aroma.light_1.28.0
> [6] aroma.core_2.7.0 matrixStats_0.6.2 R.rsp_0.8.2 R.devices_2.1.3 R.cache_0.6.5
> [11] R.filesets_1.6.0 digest_0.6.0 R.utils_1.18.0 R.oo_1.10.2 FIRMAGene_0.9.7
> [16] R.methodsS3_1.4.2
>
> loaded via a namespace (and not attached):
> [1] PSCBS_0.30.0
>
> How can I fix it? Did I download the wrong version of the annotation? I downloaded
> HuGene-1_0-st-v1.na25.hg18.transcript.csv from www.affymetix.com, and
> HuGene-1_0-st-v1.probe.tab from
> http://media.affymetrix.com/analysis/downloads/na23/wtgene/HuGene-1_0-st-v1.probe.tab.zip
> I cannot find a probe.tab for na25.
>
> --
> View this message in context:
> http://aroma-affymetrix.967894.n3.nabble.com/error-on-FIRMAGene-tp4024980.html
> Sent from the aroma.affymetrix mailing list archive at Nabble.com.
>
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest
> version of the package, 2) to report the output of sessionInfo() and
> traceback(), and 3) to post a complete code example.
>
> You received this message because you are subscribed to the Google Groups
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-affymetrix@googlegroups.com
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/
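
A footnote to the gsub() call in the reply above: that call keeps at most one ENSG identifier per row, whereas a single 'mrnaAssignment' entry can list several assignments separated by "///". The base-R sketch below collects every ENSG identifier per row instead, using gregexpr() and regmatches(). The annotation strings and identifiers are made up for illustration, and whether this accounts for any of the 9,186 unmatched unit names is an assumption that would have to be checked against the real file.

```r
# Hypothetical annotation strings (not taken from the real CSV);
# the first entry lists two Ensembl gene IDs, the second none.
mrnaAssignment <- c(
  "ENST00000000001 // ENSEMBL Transcript // gene:ENSG00000000001 /// ENST00000000002 // ENSEMBL Transcript // gene:ENSG00000000002",
  "NM_001005240 // RefSeq // no Ensembl gene listed",
  NA
)

# Drop missing entries, then extract all ENSG IDs from each row.
ok <- !is.na(mrnaAssignment)
hits <- regmatches(mrnaAssignment[ok],
                   gregexpr("ENSG[0-9]+", mrnaAssignment[ok]))

# Flatten to one vector of unique gene IDs.
geneNames <- unique(unlist(hits))
print(geneNames)

# Hypothetical unit names; the third has no annotation entry.
unitNames <- c("ENSG00000000001", "ENSG00000000002", "ENSG00000000003")
print(summary(unitNames %in% geneNames))
```

Keeping 'hits' as a list also preserves which row each identifier came from, which helps if you later need the matching row's 'totalProbes' or 'category' values.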