Re: [aroma.affymetrix] error on FIRMAGene

Henrik Bengtsson Fri, 07 Dec 2012 20:20:46 -0800

Hi.

On Fri, Dec 7, 2012 at 7:04 PM, zhouzaiwei <zhouzai...@163.com> wrote:
> Hi, everyone, I want to use FIRMAGene to analyze differential splicing on the
> hugene-1.0-st array, but there is no overlap between the Ensembl identifiers from
> getUnitNames() and the Affymetrix identifiers in 'hgnetaffx'. My code follows:
>>library(aroma.affymetrix)
>>library(FIRMAGene)
>
>>hgnetaffx <-
> read.csv("HuGene-1_0-st-v1.na25.hg18.transcript.csv",sep=",",skip=19,header=TRUE,comment.char="",stringsAsFactors=FALSE)


It's easier to access/view the content of that Affymetrix CSV file if you do:

> db <- AffymetrixNetAffxCsvFile("HuGene-1_0-st-v1.na25.hg18.transcript.csv");
> db
AffymetrixNetAffxCsvFile:
Name: HuGene-1_0-st-v1.na25.hg18.transcript.csv
Tags:
Full name: HuGene-1_0-st-v1.na25.hg18.transcript.csv
Pathname: HuGene-1_0-st-v1.na25.hg18.transcript.csv
File size: 88.32 MB (92614329 bytes)
RAM: 0.03 MB
Number of data rows: NA
Columns [18]: 'transcriptClusterId', 'probesetId', 'seqname',
'strand', 'start', 'stop', 'totalProbes', 'geneAssignment',
'mrnaAssignment', 'swissprot', 'unigene', 'goBiologicalProcess',
'goCellularComponent', 'goMolecularFunction', 'pathway',
'proteinDomains', 'crosshybType', 'category'
Number of text lines: NA

> data <- readDataFrame(db)

> str(data[1:2,])
'data.frame':   2 obs. of  18 variables:
 $ transcriptClusterId: int  7896738 7896740
 $ probesetId         : int  7896738 7896740
 $ seqname            : chr  "chr1" "chr1"
 $ strand             : chr  "+" "+"
 $ start              : int  52878 58954
 $ stop               : int  53750 59871
 $ totalProbes        : int  31 24
 $ geneAssignment     : chr  NA "NM_001005240 // OR4F17 // olfactory
receptor, family 4, subfamily F, member 17 // 19p13.3 // 81099 ///
NM_001004195 // OR4F4 //"| __truncated__
 $ mrnaAssignment     : chr  "ENST00000328113 // ENSEMBL Transcript //
 cdna:known chromosome:NCBI36:15:100284711:100285367:-1
gene:ENSG00000183909 // chr1 /"| __truncated__ "NM_001005240 // RefSeq
// Homo sapiens olfactory receptor, family 4, subfamily F, member 17
(OR4F17), mRNA. // chr1 // 100 // 1"| __truncated__
 $ swissprot          : chr  NA "AY972817 // Q52R94 /// AY972817 //
Q52R93 /// AY972817 // Q52R92"
 $ unigene            : chr  NA "NM_001005240 // Hs.572591 // --- ///
NM_001004195 // Hs.554420 // --- /// NM_001005484 // Hs.554500 // ---
/// AY972817 // Hs.5"| __truncated__
 $ goBiologicalProcess: chr  NA "NM_001005240 // GO:0007165 // signal
transduction // inferred from electronic annotation  /// NM_001005240
// GO:0007186 // G-p"| __truncated__
 $ goCellularComponent: chr  NA "NM_001005240 // GO:0005886 // plasma
membrane // inferred from electronic annotation  /// NM_001005240 //
GO:0016021 // integra"| __truncated__
 $ goMolecularFunction: chr  NA "NM_001005240 // GO:0004872 //
receptor activity // inferred from electronic annotation  ///
NM_001005240 // GO:0004984 // olfac"| __truncated__
 $ pathway            : logi  NA NA
 $ proteinDomains     : logi  NA NA
 $ crosshybType       : int  3 3
 $ category           : chr  "main" "main"

>
> cdf <- AffymetrixCdfFile$byChipType('HuGene-1_0-st-v1', tags="Ensembl,exon")
>> cdf
> AffymetrixCdfFile:
> Path: annotationData/chipTypes/HuGene-1_0-st-v1
> Filename: HuGene-1_0-st-v1,Ensembl,exon.cdf
> File size: 28.51 MB (29891482 bytes)
> Chip type: HuGene-1_0-st-v1,Ensembl,exon
> RAM: 0.00MB
> File format: v4 (binary; XDA)
> Dimension: 1050x1050
> Number of cells: 1102500
> Number of units: 27901
> Cells per unit: 39.51
> Number of QC units: 0

> unitNames <- getUnitNames(cdf);
> str(unitNames)
 chr [1:27901] "ENSG00000196735" "ENSG00000179344" ...

There you see that the unit names in the CDF seem to match what's
embedded in the 'data$mrnaAssignment' column.

>
>> u <- which(getUnitNames(cdf) %in% hgnetaffx$probeset_id[hgnetaffx$category
>> == "main" & hgnetaffx$total_probes > 7 & hgnetaffx$total_probes < 200])

In other words, the 'hgnetaffx$probeset_id' column is not the correct
column, because it just contains integers/indices, whereas the CDF
unit names are Ensembl gene identifiers (ENSG*).
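
A side note on the all-NA output of 'u[1:200]' in the quoted post: that
is what R prints when 'u' is empty, which is what you get when nothing
matches. A minimal sketch, using only the two unit names and the two
probeset ids shown above (the variable names are just for illustration):

```r
# Two CDF unit names and two integer probeset ids, as shown in the output above.
unitNames  <- c("ENSG00000196735", "ENSG00000179344")
probesetId <- c(7896738L, 7896740L)

# %in% compares the values as strings; no ENSG* name equals an integer id,
# so every element is FALSE and which() returns an empty vector.
u <- which(unitNames %in% probesetId)
length(u)   # 0

# Indexing an empty vector beyond its length yields NA, not an error.
u[1:5]      # NA NA NA NA NA
```

So the NAs are not missing values in the annotation; they only signal
that there were no hits at all.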

You may be able to pull out the ENSG* components from 'data' as:

> geneNames <- gsub(".*(ENSG[0-9]*).*", "\\1", data$mrnaAssignment);

But that does not appear to be the full story, because you then only
get 18,715 hits out of the 27,901 units:

> summary(unitNames %in% geneNames)
   Mode   FALSE    TRUE    NA's
logical    9186   18715       0
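
One thing that may contribute (I have not verified this on the real
file) is that the gsub() call keeps at most one ENSG* identifier per
row, and returns the input unchanged for rows without any. Below is a
sketch that collects every ENSG* identifier per row instead. The input
strings are made-up stand-ins for 'data$mrnaAssignment', and
extractEnsemblGeneIds() is a hypothetical helper, not part of
aroma.affymetrix:

```r
# Toy stand-in for data$mrnaAssignment; these strings are illustrative only.
mrnaAssignment <- c(
  "ENST00000328113 // ENSEMBL Transcript // gene:ENSG00000183909 // chr1",
  "NM_001005240 // RefSeq // no Ensembl gene id in this row // chr1",
  "gene:ENSG00000196735 // chr1 /// gene:ENSG00000179344 // chr1",
  NA
)

# Hypothetical helper: return all distinct ENSG* ids found in each element.
extractEnsemblGeneIds <- function(x) {
  x[is.na(x)] <- ""                                  # treat NA as "no ids"
  hits <- regmatches(x, gregexpr("ENSG[0-9]+", x))   # list, one entry per row
  lapply(hits, unique)
}

geneIdsPerRow <- extractEnsemblGeneIds(mrnaAssignment)
geneNames <- unique(unlist(geneIdsPerRow))

# Toy stand-in for getUnitNames(cdf); the last id is made up.
unitNames <- c("ENSG00000196735", "ENSG00000179344", "ENSG00000000003")
summary(unitNames %in% geneNames)
```

With the real data you would pass 'data$mrnaAssignment' and the output
of getUnitNames(cdf), and then compare the number of hits with the
18,715 above.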

That should at least get you started on mapping CDF unit names to
whatever annotation data you have.  Working with raw/original
annotation data files is sometimes an "art"; you really need to look
into the files, figure out how to parse them, etc.
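
As an illustration of that kind of parsing, here is a sketch that
splits one 'geneAssignment' value into records (separated by " /// ")
and fields (separated by " // "). The first record is taken from the
str() output above; the second record and the column labels are my
assumptions, made up for the example:

```r
# One geneAssignment value: first record from the output above,
# second record entirely made up for illustration.
geneAssignment <- paste(
  "NM_001005240 // OR4F17 // olfactory receptor, family 4, subfamily F, member 17 // 19p13.3 // 81099",
  "NM_000000 // GENE2 // made-up gene // 1p1 // 0",
  sep = " /// "
)

# Split into records, then split each record into its fields.
records <- strsplit(geneAssignment, " /// ", fixed = TRUE)[[1]]
fields  <- strsplit(records, " // ", fixed = TRUE)

# Stack into a character matrix; the column labels are assumed, not official.
parsed <- do.call(rbind, fields)
colnames(parsed) <- c("accession", "symbol", "description", "cytoband", "entrezId")

parsed[, "symbol"]
```

This assumes every record has the same number of fields; on the real
file you would want to check that before calling rbind().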

/Henrik

>
>> u[1:200]
>   [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
>  [42] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
>  [83] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> [124] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> [165] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> NA NA NA NA NA NA NA NA NA NA NA NA NA
>
>> sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: i386-w64-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=Chinese_People's Republic of China.936
> LC_CTYPE=Chinese_People's Republic of China.936
> [3] LC_MONETARY=Chinese_People's Republic of China.936 LC_NUMERIC=C
> [5] LC_TIME=Chinese_People's Republic of China.936
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>  [1] aroma.affymetrix_2.7.0 affxparser_1.28.1      aroma.apd_0.2.3
> R.huge_0.4.1           aroma.light_1.28.0
>  [6] aroma.core_2.7.0       matrixStats_0.6.2      R.rsp_0.8.2
> R.devices_2.1.3        R.cache_0.6.5
> [11] R.filesets_1.6.0       digest_0.6.0           R.utils_1.18.0
> R.oo_1.10.2            FIRMAGene_0.9.7
> [16] R.methodsS3_1.4.2
>
> loaded via a namespace (and not attached):
> [1] PSCBS_0.30.0
>
> How can I fix it? Did I download the wrong version of the annotation? I downloaded
> HuGene-1_0-st-v1.na25.hg18.transcript.csv from www.affymetix.com and
> HuGene-1_0-st-v1.probe.tab from
> http://media.affymetrix.com/analysis/downloads/na23/wtgene/HuGene-1_0-st-v1.probe.tab.zip
> I cannot find a probe.tab for na25.
>
>
>
> --
> View this message in context: 
> http://aroma-affymetrix.967894.n3.nabble.com/error-on-FIRMAGene-tp4024980.html
> Sent from the aroma.affymetrix mailing list archive at Nabble.com.
>
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest 
> version of the package, 2) to report the output of sessionInfo() and 
> traceback(), and 3) to post a complete code example.
>
>
> You received this message because you are subscribed to the Google Groups 
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to aroma-affymetrix@googlegroups.com
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/

