Hello, I am using your R framework with a set of Affymetrix SNP 6 data and I have a problem with the extractDataFrame function. The result is an incomplete matrix with row duplication.
> sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-apple-darwin9.8.0

locale:
[1] fr_FR.UTF-8/fr_FR.UTF-8/C/C/fr_FR.UTF-8/fr_FR.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] aroma.cn_0.5.0 aroma.affymetrix_1.6.0 aroma.apd_0.1.7 affxparser_1.20.0 R.huge_0.2.0
 [6] aroma.core_1.6.0 matrixStats_0.2.1 R.rsp_0.3.6 R.cache_0.3.0 R.filesets_0.8.2
[11] digest_0.4.2 R.utils_1.4.0 R.oo_1.7.2 aroma.light_1.16.0 R.methodsS3_1.2.0

I use the standard doCRMAv2 function:

> ds <- doCRMAv2("data", chipType="GenomeWideSNP_6", combineAlleles=FALSE);
> ds
$total
AromaUnitTotalCnBinarySet:
Name: data
Tags: ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
Full name: data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
Number of files: 14
Names: A,B, ..., C [14]
Path (to the first file): totalAndFracBData/data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY/GenomeWideSNP_6
Total file size: 99.13 MB
RAM: 0.02MB

$fracB
AromaUnitFracBCnBinarySet:
Name: data
Tags: ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
Full name: data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
Number of files: 14
Names: A,B, ..., C [14]
Path (to the first file): totalAndFracBData/data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY/GenomeWideSNP_6
Total file size: 99.13 MB
RAM: 0.02MB

It seems to be impossible to use this 'ds' object (or ds$fracB or ds$total) as input to the extractDataFrame() function.
So I must do:

> rootPath <- "totalAndFracBData"
> dataSet <- "data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY"
> ds <- AromaUnitFracBCnBinarySet$byName(dataSet, chipType="GenomeWideSNP_6", paths=rootPath);
> ds
AromaUnitFracBCnBinarySet:
Name: data
Tags: ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
Full name: data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
Number of files: 14
Names: A,B, ..., C [14]
Path (to the first file): totalAndFracBData/data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY/GenomeWideSNP_6
Total file size: 99.13 MB
RAM: 0.02MB

When I use the extractDataFrame function, I obtain the following object:

> dfTxt <- writeDataFrame(ds, columns=c("unitName", "chromosome", "position", "*"))
> d <- readDataFrame(dfTxt)
> str(d)
'data.frame': 1857154 obs. of 17 variables:
 $ unitName : Factor w/ 71429 levels "AFFX-5Q-123",..: 1 2 3 4 487 490 493 496 499 502 ...
 $ chromosome : int NA NA NA NA NA NA NA NA NA NA ...
 $ position : int NA NA NA NA NA NA NA NA NA NA ...
 $ A,fracB : num NA NA NA NA NA NA NA NA NA NA ...
 $ B,fracB : num NA NA NA NA NA NA NA NA NA NA ...
 $ C,fracB : num NA NA NA NA NA NA NA NA NA NA ...
 $ ...

First of all, you can see that there are only fracB columns. The first 'ds' object had a "total" item, which seems to have been lost. The directory /totalAndFracBData/data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY/GenomeWideSNP_6 also contains the ....,total.asb files. There may be a problem with my new 'ds' object (which refers to only 14 files).

There is also a problem of row duplication: the number of rows is the same as the number of units on the Affymetrix SNP 6 array (so the result seems to be good), but there are only 71429 unique unitNames. In fact, there are only 71429 unique rows:

> str(unique(d))
'data.frame': 71429 obs. of 17 variables:
 $ unitName : Factor w/ 71429 levels "AFFX-5Q-123",..: 1 2 3 4 487 490 493 496 499 502 ...
 $ chromosome : int NA NA NA NA NA NA NA NA NA NA ...
 $ position : int NA NA NA NA NA NA NA NA NA NA ...
 $ A,fracB : num NA NA NA NA NA NA NA NA NA NA ...
 $ B,fracB : num NA NA NA NA NA NA NA NA NA NA ...
 $ C,fracB : num NA NA NA NA NA NA NA NA NA NA ...
 $ ...

Each row seems to be duplicated 26 times:

> unique(table(d$unitName))
[1] 26

I used the extractDataFrame function on the ugp object and it seems to work, so my ugp file is probably correct. I also noticed that the 71429 unitNames of the 'd' object are the first 71429 lines of my ugp matrix.

I hope you can help me out.

Thank you