[aroma.affymetrix] uncomplete extractDataFrame()

EmilieT Thu, 01 Jul 2010 10:13:56 -0700

Hello,

I am using your R framework with a set of Affymetrix SNP 6 data and I
have a problem with the extractDataFrame function.
The result is an incomplete matrix with row duplication.


> sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-apple-darwin9.8.0

locale:
[1] fr_FR.UTF-8/fr_FR.UTF-8/C/C/fr_FR.UTF-8/fr_FR.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] aroma.cn_0.5.0         aroma.affymetrix_1.6.0 aroma.apd_0.1.7        affxparser_1.20.0      R.huge_0.2.0
 [6] aroma.core_1.6.0       matrixStats_0.2.1      R.rsp_0.3.6            R.cache_0.3.0          R.filesets_0.8.2
[11] digest_0.4.2           R.utils_1.4.0          R.oo_1.7.2             aroma.light_1.16.0     R.methodsS3_1.2.0

I use the standard doCRMAv2 function:
> ds <- doCRMAv2("data", chipType="GenomeWideSNP_6", combineAlleles=FALSE)

> ds
$total
AromaUnitTotalCnBinarySet:
Name: data
Tags: ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
Full name: data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
Number of files: 14
Names: A,B, ..., C [14]
Path (to the first file): totalAndFracBData/data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY/GenomeWideSNP_6
Total file size: 99.13 MB
RAM: 0.02MB

$fracB
AromaUnitFracBCnBinarySet:
Name: data
Tags: ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
Full name: data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
Number of files: 14
Names: A,B, ..., C [14]
Path (to the first file): totalAndFracBData/data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY/GenomeWideSNP_6
Total file size: 99.13 MB
RAM: 0.02MB

It does not seem possible to pass this 'ds' object (or ds$fracB or
ds$total) as input to the extractDataFrame() function, so I load the
data set explicitly:

> rootPath <- "totalAndFracBData"
> dataSet <- "data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY"
> ds <- AromaUnitFracBCnBinarySet$byName(dataSet, chipType="GenomeWideSNP_6", paths=rootPath)
> ds
AromaUnitFracBCnBinarySet:
Name: data
Tags: ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
Full name: data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
Number of files: 14
Names: A,B, ..., C [14]
Path (to the first file): totalAndFracBData/data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY/GenomeWideSNP_6
Total file size: 99.13 MB
RAM: 0.02MB

When I use the extractDataFrame function, I obtain the following
object:

> dfTxt <- writeDataFrame(ds, columns=c("unitName", "chromosome", "position", "*"))
> d <- readDataFrame(dfTxt)
> str(d)
'data.frame':   1857154 obs. of  17 variables:
 $ unitName  : Factor w/ 71429 levels "AFFX-5Q-123",..: 1 2 3 4 487 490 493 496 499 502 ...
 $ chromosome: int  NA NA NA NA NA NA NA NA NA NA ...
 $ position  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ A,fracB   : num  NA NA NA NA NA NA NA NA NA NA ...
 $ B,fracB   : num  NA NA NA NA NA NA NA NA NA NA ...
 $ C,fracB   : num  NA NA NA NA NA NA NA NA NA NA ...
 $ ...

First of all, you can see that there are only fracB columns. The
first 'ds' object had a "total" item, which seems to have been lost,
although the directory
/totalAndFracBData/data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY/GenomeWideSNP_6
also contains the ....,total.asb files. Maybe there is a problem with
my new 'ds' object (which refers to only 14 files).

There is also a problem of row duplication: the number of rows
(1857154) matches the number of units on the Affymetrix SNP 6 array,
so the size of the result looks right, but there are only 71429 unique
unitNames. In fact, there are only 71429 unique rows:

> str(unique(d))
'data.frame':   71429 obs. of  17 variables:
 $ unitName  : Factor w/ 71429 levels "AFFX-5Q-123",..: 1 2 3 4 487 490 493 496 499 502 ...
 $ chromosome: int  NA NA NA NA NA NA NA NA NA NA ...
 $ position  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ A,fracB   : num  NA NA NA NA NA NA NA NA NA NA ...
 $ B,fracB   : num  NA NA NA NA NA NA NA NA NA NA ...
 $ C,fracB   : num  NA NA NA NA NA NA NA NA NA NA ...
 $ ...

Each unit name appears exactly 26 times:
> unique(table(d$unitName))
[1] 26
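
To make the duplication check reproducible without the actual data, here is a minimal base R sketch. The data frame 'toy' and its three unit names are hypothetical stand-ins for 'd'; only the counts 1857154 and 71429 come from the output above.

```r
# Toy stand-in for 'd': 3 hypothetical units, each repeated 26 times.
units <- c("unit1", "unit2", "unit3")
toy <- data.frame(
  unitName = rep(units, times = 26),
  chromosome = NA_integer_,   # all-NA column, as in the real output
  stringsAsFactors = TRUE
)

nUnique <- nrow(unique(toy))            # number of distinct rows
copies  <- unique(table(toy$unitName))  # distinct copy counts per unit name
ratio   <- nrow(toy) / nUnique          # average copies per distinct row

# The same arithmetic with the row counts reported by str() above.
reportedRatio <- 1857154 / 71429
```

If the reported ratio is a whole number and 'copies' holds a single value, every unit is repeated the same number of times, which is what I see in 'd'.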

I also used the extractDataFrame function on the ugp object and it
seems to work, so my ugp file is probably correct.
I also noticed that the 71429 unitNames in the 'd' object are those of
the first 71429 rows of my ugp matrix.
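
The comparison with the ugp matrix can be sketched in base R as follows. This is only an illustration: 'ugpNames' and 'dNames' are hypothetical vectors standing in for the unit names of the ugp matrix and for as.character(d$unitName), and the sizes are shrunk to 10 and 3.

```r
# Hypothetical stand-ins: 10 unit names in UGP order, and a 'd' that
# only ever cycles through the first 3 of them.
ugpNames <- paste0("unit", 1:10)
dNames   <- rep(ugpNames[1:3], length.out = 10)

# Do the distinct names in 'd' equal the first block of the UGP names?
firstBlock       <- head(ugpNames, length(unique(dNames)))
sameAsFirstBlock <- identical(unique(dNames), firstBlock)

# First row at which 'd' stops following the UGP order.
firstMismatch <- which(dNames != ugpNames)[1]
```

With the real data, the first mismatch would be expected right after the block of 71429 unique unit names.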

I hope you can help me out. Thank you
