Re: [aroma.affymetrix] uncomplete extractDataFrame()

Pierre Neuvial Wed, 07 Jul 2010 23:21:44 -0700

Hi,

One comment below.


On Wed, Jul 7, 2010 at 11:15 PM, Pierre Neuvial
<pie...@stat.berkeley.edu> wrote:
> Hi Emilie,
>
> Sorry for taking such a long time to reply.
>
> You are right: there was a bug in
> AromaUnitSignalBinarySet.writeDataFrame causing the same data chunk to
> be written several times.  This bug will be fixed in the next release
> of aroma.core.  In the meantime, Henrik has provided a patch: to
> download and install it, just do
>
> library("aroma.affymetrix");
> downloadPackagePatch("aroma.core");
>
> as (now) explained in http://aroma-project.org/howtos/updateOrPatch.
>
> Then
>
> dfTxt <- writeDataFrame(ds$fracB, columns=c("unitName", "chromosome",
> "position", "*"))

I just wanted to add that in general it's better to use the verbose
output, as in:

log <- Arguments$getVerbose(-8, timestamp=TRUE)
dfTxt <- writeDataFrame(ds$fracB, columns=c("unitName", "chromosome",
"position", "*"), verbose=log)

(that's how I located the bug)
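
Once the file has been rewritten, a quick sanity check is to compare the
number of rows with the number of unique rows.  Below is a minimal sketch in
base R.  It uses a made-up toy data frame in place of the real
readDataFrame(dfTxt) result, and the names chunk, nRows, nUnique and copies
are only illustrative:

```r
# Toy stand-in for d <- readDataFrame(dfTxt):
# one 3-row chunk written 4 times, mimicking the duplication bug.
chunk <- data.frame(
  unitName = c("unit1", "unit2", "unit3"),
  fracB = c(0.1, 0.5, 0.9),
  stringsAsFactors = FALSE
)
d <- do.call(rbind, replicate(4, chunk, simplify = FALSE))

nRows <- nrow(d)              # number of rows as written to file
nUnique <- nrow(unique(d))    # number of distinct rows
# Number of copies of each unit name (a single value if all are equal)
copies <- unique(as.vector(table(d$unitName)))

cat("rows:", nRows, " unique rows:", nUnique,
    " copies per unit:", copies, "\n")
```

If nRows and nUnique differ, or copies is anything other than 1, the file
probably still contains repeated chunks.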

Pierre

>
> should give you the txt file you expect.  Let us know if it works for you.
>
> Phew!
>
> Pierre
>
> On Fri, Jul 2, 2010 at 6:35 AM, Emilie T <temilie...@gmail.com> wrote:
>>
>> Hello Pierre,
>> Thank you for your response, and sorry for the mistake in the subject 
>> line.
>> Of course, my question is about the writeDataFrame() function.
>> Thank you for the help with creating the "ds" object. That part seems to be 
>> OK now, but I deleted the txt files again and tried to rebuild them, and I 
>> still have the same problem. This is very strange.
>> I can see in your example that you also get duplicated rows in your 
>> matrix: your "d" object contains 2000000 rows and your "unique(d)" 
>> only 500000, so your matrix is duplicated 4 times.
>> Why are the rows duplicated?
>> I see that you also seem to use the Affy SNP 6 chip in your example. I 
>> suppose that we don't have the same number of rows because you are probably 
>> not using the same annotation file.
>> The Affy SNP 6 chip has about 2000000 unique unitNames (about 1M SNP + 1M 
>> CNV), so in your case as in mine I expect unique(d) to be a matrix 
>> with about 2M unique unitNames (or 1M if you consider that CNV units are not 
>> relevant for the fracB calculation).
>> When I compare the unitNames of the 'd' object to the Affy SNP 6 annotation 
>> matrix (see my code below), I see that several unitNames are missing. In 
>> fact, only the first ones are present in my "d" object.
>> Do you know how to obtain the fracB measurements for these missing unitNames?
>> Thank you very much for your help.
>> Here is the complete code :
>> > str(ds)
>> List of 2
>>  $ total:Classes 'AromaUnitTotalCnBinarySet', 'CopyNumberDataSet', 
>> 'AromaUnitSignalBinarySet', 'AromaTabularBinarySet', 
>> 'GenericTabularFileSet', 'GenericDataFileSet', 'FullNameInterface', 'Object' 
>>  atomic [1:1] NA
>>   .. ..- attr(*, ".env")=<environment: 0x1863b1f50>
>>   .. ..- attr(*, "...instantiationTime")= POSIXct[1:1], format: "2010-07-02 
>> 10:46:48"
>>  $ fracB:Classes 'AromaUnitFracBCnBinarySet', 'AromaUnitSignalBinarySet', 
>> 'AromaTabularBinarySet', 'GenericTabularFileSet', 'GenericDataFileSet', 
>> 'FullNameInterface', 'Object'  atomic [1:1] NA
>>   .. ..- attr(*, ".env")=<environment: 0x16c5a8950>
>>   .. ..- attr(*, "...instantiationTime")= POSIXct[1:1], format: "2010-07-02 
>> 10:46:49"
>> > dfTxt <- writeDataFrame(ds$fracB, columns=c("unitName", "chromosome", 
>> > "position", "*"))
>> > d <- readDataFrame(dfTxt)
>> > str(d)
>> 'data.frame': 1857154 obs. of  17 variables:
>>  $ unitName     : Factor w/ 71429 levels "AFFX-5Q-123",..: 1 2 3 4 487 490 
>> 493 496 499 502 ...
>>  $ chromosome   : int  NA NA NA NA NA NA NA NA NA NA ...
>>  $ position     : int  NA NA NA NA NA NA NA NA NA NA ...
>>  $ A,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ B,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ C,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ D,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ E,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ F,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ G,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ H,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ I,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ J,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ K,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ L,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ M,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ N,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  - attr(*, "fileHeader")=List of 6
>>   ..$ comments: chr  "# name: data" "# tags: ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY" 
>> "# fullName: data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY" "# nbrOfFiles: 14" ...
>>   ..$ sep     : chr "\t"
>>   ..$ quote   : chr "\""
>>   ..$ skip    : num 0
>>   ..$ topRows :List of 10
>>   .. ..$ : chr  "unitName" "chromosome" "position" "A,fracB" ...
>>   .. ..$ : chr  "AFFX-5Q-123" "NA" "NA" "NA" ...
>>   .. ..$ : chr  "AFFX-5Q-456" "NA" "NA" "NA" ...
>>   .. ..$ : chr  "AFFX-5Q-789" "NA" "NA" "NA" ...
>>   .. ..$ : chr  "AFFX-5Q-ABC" "NA" "NA" "NA" ...
>>   .. ..$ : chr  "AFR_A02_SB" "NA" "NA" "NA" ...
>>   .. ..$ : chr  "AFR_A04_SB" "NA" "NA" "NA" ...
>>   .. ..$ : chr  "AFR_A06_SB" "NA" "NA" "NA" ...
>>   .. ..$ : chr  "AFR_A08_SB" "NA" "NA" "NA" ...
>>   .. ..$ : chr  "AFR_A10_SB" "NA" "NA" "NA" ...
>>   ..$ columns : chr  "unitName" "chromosome" "position" "A,fracB" ...
>> > str(unique(d))
>> 'data.frame': 71429 obs. of  17 variables:
>>  $ unitName     : Factor w/ 71429 levels "AFFX-5Q-123",..: 1 2 3 4 487 490 
>> 493 496 499 502 ...
>>  $ chromosome   : int  NA NA NA NA NA NA NA NA NA NA ...
>>  $ position     : int  NA NA NA NA NA NA NA NA NA NA ...
>>  $ A,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ B,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ C,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ D,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ E,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ F,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ G,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ H,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ I,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ J,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ K,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ L,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ M,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  $ N,fracB : num  NA NA NA NA NA NA NA NA NA NA ...
>>  - attr(*, "fileHeader")=List of 6
>>   ..$ comments: chr  "# name: data" "# tags: ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY" 
>> "# fullName: data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY" "# nbrOfFiles: 14" ...
>>   ..$ sep     : chr "\t"
>>   ..$ quote   : chr "\""
>>   ..$ skip    : num 0
>>   ..$ topRows :List of 10
>>   .. ..$ : chr  "unitName" "chromosome" "position" "A,fracB" ...
>>   .. ..$ : chr  "AFFX-5Q-123" "NA" "NA" "NA" ...
>>   .. ..$ : chr  "AFFX-5Q-456" "NA" "NA" "NA" ...
>>   .. ..$ : chr  "AFFX-5Q-789" "NA" "NA" "NA" ...
>>   .. ..$ : chr  "AFFX-5Q-ABC" "NA" "NA" "NA" ...
>>   .. ..$ : chr  "AFR_A02_SB" "NA" "NA" "NA" ...
>>   .. ..$ : chr  "AFR_A04_SB" "NA" "NA" "NA" ...
>>   .. ..$ : chr  "AFR_A06_SB" "NA" "NA" "NA" ...
>>   .. ..$ : chr  "AFR_A08_SB" "NA" "NA" "NA" ...
>>   .. ..$ : chr  "AFR_A10_SB" "NA" "NA" "NA" ...
>>   ..$ columns : chr  "unitName" "chromosome" "position" "A,fracB" ...
>>
>> > unique(table(d$unitName))
>> [1] 26
>> As I said in my first message, I checked which units are missing against 
>> the annotation matrix:
>> > ugp <- AromaUgpFile$byChipType("GenomeWideSNP_6");
>> > ugp
>> AromaUgpFile:
>> Name: GenomeWideSNP_6
>> Tags: na30,hg18,HB20100215
>> Full name: GenomeWideSNP_6,na30,hg18,HB20100215
>> Pathname: 
>> annotationData/chipTypes/GenomeWideSNP_6/GenomeWideSNP_6,na30,hg18,HB20100215.ugp
>> File size: 8.85 MB (9281130 bytes)
>> RAM: 0.00 MB
>> Number of data rows: 1856069
>> File format: v1
>> Dimensions: 1856069x2
>> Column classes: integer, integer
>> Number of bytes per column: 1, 4
>> Footer: <createdOn>20100215 21:16:37 
>> CET</createdOn><platform>Affymetrix</platform><chipType>GenomeWideSNP_6</chipType><createdBy><fullname>Henrik
>>  
>> Bengtsson</fullname><email>h...@aroma-project.org</email></createdBy><srcFiles><srcFile1><filename>GenomeWideSNP_6.cdf</filename><filesize>484489553</filesize><checksum>223f3cd9141404b2a926a40cf47d6f1a</checksum></srcFile1><srcFile2><filename>GenomeWideSNP_6,Full.cdf</filename><filesize>493291745</filesize><checksum>3fbe0f6e7c8a346105238a3f3d10d4ec</checksum></srcFile2><srcFile3><filename>GenomeWideSNP_6,Full,na30,hg18,HB20100215.ugp</filename><filesize>9407867</filesize><checksum>446e0ff43fbe9650ab48aa41ecee6bec</checksum></srcFile3></srcFiles>
>> Chip type: GenomeWideSNP_6
>> Platform: Affymetrix
>> > ugpTxt <- writeDataFrame(ugp, columnNamesPrefix="none");
>> > ugpTxt
>> TabularTextFile:
>> Name: GenomeWideSNP_6
>> Tags: na30,hg18,HB20100215.ugp
>> Full name: GenomeWideSNP_6,na30,hg18,HB20100215.ugp
>> Pathname: 
>> annotationData,txt/GenomeWideSNP_6,na30,hg18,HB20100215/GenomeWideSNP_6/GenomeWideSNP_6,na30,hg18,HB20100215.ugp.txt
>> File size: 42.17 MB (44223022 bytes)
>> RAM: 0.01 MB
>> Number of data rows: NA
>> Columns [3]: 'unitName', 'chromosome', 'position'
>> Number of text lines: NA
>> > ugpdata <- readDataFrame(ugpTxt)
>> > str(ugpdata)
>> 'data.frame': 1856069 obs. of  3 variables:
>>  $ unitName  : Factor w/ 1856069 levels "AFFX-5Q-123",..: 1 2 3 4 3509 3512 
>> 3515 3518 3521 3524 ...
>>  $ chromosome: int  NA NA NA NA NA NA NA NA NA NA ...
>>  $ position  : int  NA NA NA NA NA NA NA NA NA NA ...
>>  - attr(*, "fileHeader")=List of 6
>>   ..$ comments: chr  "# name: GenomeWideSNP_6" "# tags: 
>> na30,hg18,HB20100215" "# fullName: GenomeWideSNP_6,na30,hg18,HB20100215" "# 
>> sourceFile: GenomeWideSNP_6,na30,hg18,HB20100215.ugp" ...
>>   ..$ sep     : chr "\t"
>>   ..$ quote   : chr "\""
>>   ..$ skip    : num 0
>>   ..$ topRows :List of 10
>>   .. ..$ : chr  "unitName" "chromosome" "position"
>>   .. ..$ : chr  "AFFX-5Q-123" "NA" "NA"
>>   .. ..$ : chr  "AFFX-5Q-456" "NA" "NA"
>>   .. ..$ : chr  "AFFX-5Q-789" "NA" "NA"
>>   .. ..$ : chr  "AFFX-5Q-ABC" "NA" "NA"
>>   .. ..$ : chr  "AFR_A02_SB" "NA" "NA"
>>   .. ..$ : chr  "AFR_A04_SB" "NA" "NA"
>>   .. ..$ : chr  "AFR_A06_SB" "NA" "NA"
>>   .. ..$ : chr  "AFR_A08_SB" "NA" "NA"
>>   .. ..$ : chr  "AFR_A10_SB" "NA" "NA"
>>   ..$ columns : chr  "unitName" "chromosome" "position"
>> > common <- ugpdata$unitName %in% d$unitName
>> > plot(as.numeric(common),xlim=c(1,length(common)),ylim=c(0,1))
>>
>>
>> 2010/7/2 Pierre Neuvial <pie...@stat.berkeley.edu>
>>>
>>> Hi Emilie,
>>>
>>> On Thu, Jul 1, 2010 at 10:13 AM, EmilieT <temilie...@gmail.com> wrote:
>>> > Hello,
>>> >
>>> > I am using your R framework with a set of Affymetrix SNP 6 data and I
>>> > have a problem with the extractDataFrame function.
>>> > The result is an incomplete matrix with row duplication.
>>> >
>>> >> sessionInfo()
>>> > R version 2.11.1 (2010-05-31)
>>> > x86_64-apple-darwin9.8.0
>>> >
>>> > locale:
>>> > [1] fr_FR.UTF-8/fr_FR.UTF-8/C/C/fr_FR.UTF-8/fr_FR.UTF-8
>>> >
>>> > attached base packages:
>>> > [1] stats     graphics  grDevices utils     datasets  methods
>>> > base
>>> >
>>> > other attached packages:
>>> >  [1] aroma.cn_0.5.0         aroma.affymetrix_1.6.0
>>> > aroma.apd_0.1.7        affxparser_1.20.0      R.huge_0.2.0
>>> >  [6] aroma.core_1.6.0       matrixStats_0.2.1
>>> > R.rsp_0.3.6            R.cache_0.3.0          R.filesets_0.8.2
>>> > [11] digest_0.4.2           R.utils_1.4.0
>>> > R.oo_1.7.2             aroma.light_1.16.0     R.methodsS3_1.2.0
>>> >
>>> > I use the standard doCRMAv2 function :
>>> >  > ds <- doCRMAv2("data",
>>> > chipType="GenomeWideSNP_6",combineAlleles=FALSE);
>>> >
>>> >> ds
>>> > $total
>>> > AromaUnitTotalCnBinarySet:
>>> > Name: data
>>> > Tags: ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
>>> > Full name: data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
>>> > Number of files: 14
>>> > Names: A,B, ..., C [14]
>>> > Path (to the first file): totalAndFracBData/data,ACC,ra,-XY,BPN,-
>>> > XY,AVG,FLN,-XY/GenomeWideSNP_6
>>> > Total file size: 99.13 MB
>>> > RAM: 0.02MB
>>> >
>>> > $fracB
>>> > AromaUnitFracBCnBinarySet:
>>> > Name: data
>>> > Tags: ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
>>> > Full name: data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
>>> > Number of files: 14
>>> > Names: A,B, ..., C [14]
>>> > Path (to the first file): totalAndFracBData/data,ACC,ra,-XY,BPN,-
>>> > XY,AVG,FLN,-XY/GenomeWideSNP_6
>>> > Total file size: 99.13 MB
>>> > RAM: 0.02MB
>>> >
>>> > It seems to be impossible to use this 'ds' object (or ds$fracB or
>>> > ds$total) as input to the extractDataFrame() function.
>>>
>>> Yes: this is because extractDataFrame is meant to extract *chip
>>> effects* (http://aroma-project.org/howtos/extractDataFrame), in your
>>> case total and allele-specific *intensities*, whereas your ds$total and
>>> ds$fracB are already one step further in the analysis: they are
>>> AromaUnit*CnBinaryFile:s.  For these you can use writeDataFrame
>>> (http://aroma-project.org/howtos/writeDataFrame), as you seem to be
>>> doing below.
>>>
>>> > So I must do :
>>> >
>>> >> rootPath <- "totalAndFracBData"
>>> >> dataSet <- "data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY"
>>> >> ds <- AromaUnitFracBCnBinarySet$byName(dataSet, 
>>> >> chipType="GenomeWideSNP_6", paths=rootPath);
>>> >> ds
>>> > AromaUnitFracBCnBinarySet:
>>> > Name: data
>>> > Tags: ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
>>> > Full name: data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY
>>> > Number of files: 14
>>> > Names: A,B, ..., C [14]
>>> > Path (to the first file): totalAndFracBData/data,ACC,ra,-XY,BPN,-
>>> > XY,AVG,FLN,-XY/GenomeWideSNP_6
>>> > Total file size: 99.13 MB
>>> > RAM: 0.02MB
>>>
>>> You don't really need to do this: your new 'ds' is exactly your previous
>>> 'ds$fracB' (more on this below).
>>>
>>> >
>>> > When I use the extractDataFrame function, I obtain the following
>>> > object:
>>>
>>> Below you are using writeDataFrame, not extractDataFrame. Right ?
>>>
>>> >
>>> >> dfTxt <- writeDataFrame(ds, columns=c("unitName", "chromosome", 
>>> >> "position", "*"))
>>> >> d <- readDataFrame(dfTxt)
>>> >> str(d)
>>> > 'data.frame':   1857154 obs. of  17 variables:
>>> >  $ unitName                     : Factor w/ 71429 levels
>>> > "AFFX-5Q-123",..: 1 2 3 4 487 490 493 496 499 502 ...
>>> >  $ chromosome                : int  NA NA NA NA NA NA NA NA NA NA ...
>>> >  $ position                        : int  NA NA NA NA NA NA NA NA NA
>>> > NA ...
>>> >  $ A,fracB                        : num  NA NA NA NA NA NA NA NA NA
>>> > NA ...
>>> >  $ B,fracB                        : num  NA NA NA NA NA NA NA NA NA
>>> > NA ...
>>> >  $ C,fracB                       : num  NA NA NA NA NA NA NA NA NA
>>> > NA ...
>>> >  $ ...
>>> >
>>> > First of all, you can see that there are only fracB columns. The
>>> > first "ds" object had a "total" item, which seems to have been lost. The
>>> > directory
>>> > /totalAndFracBData/data,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY/GenomeWideSNP_6
>>> > also contains the ....,total.asb files. Maybe there is a problem with
>>> > my new 'ds' object (which refers to only 14 files).
>>>
>>> Yes, this is expected because your new 'ds' has been created using
>>>
>>> ds <- AromaUnitFracBCnBinarySet$byName(dataSet,
>>> chipType="GenomeWideSNP_6", paths=rootPath);
>>>
>>> As the "FracB" indicates, this 'ds' only contains allele B fractions. You 
>>> can do
>>>
>>> totalDs <- AromaUnitTotalCnBinarySet$byName(dataSet,
>>> chipType="GenomeWideSNP_6", paths=rootPath);
>>>
>>> to get the corresponding total CN data.
>>>
>>> >
>>> > There is also a problem of row duplication: you can see that the
>>> > number of rows is the same as the number of Affymetrix SNP 6 units (so
>>> > the result seems to be good).
>>>
>>> Well, I've tried to reproduce what you have and I'm getting 2000000 rows:
>>>
>>> > str(d);
>>> 'data.frame':   2000000 obs. of  5 variables:
>>>  $ unitName                                                   : Factor
>>> w/ 500000 levels "AFFX-5Q-123",..: 1 2 3 4 487 490 493 496 499 502 ...
>>>  $ chromosome                                                 : int
>>> NA NA NA NA NA NA NA NA NA NA ...
>>>  $ position                                                   : int
>>> NA NA NA NA NA NA NA NA NA NA ...
>>>  $ STAIR_p_TCGA_Batch7_Affx_N_GenomeWideSNP_6_E03_238454,fracB: num
>>> NA NA NA NA NA NA NA NA NA NA ...
>>>  $ STAIR_p_TCGA_Batch7_Affx_N_GenomeWideSNP_6_E04_238456,fracB: num
>>> NA NA NA NA NA NA NA NA NA NA ...
>>>  - attr(*, "fileHeader")=List of 6
>>>  ..$ comments: chr  "# name: TumorBoostPaper" "# tags:
>>> pairs,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY" "# fullName:
>>> TumorBoostPaper,pairs,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY" "# nbrOfFiles:
>>> 2" ...
>>>  ..$ sep     : chr "\t"
>>>  ..$ quote   : chr "\""
>>>  ..$ skip    : num 0
>>>  ..$ topRows :List of 10
>>>  .. ..$ : chr  "unitName" "chromosome" "position"
>>> "STAIR_p_TCGA_Batch7_Affx_N_GenomeWideSNP_6_E03_238454,fracB" ...
>>>  .. ..$ : chr  "AFFX-5Q-123" "NA" "NA" "NA" ...
>>>  .. ..$ : chr  "AFFX-5Q-456" "NA" "NA" "NA" ...
>>>  .. ..$ : chr  "AFFX-5Q-789" "NA" "NA" "NA" ...
>>>  .. ..$ : chr  "AFFX-5Q-ABC" "NA" "NA" "NA" ...
>>>  .. ..$ : chr  "AFR_A02_SB" "NA" "NA" "NA" ...
>>>  .. ..$ : chr  "AFR_A04_SB" "NA" "NA" "NA" ...
>>>  .. ..$ : chr  "AFR_A06_SB" "NA" "NA" "NA" ...
>>>  .. ..$ : chr  "AFR_A08_SB" "NA" "NA" "NA" ...
>>>  .. ..$ : chr  "AFR_A10_SB" "NA" "NA" "NA" ...
>>>  ..$ columns : chr  "unitName" "chromosome" "position"
>>> "STAIR_p_TCGA_Batch7_Affx_N_GenomeWideSNP_6_E03_238454,fracB" ...
>>>
>>> > But there are only 71429 unique unitNames. In
>>> > fact, there are only 71429 unique rows:
>>> >
>>> >> str(unique(d))
>>> > 'data.frame':   71429 obs. of  17 variables:
>>> >  $ unitName               : Factor w/ 71429 levels "AFFX-5Q-123",..: 1
>>> > 2 3 4 487 490 493 496 499 502 ...
>>> >  $ chromosome          : int  NA NA NA NA NA NA NA NA NA NA ...
>>> >  $ position                  : int  NA NA NA NA NA NA NA NA NA NA ...
>>> >  $ A,fracB                  : num  NA NA NA NA NA NA NA NA NA NA ...
>>> >  $ B,fracB                  : num  NA NA NA NA NA NA NA NA NA NA ...
>>> >  $ C,fracB                  : num  NA NA NA NA NA NA NA NA NA NA ...
>>> >  $ ...
>>> >
>>> > Each row seems to be duplicated 26 times :
>>> >> unique(table(d$unitName))
>>> > [1] 26
>>> >
>>>
>>> I can't reproduce this.  Here is what I get:
>>>
>>> > str(unique(d))
>>> 'data.frame':   500000 obs. of  5 variables:
>>>  $ unitName                                                   : Factor
>>> w/ 500000 levels "AFFX-5Q-123",..: 1 2 3 4 487 490 493 496 499 502 ...
>>>  $ chromosome                                                 : int
>>> NA NA NA NA NA NA NA NA NA NA ...
>>>  $ position                                                   : int
>>> NA NA NA NA NA NA NA NA NA NA ...
>>>  $ STAIR_p_TCGA_Batch7_Affx_N_GenomeWideSNP_6_E03_238454,fracB: num
>>> NA NA NA NA NA NA NA NA NA NA ...
>>>  $ STAIR_p_TCGA_Batch7_Affx_N_GenomeWideSNP_6_E04_238456,fracB: num
>>> NA NA NA NA NA NA NA NA NA NA ...
>>>  - attr(*, "fileHeader")=List of 6
>>>  ..$ comments: chr  "# name: TumorBoostPaper" "# tags:
>>> pairs,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY" "# fullName:
>>> TumorBoostPaper,pairs,ACC,ra,-XY,BPN,-XY,AVG,FLN,-XY" "# nbrOfFiles:
>>> 2" ...
>>>  ..$ sep     : chr "\t"
>>>  ..$ quote   : chr "\""
>>>  ..$ skip    : num 0
>>>  ..$ topRows :List of 10
>>>  .. ..$ : chr  "unitName" "chromosome" "position"
>>> "STAIR_p_TCGA_Batch7_Affx_N_GenomeWideSNP_6_E03_238454,fracB" ...
>>>  .. ..$ : chr  "AFFX-5Q-123" "NA" "NA" "NA" ...
>>>  .. ..$ : chr  "AFFX-5Q-456" "NA" "NA" "NA" ...
>>>  .. ..$ : chr  "AFFX-5Q-789" "NA" "NA" "NA" ...
>>>  .. ..$ : chr  "AFFX-5Q-ABC" "NA" "NA" "NA" ...
>>>  .. ..$ : chr  "AFR_A02_SB" "NA" "NA" "NA" ...
>>>  .. ..$ : chr  "AFR_A04_SB" "NA" "NA" "NA" ...
>>>  .. ..$ : chr  "AFR_A06_SB" "NA" "NA" "NA" ...
>>>  .. ..$ : chr  "AFR_A08_SB" "NA" "NA" "NA" ...
>>>  .. ..$ : chr  "AFR_A10_SB" "NA" "NA" "NA" ...
>>>  ..$ columns : chr  "unitName" "chromosome" "position"
>>> "STAIR_p_TCGA_Batch7_Affx_N_GenomeWideSNP_6_E03_238454,fracB" ...
>>>
>>> >
>>> > I use the extractDataFrame function on the ugp object and it seems to
>>> > work so my ugp file is probably correct.
>>>
>>> What have you done exactly here ?
>>>
>>> > I also notice that the 71429 unitNames of the 'd' object are the first
>>> > 71429 lines of my ugp matrix.
>>> >
>>>
>>> Can you delete the txt file and (re)do
>>>
>>> ds <- doCRMAv2("data", chipType="GenomeWideSNP_6",combineAlleles=FALSE);
>>> dfTxt <- writeDataFrame(ds$fracB, columns=c("unitName", "chromosome",
>>> "position", "*"))
>>> d <- readDataFrame(dfTxt)
>>>
>>> ?  Do you still have the same problem?
>>>
>>> Pierre
>>>
>>> > I hope you can help me out. Thank you
>>> >
>>> > --
>>> > When reporting problems on aroma.affymetrix, make sure 1) to run the 
>>> > latest version of the package, 2) to report the output of sessionInfo() 
>>> > and traceback(), and 3) to post a complete code example.
>>> >
>>> >
>>> > You received this message because you are subscribed to the Google Groups 
>>> > "aroma.affymetrix" group with website http://www.aroma-project.org/.
>>> > To post to this group, send email to aroma-affymetrix@googlegroups.com
>>> > To unsubscribe and other options, go to 
>>> > http://www.aroma-project.org/forum/
>>> >
>>>
>>
>

