Hi Jakob.
The error message is due to the fact that your design matrix is not full rank
(i.e. sum of columns 4-7 = sum of columns 2-3). This is of course my fault,
since I suggested it.
That setup (previous post) would've worked for an experiment with observations
for every combination of factors, which you do not have (controls are only
measured at T=0, AP2/AP6 are not measured at T=0). Non-full-rank design
matrices can be difficult to interpret, so an alternative option is this:
S <- factor(rep( c("Acon","AP6","AP2"),
c(6,5+6+6+5,5+6+5+4)))
T <- factor(rep( c("T00","T06","T12","T24","T48","T06","T12","T24","T48"),
c(6,5,6,6,5,5,6,5,4)))
f <- factor( paste(S,T,sep=".") )
design <- model.matrix( ~ f )
[...]
As mentioned below, you may need to adjust the reference (here again, control
at T=0) for testing a particular contrast of interest.
Let us know how you go.
Cheers,
Mark
On 2010-11-23, at 12:25 AM, Jakob Hedegaard wrote:
> Hi Mark,
>
> Thanks.
>
> I have implemented your suggestion for the design matrix when reading the
> data (see below), but encounter the following error when running
> estimateCRDisp:
>
>> d <- estimateCRDisp(d,design)
> Error in chol.default(crossprod(design, .vecmat(mu/(1 + dispersion * mu), :
> the leading minor of order 7 is not positive definite
>
> Applying the simple design:
> design.time <- model.matrix(~ T, data=d$sample)
> does not result in the error....
>
> Appreciate your help!
>
> Best regards,
> Jakob
>
>
>
>
> library(edgeR)
> r <- read.delim(file.path(datadir,
> "RNAseq_AP-2-6-INF_liver_transcript_pivot.txt"))
> dim(r)
> #[1] 78728 205
> rdata <- r[2:205]
> #samp <- substr(colnames(r)[2:205],6,10)
> #usamp <- unique(samp)
> samp <- substr(colnames(r)[2:205],16,28)
> usamp <- unique(samp)
> rd <- matrix(NA,nrow=nrow(r),ncol=length(usamp))
> for (i in 1:length(usamp)){
> rd[,i] <- rowSums(rdata[,grepl(usamp[i],samp)])
> }
> colnames(rd) <- usamp
> rownames(rd) <- r[,1]
> # export data
> #write.table(rd,file="RNAseq_AP-2-6-INF_liver_vs_AugSsc9_pivot.txt",sep="\t",quote=F,dec=".",row.names=T,col.names=NA)
>
> group <- substr(unique(substr(colnames(r)[2:205],16,28)),7,13)
> #[1] "control" "control" "control" "control" "control" "control" "AP6.T06"
> "AP6.T06" "AP6.T06" "AP6.T06" "AP6.T06" "AP6.T12"
> #[13] "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T24" "AP6.T24"
> "AP6.T24" "AP6.T24" "AP6.T24" "AP6.T24" "AP6.T48"
> #[25] "AP6.T48" "AP6.T48" "AP6.T48" "AP6.T48" "AP2.T06" "AP2.T06" "AP2.T06"
> "AP2.T06" "AP2.T06" "AP2.T12" "AP2.T12" "AP2.T12"
> #[37] "AP2.T12" "AP2.T12" "AP2.T12" "AP2.T24" "AP2.T24" "AP2.T24" "AP2.T24"
> "AP2.T24" "AP2.T48" "AP2.T48" "AP2.T48" "AP2.T48"
> d <- DGEList(counts=rd,group=group)
> d <- calcNormFactors(d)
> d$samples$S <- factor(gsub("con","Acon",substr(d$samples$group,1,3)))
> d$samples$T <- factor(gsub("rol","T00",(substr(d$samples$group,5,7))))
> design <- model.matrix(~ S + T, data=d$sample)
> # Analysis using Cox-Reid common dispersion
> d <- estimateCRDisp(d,design)
>
>> design
> (Intercept) SAP2 SAP6 TT06 TT12 TT24 TT48
> AP_01_control 1 0 0 0 0 0 0
> AP_02_control 1 0 0 0 0 0 0
> AP_03_control 1 0 0 0 0 0 0
> AP_28_control 1 0 0 0 0 0 0
> AP_29_control 1 0 0 0 0 0 0
> AP_30_control 1 0 0 0 0 0 0
> AP_31_AP6.T06 1 0 1 1 0 0 0
> AP_32_AP6.T06 1 0 1 1 0 0 0
> AP_33_AP6.T06 1 0 1 1 0 0 0
> AP_34_AP6.T06 1 0 1 1 0 0 0
> AP_36_AP6.T06 1 0 1 1 0 0 0
> AP_37_AP6.T12 1 0 1 0 1 0 0
> AP_38_AP6.T12 1 0 1 0 1 0 0
> AP_39_AP6.T12 1 0 1 0 1 0 0
> AP_40_AP6.T12 1 0 1 0 1 0 0
> AP_41_AP6.T12 1 0 1 0 1 0 0
> AP_42_AP6.T12 1 0 1 0 1 0 0
> AP_43_AP6.T24 1 0 1 0 0 1 0
> AP_44_AP6.T24 1 0 1 0 0 1 0
> AP_45_AP6.T24 1 0 1 0 0 1 0
> AP_46_AP6.T24 1 0 1 0 0 1 0
> AP_47_AP6.T24 1 0 1 0 0 1 0
> AP_48_AP6.T24 1 0 1 0 0 1 0
> AP_49_AP6.T48 1 0 1 0 0 0 1
> AP_50_AP6.T48 1 0 1 0 0 0 1
> AP_51_AP6.T48 1 0 1 0 0 0 1
> AP_53_AP6.T48 1 0 1 0 0 0 1
> AP_54_AP6.T48 1 0 1 0 0 0 1
> AP_55_AP2.T06 1 1 0 1 0 0 0
> AP_56_AP2.T06 1 1 0 1 0 0 0
> AP_57_AP2.T06 1 1 0 1 0 0 0
> AP_58_AP2.T06 1 1 0 1 0 0 0
> AP_59_AP2.T06 1 1 0 1 0 0 0
> AP_60_AP2.T12 1 1 0 0 1 0 0
> AP_62_AP2.T12 1 1 0 0 1 0 0
> AP_63_AP2.T12 1 1 0 0 1 0 0
> AP_64_AP2.T12 1 1 0 0 1 0 0
> AP_65_AP2.T12 1 1 0 0 1 0 0
> AP_66_AP2.T12 1 1 0 0 1 0 0
> AP_67_AP2.T24 1 1 0 0 0 1 0
> AP_69_AP2.T24 1 1 0 0 0 1 0
> AP_70_AP2.T24 1 1 0 0 0 1 0
> AP_71_AP2.T24 1 1 0 0 0 1 0
> AP_72_AP2.T24 1 1 0 0 0 1 0
> AP_73_AP2.T48 1 1 0 0 0 0 1
> AP_75_AP2.T48 1 1 0 0 0 0 1
> AP_76_AP2.T48 1 1 0 0 0 0 1
> AP_78_AP2.T48 1 1 0 0 0 0 1
> attr(,"assign")
> [1] 0 1 1 2 2 2 2
> attr(,"contrasts")
> attr(,"contrasts")$S
> [1] "contr.treatment"
>
> attr(,"contrasts")$T
> [1] "contr.treatment"
>
>
> -----Oprindelig meddelelse-----
> Fra: Mark Robinson [mailto:[email protected]]
> Sendt: 18. november 2010 05:51
> Til: Jakob Hedegaard
> Cc: [email protected]
> Emne: Re: [Bioc-sig-seq] RNA-Seq - analysis of time series data?
>
> Hi Jakob.
>
> Basically, you can analyze your time course data with edgeR through glmFit()
> as you have already done, but by choosing an appropriate design matrix that
> will allow you to test your contrasts of interest.
>
> For example, you might define a couple factors (please double check that this
> actually matches your $samples element of the DGEList object):
>
> S <- factor(rep( c("Acon","AP6","AP2"),
> c(6,5+6+6+5,5+6+5+4)))
> T <- factor(rep( c("T00","T06","T12","T24","T48","T06","T12","T24","T48"),
> c(6,5,6,6,5,5,6,5,4)))
>
> ... and then a design matrix:
>
> design <- model.matrix( ~ S + T )
>
> So, this design matrix will look like this:
>
>
>> design <- model.matrix( ~ S + T )
>> design
> (Intercept) SAP2 SAP6 TT06 TT12 TT24 TT48
> 1 1 0 0 0 0 0 0
> 2 1 0 0 0 0 0 0
> 3 1 0 0 0 0 0 0
> 4 1 0 0 0 0 0 0
> 5 1 0 0 0 0 0 0
> 6 1 0 0 0 0 0 0
> 7 1 0 1 1 0 0 0
> 8 1 0 1 1 0 0 0
> 9 1 0 1 1 0 0 0
> 10 1 0 1 1 0 0 0
> 11 1 0 1 1 0 0 0
> 12 1 0 1 0 1 0 0
> 13 1 0 1 0 1 0 0
> 14 1 0 1 0 1 0 0
> 15 1 0 1 0 1 0 0
> 16 1 0 1 0 1 0 0
> 17 1 0 1 0 1 0 0
> 18 1 0 1 0 0 1 0
> 19 1 0 1 0 0 1 0
> 20 1 0 1 0 0 1 0
> 21 1 0 1 0 0 1 0
> 22 1 0 1 0 0 1 0
> 23 1 0 1 0 0 1 0
> 24 1 0 1 0 0 0 1
> 25 1 0 1 0 0 0 1
> 26 1 0 1 0 0 0 1
> 27 1 0 1 0 0 0 1
> 28 1 0 1 0 0 0 1
> 29 1 1 0 1 0 0 0
> 30 1 1 0 1 0 0 0
> 31 1 1 0 1 0 0 0
> 32 1 1 0 1 0 0 0
> 33 1 1 0 1 0 0 0
> 34 1 1 0 0 1 0 0
> 35 1 1 0 0 1 0 0
> 36 1 1 0 0 1 0 0
> 37 1 1 0 0 1 0 0
> 38 1 1 0 0 1 0 0
> 39 1 1 0 0 1 0 0
> 40 1 1 0 0 0 1 0
> 41 1 1 0 0 0 1 0
> 42 1 1 0 0 0 1 0
> 43 1 1 0 0 0 1 0
> 44 1 1 0 0 0 1 0
> 45 1 1 0 0 0 0 1
> 46 1 1 0 0 0 0 1
> 47 1 1 0 0 0 0 1
> 48 1 1 0 0 0 0 1
> attr(,"assign")
> [1] 0 1 1 2 2 2 2
> attr(,"contrasts")
> attr(,"contrasts")$S
> [1] "contr.treatment"
>
> attr(,"contrasts")$T
> [1] "contr.treatment"
>
>
> Note that I've intentionally changed "con" to "Acon" in order to make it the
> control group ... so, the intercept term above represents the abundance
> parameter for the control/T00 samples. Then, you can select your comparison
> of interest through the 'coef' argument of the glmLRT() function. Note that
> you can select multiple columns.
>
> By judicious choice of the design matrix, you should be able to fit any
> contrast of interest. Currently you will probably need to redefine the design
> matrix for a different reference sample to obtain different contrasts. We
> will be adding a 'contrasts' argument to glmLRT() shortly, which will make it
> much easier to investigate different contrasts of interest.
>
> Hope that helps.
>
> Cheers,
> Mark
>
>
> On 2010-11-18, at 1:32 AM, Jakob Hedegaard wrote:
>
>> Hi,
>>
>> I wonder how to analyze a RNA-Seq dataset of a time-series experiment.
>>
>> The dataset origin from a challenge experiment with 4-6 samples per time
>> point (T00,T06,T12,T24 and T48) from two series of challenge (with one of
>> two different serotypes). In total 48 individual samples and two factors,
>> time and serotype (see the table below for details)
>> RNA-Seq profiles have been obtained using Illumina GAIIx with multiplexing.
>>
>> I have used edgeR (the Cox-Reid and GLM method) to obtain the genes
>> significantly affected to each time point relative to time zero (e.g.
>> T12-T00) - thus initially ignoring the potential difference between the
>> serotypes.
>>
>> But how can I obtain the genes significantly affected across the different
>> contrastes?
>>
>>
>> group lib.size norm.factors serotype time
>> AP_01 control 3734226 1.0575860 con T00
>> AP_02 control 4528260 1.0581673 con T00
>> AP_03 control 3648163 1.0594602 con T00
>> AP_28 control 4671178 1.0430147 con T00
>> AP_29 control 3746020 1.0528085 con T00
>> AP_30 control 4471915 1.1277386 con T00
>> AP_31 AP6.T06 7384334 0.9187757 AP6 T06
>> AP_32 AP6.T06 3649621 0.9035999 AP6 T06
>> AP_33 AP6.T06 5644324 0.8929802 AP6 T06
>> AP_34 AP6.T06 3791540 0.9396600 AP6 T06
>> AP_36 AP6.T06 3922243 0.8524249 AP6 T06
>> AP_37 AP6.T12 3113854 1.0491183 AP6 T12
>> AP_38 AP6.T12 2153867 1.0996506 AP6 T12
>> AP_39 AP6.T12 2979503 1.0905274 AP6 T12
>> AP_40 AP6.T12 5375493 1.0420513 AP6 T12
>> AP_41 AP6.T12 3769654 0.9094147 AP6 T12
>> AP_42 AP6.T12 2621303 1.1272673 AP6 T12
>> AP_43 AP6.T24 3537847 1.0037276 AP6 T24
>> AP_44 AP6.T24 3660552 1.0757808 AP6 T24
>> AP_45 AP6.T24 3284038 1.0701603 AP6 T24
>> AP_46 AP6.T24 3250374 1.0230341 AP6 T24
>> AP_47 AP6.T24 7208387 0.9535068 AP6 T24
>> AP_48 AP6.T24 4169075 0.9730747 AP6 T24
>> AP_49 AP6.T48 5989902 0.9825794 AP6 T48
>> AP_50 AP6.T48 3529028 0.9471979 AP6 T48
>> AP_51 AP6.T48 5104071 1.0772029 AP6 T48
>> AP_53 AP6.T48 4823387 0.9710606 AP6 T48
>> AP_54 AP6.T48 3788201 1.0976409 AP6 T48
>> AP_55 AP2.T06 4919578 0.7872065 AP2 T06
>> AP_56 AP2.T06 4580068 0.9533078 AP2 T06
>> AP_57 AP2.T06 3908180 0.9193207 AP2 T06
>> AP_58 AP2.T06 3466801 0.9887874 AP2 T06
>> AP_59 AP2.T06 4267998 0.8769085 AP2 T06
>> AP_60 AP2.T12 4533905 0.9058305 AP2 T12
>> AP_62 AP2.T12 5906089 0.9352388 AP2 T12
>> AP_63 AP2.T12 3676318 1.1260072 AP2 T12
>> AP_64 AP2.T12 2206081 1.0579246 AP2 T12
>> AP_65 AP2.T12 3955338 1.0221930 AP2 T12
>> AP_66 AP2.T12 3775918 0.9300664 AP2 T12
>> AP_67 AP2.T24 3853681 0.9659259 AP2 T24
>> AP_69 AP2.T24 3761829 0.9592286 AP2 T24
>> AP_70 AP2.T24 4263273 1.0742397 AP2 T24
>> AP_71 AP2.T24 4736798 0.9864702 AP2 T24
>> AP_72 AP2.T24 6387114 1.0462401 AP2 T24
>> AP_73 AP2.T48 3389351 1.0303610 AP2 T48
>> AP_75 AP2.T48 1489023 1.1863821 AP2 T48
>> AP_76 AP2.T48 3588175 1.0250639 AP2 T48
>> AP_78 AP2.T48 3848562 0.9855582 AP2 T48
>>
>>
>> sessionInfo()
>> R version 2.12.0 (2010-10-15)
>> Platform: x86_64-pc-mingw32/x64 (64-bit)
>>
>> locale:
>> [1] LC_COLLATE=Danish_Denmark.1252 LC_CTYPE=Danish_Denmark.1252
>> LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
>> LC_TIME=Danish_Denmark.1252
>>
>> attached base packages:
>> [1] grDevices datasets splines graphics stats tcltk utils
>> methods base
>>
>> other attached packages:
>> [1] edgeR_1.8.2 svSocket_0.9-50 TinnR_1.0.3 R2HTML_2.2
>> Hmisc_3.8-3 survival_2.35-8
>>
>> loaded via a namespace (and not attached):
>> [1] cluster_1.13.1 grid_2.12.0 lattice_0.19-13 limma_3.6.6
>> svMisc_0.9-60 tools_2.12.0
>>
>>
>> Best regards
>>
>> Jakob Hedegaard
>> Project scientist
>>
>> AARHUS UNIVERSITY
>> Faculty of Agricultural Sciences
>> Dept. of Genetics and Biotechnology
>> Blichers Allé 20, P.O. BOX 50
>> DK-8830 Tjele
>> Denmark
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> [email protected]
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
> ------------------------------
> Mark Robinson, PhD (Melb)
> Epigenetics Laboratory, Garvan
> Bioinformatics Division, WEHI
> e: [email protected]
> e: [email protected]
> p: +61 (0)3 9345 2628
> f: +61 (0)3 9347 0852
> ------------------------------
>
>
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:24}}
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing