Re: [Bioc-sig-seq] RNA-Seq - analysis of time series data?

Mark Robinson Tue, 23 Nov 2010 02:44:52 -0800

Hi Jakob.

The error message is due to the fact that your design matrix is not full rank 
(i.e. sum of columns 4-7 = sum of columns 2-3).  This is of course my fault, 
since I suggested it.


That setup (previous post) would've worked for an experiment with observations 
for every combination of factors, which you do not have (controls are only 
measured at T=0, AP2/AP6 are not measured at T=0).  Non-full-rank design 
matrices can be difficult to interpret, so an alternative option is this:

S <- factor(rep( c("Acon","AP6","AP2"), 
               c(6,5+6+6+5,5+6+5+4)))
T <- factor(rep( c("T00","T06","T12","T24","T48","T06","T12","T24","T48"), 
               c(6,5,6,6,5,5,6,5,4)))

f <- factor( paste(S,T,sep=".") )
design <- model.matrix( ~ f )
[...]

As mentioned below, you may need to adjust the reference (here again, control 
at T=0) for testing a particular contrast of interest.

Let us know how you go.

Cheers,
Mark

On 2010-11-23, at 12:25 AM, Jakob Hedegaard wrote:

> Hi Mark,
> 
> Thanks.
> 
> I have implemented your suggestion for the design matrix when reading the 
> data (see below), but encounter the following error when running 
> estimateCRDisp:
> 
>> d <- estimateCRDisp(d,design)
> Error in chol.default(crossprod(design, .vecmat(mu/(1 + dispersion * mu),  : 
>  the leading minor of order 7 is not positive definite
> 
> Applying the simple design:
> design.time <- model.matrix(~ T, data=d$sample)
> does not result in the error....
> 
> Appreciate your help!
> 
> Best regards,
> Jakob
> 
> 
> 
> 
> library(edgeR)
> r <- read.delim(file.path(datadir, 
> "RNAseq_AP-2-6-INF_liver_transcript_pivot.txt"))
> dim(r)
> #[1] 78728   205
> rdata <- r[2:205]
> #samp <- substr(colnames(r)[2:205],6,10)
> #usamp <- unique(samp)
> samp <- substr(colnames(r)[2:205],16,28)
> usamp <- unique(samp)        
> rd <- matrix(NA,nrow=nrow(r),ncol=length(usamp))
> for (i in 1:length(usamp)){
> rd[,i] <- rowSums(rdata[,grepl(usamp[i],samp)])
> }
> colnames(rd) <- usamp
> rownames(rd) <- r[,1]
> # export data
> #write.table(rd,file="RNAseq_AP-2-6-INF_liver_vs_AugSsc9_pivot.txt",sep="\t",quote=F,dec=".",row.names=T,col.names=NA)
> 
> group <- substr(unique(substr(colnames(r)[2:205],16,28)),7,13)
> #[1] "control" "control" "control" "control" "control" "control" "AP6.T06" 
> "AP6.T06" "AP6.T06" "AP6.T06" "AP6.T06" "AP6.T12"
> #[13] "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T12" "AP6.T24" "AP6.T24" 
> "AP6.T24" "AP6.T24" "AP6.T24" "AP6.T24" "AP6.T48"
> #[25] "AP6.T48" "AP6.T48" "AP6.T48" "AP6.T48" "AP2.T06" "AP2.T06" "AP2.T06" 
> "AP2.T06" "AP2.T06" "AP2.T12" "AP2.T12" "AP2.T12"
> #[37] "AP2.T12" "AP2.T12" "AP2.T12" "AP2.T24" "AP2.T24" "AP2.T24" "AP2.T24" 
> "AP2.T24" "AP2.T48" "AP2.T48" "AP2.T48" "AP2.T48"
> d <- DGEList(counts=rd,group=group)
> d <- calcNormFactors(d)
> d$samples$S <- factor(gsub("con","Acon",substr(d$samples$group,1,3)))
> d$samples$T <- factor(gsub("rol","T00",(substr(d$samples$group,5,7))))
> design <- model.matrix(~ S + T, data=d$sample)
> # Analysis using Cox-Reid common dispersion
> d <- estimateCRDisp(d,design)
> 
>> design
>              (Intercept) SAP2 SAP6 TT06 TT12 TT24 TT48
> AP_01_control           1    0    0    0    0    0    0
> AP_02_control           1    0    0    0    0    0    0
> AP_03_control           1    0    0    0    0    0    0
> AP_28_control           1    0    0    0    0    0    0
> AP_29_control           1    0    0    0    0    0    0
> AP_30_control           1    0    0    0    0    0    0
> AP_31_AP6.T06           1    0    1    1    0    0    0
> AP_32_AP6.T06           1    0    1    1    0    0    0
> AP_33_AP6.T06           1    0    1    1    0    0    0
> AP_34_AP6.T06           1    0    1    1    0    0    0
> AP_36_AP6.T06           1    0    1    1    0    0    0
> AP_37_AP6.T12           1    0    1    0    1    0    0
> AP_38_AP6.T12           1    0    1    0    1    0    0
> AP_39_AP6.T12           1    0    1    0    1    0    0
> AP_40_AP6.T12           1    0    1    0    1    0    0
> AP_41_AP6.T12           1    0    1    0    1    0    0
> AP_42_AP6.T12           1    0    1    0    1    0    0
> AP_43_AP6.T24           1    0    1    0    0    1    0
> AP_44_AP6.T24           1    0    1    0    0    1    0
> AP_45_AP6.T24           1    0    1    0    0    1    0
> AP_46_AP6.T24           1    0    1    0    0    1    0
> AP_47_AP6.T24           1    0    1    0    0    1    0
> AP_48_AP6.T24           1    0    1    0    0    1    0
> AP_49_AP6.T48           1    0    1    0    0    0    1
> AP_50_AP6.T48           1    0    1    0    0    0    1
> AP_51_AP6.T48           1    0    1    0    0    0    1
> AP_53_AP6.T48           1    0    1    0    0    0    1
> AP_54_AP6.T48           1    0    1    0    0    0    1
> AP_55_AP2.T06           1    1    0    1    0    0    0
> AP_56_AP2.T06           1    1    0    1    0    0    0
> AP_57_AP2.T06           1    1    0    1    0    0    0
> AP_58_AP2.T06           1    1    0    1    0    0    0
> AP_59_AP2.T06           1    1    0    1    0    0    0
> AP_60_AP2.T12           1    1    0    0    1    0    0
> AP_62_AP2.T12           1    1    0    0    1    0    0
> AP_63_AP2.T12           1    1    0    0    1    0    0
> AP_64_AP2.T12           1    1    0    0    1    0    0
> AP_65_AP2.T12           1    1    0    0    1    0    0
> AP_66_AP2.T12           1    1    0    0    1    0    0
> AP_67_AP2.T24           1    1    0    0    0    1    0
> AP_69_AP2.T24           1    1    0    0    0    1    0
> AP_70_AP2.T24           1    1    0    0    0    1    0
> AP_71_AP2.T24           1    1    0    0    0    1    0
> AP_72_AP2.T24           1    1    0    0    0    1    0
> AP_73_AP2.T48           1    1    0    0    0    0    1
> AP_75_AP2.T48           1    1    0    0    0    0    1
> AP_76_AP2.T48           1    1    0    0    0    0    1
> AP_78_AP2.T48           1    1    0    0    0    0    1
> attr(,"assign")
> [1] 0 1 1 2 2 2 2
> attr(,"contrasts")
> attr(,"contrasts")$S
> [1] "contr.treatment"
> 
> attr(,"contrasts")$T
> [1] "contr.treatment"
> 
> 
> -----Oprindelig meddelelse-----
> Fra: Mark Robinson [mailto:[email protected]] 
> Sendt: 18. november 2010 05:51
> Til: Jakob Hedegaard
> Cc: [email protected]
> Emne: Re: [Bioc-sig-seq] RNA-Seq - analysis of time series data?
> 
> Hi Jakob.
> 
> Basically, you can analyze your time course data with edgeR through glmFit() 
> as you have already done, but by choosing an appropriate design matrix that 
> will allow you to test your contrasts of interest.
> 
> For example, you might define a couple factors (please double check that this 
> actually matches your $samples element of the DGEList object):
> 
> S <- factor(rep( c("Acon","AP6","AP2"), 
>                c(6,5+6+6+5,5+6+5+4)))
> T <- factor(rep( c("T00","T06","T12","T24","T48","T06","T12","T24","T48"), 
>                c(6,5,6,6,5,5,6,5,4)))
> 
> ... and then a design matrix:
> 
> design <- model.matrix( ~ S + T )
> 
> So, this design matrix will look like this:
> 
> 
>> design <- model.matrix( ~ S + T )
>> design
>  (Intercept) SAP2 SAP6 TT06 TT12 TT24 TT48
> 1            1    0    0    0    0    0    0
> 2            1    0    0    0    0    0    0
> 3            1    0    0    0    0    0    0
> 4            1    0    0    0    0    0    0
> 5            1    0    0    0    0    0    0
> 6            1    0    0    0    0    0    0
> 7            1    0    1    1    0    0    0
> 8            1    0    1    1    0    0    0
> 9            1    0    1    1    0    0    0
> 10           1    0    1    1    0    0    0
> 11           1    0    1    1    0    0    0
> 12           1    0    1    0    1    0    0
> 13           1    0    1    0    1    0    0
> 14           1    0    1    0    1    0    0
> 15           1    0    1    0    1    0    0
> 16           1    0    1    0    1    0    0
> 17           1    0    1    0    1    0    0
> 18           1    0    1    0    0    1    0
> 19           1    0    1    0    0    1    0
> 20           1    0    1    0    0    1    0
> 21           1    0    1    0    0    1    0
> 22           1    0    1    0    0    1    0
> 23           1    0    1    0    0    1    0
> 24           1    0    1    0    0    0    1
> 25           1    0    1    0    0    0    1
> 26           1    0    1    0    0    0    1
> 27           1    0    1    0    0    0    1
> 28           1    0    1    0    0    0    1
> 29           1    1    0    1    0    0    0
> 30           1    1    0    1    0    0    0
> 31           1    1    0    1    0    0    0
> 32           1    1    0    1    0    0    0
> 33           1    1    0    1    0    0    0
> 34           1    1    0    0    1    0    0
> 35           1    1    0    0    1    0    0
> 36           1    1    0    0    1    0    0
> 37           1    1    0    0    1    0    0
> 38           1    1    0    0    1    0    0
> 39           1    1    0    0    1    0    0
> 40           1    1    0    0    0    1    0
> 41           1    1    0    0    0    1    0
> 42           1    1    0    0    0    1    0
> 43           1    1    0    0    0    1    0
> 44           1    1    0    0    0    1    0
> 45           1    1    0    0    0    0    1
> 46           1    1    0    0    0    0    1
> 47           1    1    0    0    0    0    1
> 48           1    1    0    0    0    0    1
> attr(,"assign")
> [1] 0 1 1 2 2 2 2
> attr(,"contrasts")
> attr(,"contrasts")$S
> [1] "contr.treatment"
> 
> attr(,"contrasts")$T
> [1] "contr.treatment"
> 
> 
> Note that I've intentionally changed "con" to "Acon" in order to make it the 
> control group ... so, the intercept term above represents the abundance 
> parameter for the control/T00 samples.  Then, you can select your comparison 
> of interest through the 'coef' argument of the glmLRT() function.  Note that 
> you can select multiple columns.
> 
> By judicious choice of the design matrix, you should be able to fit any 
> contrast of interest. Currently you will probably need to redefine the design 
> matrix for a different reference sample to obtain different contrasts.  We 
> will be adding a 'contrasts' argument to glmLRT() shortly, which will make it 
> much easier to investigate different contrasts of interest.
> 
> Hope that helps.
> 
> Cheers,
> Mark
> 
> 
> On 2010-11-18, at 1:32 AM, Jakob Hedegaard wrote:
> 
>> Hi,
>> 
>> I wonder how to analyze a RNA-Seq dataset of a time-series experiment.
>> 
>> The dataset origin from a challenge experiment with 4-6 samples per time 
>> point (T00,T06,T12,T24 and T48) from two series of challenge (with one of 
>> two different serotypes). In total 48 individual samples and two factors, 
>> time and serotype (see the table below for details)
>> RNA-Seq profiles have been obtained using Illumina GAIIx with multiplexing.
>> 
>> I have used edgeR (the Cox-Reid and GLM method) to obtain the genes 
>> significantly affected to each time point relative to time zero (e.g. 
>> T12-T00) - thus initially ignoring the potential difference between the 
>> serotypes.
>> 
>> But how can I obtain the genes significantly affected across the different 
>> contrastes?
>> 
>> 
>>       group lib.size norm.factors serotype time
>> AP_01 control  3734226    1.0575860      con  T00
>> AP_02 control  4528260    1.0581673      con  T00
>> AP_03 control  3648163    1.0594602      con  T00
>> AP_28 control  4671178    1.0430147      con  T00
>> AP_29 control  3746020    1.0528085      con  T00
>> AP_30 control  4471915    1.1277386      con  T00
>> AP_31 AP6.T06  7384334    0.9187757      AP6  T06
>> AP_32 AP6.T06  3649621    0.9035999      AP6  T06
>> AP_33 AP6.T06  5644324    0.8929802      AP6  T06
>> AP_34 AP6.T06  3791540    0.9396600      AP6  T06
>> AP_36 AP6.T06  3922243    0.8524249      AP6  T06
>> AP_37 AP6.T12  3113854    1.0491183      AP6  T12
>> AP_38 AP6.T12  2153867    1.0996506      AP6  T12
>> AP_39 AP6.T12  2979503    1.0905274      AP6  T12
>> AP_40 AP6.T12  5375493    1.0420513      AP6  T12
>> AP_41 AP6.T12  3769654    0.9094147      AP6  T12
>> AP_42 AP6.T12  2621303    1.1272673      AP6  T12
>> AP_43 AP6.T24  3537847    1.0037276      AP6  T24
>> AP_44 AP6.T24  3660552    1.0757808      AP6  T24
>> AP_45 AP6.T24  3284038    1.0701603      AP6  T24
>> AP_46 AP6.T24  3250374    1.0230341      AP6  T24
>> AP_47 AP6.T24  7208387    0.9535068      AP6  T24
>> AP_48 AP6.T24  4169075    0.9730747      AP6  T24
>> AP_49 AP6.T48  5989902    0.9825794      AP6  T48
>> AP_50 AP6.T48  3529028    0.9471979      AP6  T48
>> AP_51 AP6.T48  5104071    1.0772029      AP6  T48
>> AP_53 AP6.T48  4823387    0.9710606      AP6  T48
>> AP_54 AP6.T48  3788201    1.0976409      AP6  T48
>> AP_55 AP2.T06  4919578    0.7872065      AP2  T06
>> AP_56 AP2.T06  4580068    0.9533078      AP2  T06
>> AP_57 AP2.T06  3908180    0.9193207      AP2  T06
>> AP_58 AP2.T06  3466801    0.9887874      AP2  T06
>> AP_59 AP2.T06  4267998    0.8769085      AP2  T06
>> AP_60 AP2.T12  4533905    0.9058305      AP2  T12
>> AP_62 AP2.T12  5906089    0.9352388      AP2  T12
>> AP_63 AP2.T12  3676318    1.1260072      AP2  T12
>> AP_64 AP2.T12  2206081    1.0579246      AP2  T12
>> AP_65 AP2.T12  3955338    1.0221930      AP2  T12
>> AP_66 AP2.T12  3775918    0.9300664      AP2  T12
>> AP_67 AP2.T24  3853681    0.9659259      AP2  T24
>> AP_69 AP2.T24  3761829    0.9592286      AP2  T24
>> AP_70 AP2.T24  4263273    1.0742397      AP2  T24
>> AP_71 AP2.T24  4736798    0.9864702      AP2  T24
>> AP_72 AP2.T24  6387114    1.0462401      AP2  T24
>> AP_73 AP2.T48  3389351    1.0303610      AP2  T48
>> AP_75 AP2.T48  1489023    1.1863821      AP2  T48
>> AP_76 AP2.T48  3588175    1.0250639      AP2  T48
>> AP_78 AP2.T48  3848562    0.9855582      AP2  T48
>> 
>> 
>> sessionInfo()
>> R version 2.12.0 (2010-10-15)
>> Platform: x86_64-pc-mingw32/x64 (64-bit)
>> 
>> locale:
>> [1] LC_COLLATE=Danish_Denmark.1252  LC_CTYPE=Danish_Denmark.1252    
>> LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C                    
>> LC_TIME=Danish_Denmark.1252    
>> 
>> attached base packages:
>> [1] grDevices datasets  splines   graphics  stats     tcltk     utils     
>> methods   base     
>> 
>> other attached packages:
>> [1] edgeR_1.8.2     svSocket_0.9-50 TinnR_1.0.3     R2HTML_2.2      
>> Hmisc_3.8-3     survival_2.35-8
>> 
>> loaded via a namespace (and not attached):
>> [1] cluster_1.13.1  grid_2.12.0     lattice_0.19-13 limma_3.6.6     
>> svMisc_0.9-60   tools_2.12.0  
>> 
>> 
>> Best regards
>> 
>> Jakob Hedegaard
>> Project scientist
>> 
>> AARHUS UNIVERSITY
>> Faculty of Agricultural Sciences
>> Dept. of Genetics and Biotechnology
>> Blichers Allé 20, P.O. BOX 50
>> DK-8830 Tjele
>> Denmark
>> 
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> [email protected]
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
> 
> ------------------------------
> Mark Robinson, PhD (Melb)
> Epigenetics Laboratory, Garvan
> Bioinformatics Division, WEHI
> e: [email protected]
> e: [email protected]
> p: +61 (0)3 9345 2628
> f: +61 (0)3 9347 0852
> ------------------------------
> 
> 
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:24}}

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] RNA-Seq - analysis of time series data?

Reply via email to