This is an automated email from the git hooks/post-receive script. tille pushed a commit to branch master in repository r-bioc-ebseq.
commit e29ba01e3a3fc221dfa25bd60dce059b4fff66b0 Author: Andreas Tille <[email protected]> Date: Sun Nov 8 21:11:37 2015 +0100 Imported Upstream version 1.10.0 --- DESCRIPTION | 12 +-- NAMESPACE | 4 +- NEWS | 52 ++++++++++++- R/EBMultiTest.R | 16 ++-- R/EBTest.R | 11 ++- R/GetDEResults.R | 76 +++++++++++++++++++ R/MedianNorm.R | 18 ++++- R/PlotPostVsRawFC.R | 2 +- README.md | 126 ++++++++++++++++++++++++++++++++ build/vignette.rds | Bin 199 -> 203 bytes demo/EBSeq.R | 28 ++----- inst/doc/EBSeq_Vignette.R | 142 +++++++++++++++++------------------- inst/doc/EBSeq_Vignette.Rnw | 169 +++++++++++++++++++++++++++++++++---------- inst/doc/EBSeq_Vignette.pdf | Bin 958612 -> 946690 bytes man/EBMultiTest.Rd | 6 +- man/EBTest.Rd | 7 +- man/GetDEResults.Rd | 104 ++++++++++++++++++++++++++ man/MedianNorm.Rd | 3 +- vignettes/EBSeq_Vignette.Rnw | 169 +++++++++++++++++++++++++++++++++---------- 19 files changed, 740 insertions(+), 205 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index a05a01d..a80992f 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -2,11 +2,11 @@ Package: EBSeq Type: Package Title: An R package for gene and isoform differential expression analysis of RNA-seq data -Version: 1.6.0 -Date: 2014-9-17 +Version: 1.10.0 +Date: 2015-7-28 Author: Ning Leng, Christina Kendziorski -Maintainer: Ning Leng <[email protected]> -Depends: blockmodeling, gplots, R (>= 3.0.0) +Maintainer: Ning Leng <[email protected]> +Depends: blockmodeling, gplots, testthat, R (>= 3.0.0) Description: Differential Expression analysis at both gene and isoform level using RNA-seq data License: Artistic-2.0 @@ -17,7 +17,9 @@ Collate: 'MedianNorm.R' 'GetNg.R' 'beta.mom.R' 'f0.R' 'f1.R' 'GetPPMat.R' 'GetMultiPP.R' 'GetMultiFC.R' 'PlotPostVsRawFC.R' 'crit_fun.R' 'DenNHist.R' 'GetNormalizedMat.R' 'PlotPattern.R' 'PolyFitPlot.R' 'QQP.R' 'QuantileNorm.R' 'RankNorm.R' + 'GetDEResults.R' BuildVignettes: yes biocViews: StatisticalMethod, DifferentialExpression, MultipleComparison, RNASeq, Sequencing -Packaged: 2014-10-14 03:22:48 UTC; biocbuild +NeedsCompilation: no +Packaged: 2015-10-14 02:58:51 UTC; biocbuild diff --git a/NAMESPACE b/NAMESPACE index 9b63076..97446ce 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -1,5 +1,5 @@ export(crit_fun, DenNHist, EBTest, GetNg, GetPP, MedianNorm, PolyFitPlot, PostFC, QQP, QuantileNorm, RankNorm, EBMultiTest, GetMultiPP, GetPatterns, PlotPattern, GetPPMat, GetMultiFC, PlotPostVsRawFC, -GetNormalizedMat,f0,f1,LogN,LogNMulti) - +GetNormalizedMat,f0,f1,LogN,LogNMulti, GetDEResults) +import(blockmodeling, gplots, testthat) diff --git a/NEWS b/NEWS index 0424f32..20cf398 100644 --- a/NEWS +++ b/NEWS @@ -1,3 +1,49 @@ + +CHANGES IN VERSION 1.9.3 +------------------------ + + + o Correct typos in GetDEResults help file. + o Include an additional method for normalization. + +CHANGES IN VERSION 1.9.2 +------------------------ + + + o Fixed a bug which may cause error when input a matrix to the sizeFactors parameter + +CHANGES IN VERSION 1.9.1 +------------------------ + + + o Added Q&A seqction in vignette to address common questions + +CHANGES IN VERSION 1.7.1 +------------------------ + + + o In EBSeq 1.7.1, EBSeq incorporates a new function + GetDEResults() which may be used to obtain a list of transcripts under a target FDR + in a two-condition experiment. + The results obtained by applying this function with its default setting will be + more robust to transcripts with low variance and potential outliers. + By using the default settings in this function, + the number of genes identified in any given analysis may + differ slightly from the previous version (1.7.0 or order). + To obtain results that are comparable + to results from earlier versions of EBSeq (1.7.0 or older), a user may set + Method="classic" in GetDEResults() function, or use the original GetPPMat() function. + The GeneDEResults() function also allows a user to modify thresholds to + target genes/isoforms with a pre-specified posterior fold change. + + o Also, in EBSeq 1.7.1, the default settings in EBTest() and EBMultiTest() function + will only remove transcripts with all 0's (instead of removing transcripts with + 75th quantile less than 10 in version 1.3.3-1.7.0). + To obtain a list of transcripts comparable to the results generated by + EBSeq version 1.3.3-1.7.0, a user may change Qtrm = 0.75 and QtrmCut = 10 + when applying EBTest() or EBMultiTest() function. + + CHANGES IN VERSION 1.5.4 ------------------------ @@ -34,7 +80,7 @@ NEW FEATURES low expressed genes (genes whose 75th quantile of normalized counts is less than 10) before identifying DE genes. These two thresholds can be changed in EBTest function. - We found that low expressed genes are more easily to be affected by noises. - Removing these genes prior to downstream analyses can improve the - model fitting and reduce impacts of noisy genes (e.g. genes with outliers). + Because low expressed genes are disproportionately noisy, + removing these genes prior to downstream analyses can improve model fitting and increase robustness + (e.g. by removing outliers). diff --git a/R/EBMultiTest.R b/R/EBMultiTest.R index ce2dd7e..54d6352 100644 --- a/R/EBMultiTest.R +++ b/R/EBMultiTest.R @@ -1,25 +1,23 @@ EBMultiTest <- -function(Data,NgVector=NULL,Conditions,AllParti=NULL, sizeFactors, maxround, Pool=F, NumBin=1000, ApproxVal=10^-10,PoolLower=.25, PoolUpper=.75,Print=T,Qtrm=.75,QtrmCut=10) +function(Data,NgVector=NULL,Conditions,AllParti=NULL, sizeFactors, maxround, Pool=F, NumBin=1000, ApproxVal=10^-10,PoolLower=.25, PoolUpper=.75,Print=T,Qtrm=1,QtrmCut=0) { + expect_is(sizeFactors, c("numeric","integer")) + expect_is(maxround, c("numeric","integer")) if(!is.factor(Conditions))Conditions=as.factor(Conditions) if(is.null(rownames(Data)))stop("Please add gene/isoform names to the data matrix") - if(!is.matrix(Data))stop("The input Data is not a matrix") + if(!is.matrix(Data))stop("The input Data is not a matrix") if(length(Conditions)!=ncol(Data))stop("The number of conditions is not the same as the number of samples! ") if(nlevels(Conditions)==2)stop("Only 2 conditions - Please use EBTest() function") if(nlevels(Conditions)<2)stop("Less than 2 conditions - Please check your input") if(length(sizeFactors)!=length(Data) & length(sizeFactors)!=ncol(Data)) stop("The number of library size factors is not the same as the number of samples!") - tau=CI=CIthre=NULL Dataraw=Data - - + #Normalized DataNorm=GetNormalizedMat(Data, sizeFactors) - - QuantileFor0=apply(DataNorm,1,function(i)quantile(i,Qtrm)) AllZeroNames=which(QuantileFor0<=QtrmCut) NotAllZeroNames=which(QuantileFor0>QtrmCut) @@ -32,7 +30,7 @@ function(Data,NgVector=NULL,Conditions,AllParti=NULL, sizeFactors, maxround, Po if(!is.null(NgVector))NgVector=NgVector[NotAllZeroNames] if(is.null(NgVector))NgVector=rep(1,nrow(Data)) - + if(length(sizeFactors)!=ncol(Data))sizeFactors=sizeFactors[NotAllZeroNames,] #ReNameThem IsoNamesIn=rownames(Data) @@ -360,7 +358,7 @@ if (length(AllNA)>0){ if(length(sizeFactors)==ncol(Data)) R.NotIn=matrix(outer(R.NotIn.raw,sizeFactors),nrow=length(AllNA.Ngorder)) if(length(sizeFactors)==length(Data)) - R.NotIn=matrix(R.NotIn.raw*sizeFactors[NotIn,],nrow=length(AllNA.Ngorder)) + R.NotIn=matrix(R.NotIn.raw*sizeFactors[names(R.NotIn.raw),],nrow=length(AllNA.Ngorder)) DataListNotIn.unlistWithZ=matrix(DataList.unlist[AllNA.Ngorder,], nrow=length(AllNA.Ngorder)) diff --git a/R/EBTest.R b/R/EBTest.R index 4453909..ddd9e74 100644 --- a/R/EBTest.R +++ b/R/EBTest.R @@ -1,6 +1,8 @@ EBTest <- -function(Data,NgVector=NULL,Conditions, sizeFactors, maxround, Pool=F, NumBin=1000,ApproxVal=10^-10, Alpha=NULL, Beta=NULL,PInput=NULL,RInput=NULL,PoolLower=.25, PoolUpper=.75,Print=T, Qtrm=.75,QtrmCut=10) +function(Data,NgVector=NULL,Conditions, sizeFactors, maxround, Pool=F, NumBin=1000,ApproxVal=10^-10, Alpha=NULL, Beta=NULL,PInput=NULL,RInput=NULL,PoolLower=.25, PoolUpper=.75,Print=T, Qtrm=1,QtrmCut=0) { + expect_is(sizeFactors, c("numeric","integer")) + expect_is(maxround, c("numeric","integer")) if(!is.factor(Conditions))Conditions=as.factor(Conditions) if(is.null(rownames(Data)))stop("Please add gene/isoform names to the data matrix") @@ -17,6 +19,7 @@ function(Data,NgVector=NULL,Conditions, sizeFactors, maxround, Pool=F, NumBin=10 #Normalized DataNorm=GetNormalizedMat(Data, sizeFactors) + expect_is(DataNorm, "matrix") Levels=levels(as.factor(Conditions)) # Dixon Statistics @@ -41,7 +44,7 @@ function(Data,NgVector=NULL,Conditions, sizeFactors, maxround, Pool=F, NumBin=10 if(length(NotAllZeroNames)==0)stop("0 transcript passed") Data=Data[NotAllZeroNames,] if(!is.null(NgVector))NgVector=NgVector[NotAllZeroNames] - if(length(sizeFactors)==length(Data))sizeFactors=sizeFactors[NotAllZeroNames,] + if(length(sizeFactors)!=ncol(Data))sizeFactors=sizeFactors[NotAllZeroNames,] if(is.null(NgVector))NgVector=rep(1,nrow(Data)) #Rename Them @@ -528,7 +531,7 @@ if (length(AllNA)>0){ if(length(sizeFactors)==ncol(Data)) R.NotIn=outer(R.NotIn.raw,sizeFactors) if(length(sizeFactors)==length(Data)) - R.NotIn=R.NotIn.raw*sizeFactors[NotIn,] + R.NotIn=R.NotIn.raw*sizeFactors[names(R.NotIn.raw),] R.NotIn1=matrix(R.NotIn[,Conditions==levels(Conditions)[1]],nrow=nrow(R.NotIn)) R.NotIn2=matrix(R.NotIn[,Conditions==levels(Conditions)[2]],nrow=nrow(R.NotIn)) @@ -587,6 +590,6 @@ Result=list(Alpha=UpdateAlpha,Beta=UpdateBeta,P=UpdateP, C2EstVar=RealName.C2VarList, PoolVar=RealName.PoolVarList , DataList=RealName.DataList,PPDE=AllZ,f0=AllF0, f1=AllF1, AllZeroIndex=AllZeroNames,PPMat=PPMatNZ, PPMatWith0=PPMat, - ConditionOrder=CondOut, Conditions=Conditions) + ConditionOrder=CondOut, Conditions=Conditions, DataNorm=DataNorm) } diff --git a/R/GetDEResults.R b/R/GetDEResults.R new file mode 100644 index 0000000..f750ee7 --- /dev/null +++ b/R/GetDEResults.R @@ -0,0 +1,76 @@ +GetDEResults<-function(EBPrelim, FDR=0.05, Method="robust", + FDRMethod="hard", Threshold_FC=0.7, + Threshold_FCRatio=0.3, SmallNum=0.01) +{ + if(!"PPDE"%in%names(EBPrelim))stop("The input doesn't seem like an output from EBTest") + + ################# + Conditions = EBPrelim$Conditions + Levels = levels(as.factor(Conditions)) + PPcut=FDR + # normalized data + GeneMat=EBPrelim$DataNorm + + + ###Get DEfound by FDRMethod type + PP=GetPPMat(EBPrelim) + if(FDRMethod=="hard") + {DEfound=rownames(PP)[which(PP[,"PPDE"]>=(1-PPcut))]} + else{SoftThre=crit_fun(PP[,"PPEE"],PPcut) + DEfound=rownames(PP)[which(PP[,"PPDE"]>=SoftThre)]} + + # classic + if(Method=="classic"){ + Gene_status=rep("EE",dim(GeneMat)[1]) + names(Gene_status)=rownames(GeneMat) + Gene_status[DEfound]="DE" + NoTest_genes=rownames(GeneMat)[!(rownames(GeneMat)%in%rownames(PP))] + Gene_status[NoTest_genes]="Filtered: Low Expression" + + PPMatWith0=EBPrelim$PPMatWith0 + PPMatWith0[NoTest_genes,]=c(NA,NA) + + return(list(DEfound=DEfound,PPMat=PPMatWith0,Status=Gene_status)) + } + else{ + ###Post_Foldchange + PostFoldChange=PostFC(EBPrelim) + PPFC=PostFoldChange$PostFC + + OldPPFC=PPFC[DEfound] + OldPPFC[which(OldPPFC>1)]=1/OldPPFC[which(OldPPFC>1)] + + FilterFC=names(OldPPFC)[which(OldPPFC>Threshold_FC)] + + ###New Fold Change + NewFC1=apply(matrix(GeneMat[DEfound,which(Conditions==Levels[[1]])]+SmallNum, + nrow=length(DEfound)),1,median) + NewFC2=apply(matrix(GeneMat[DEfound,which(Conditions==Levels[[2]])]+SmallNum, + nrow=length(DEfound)),1,median) + NewFC=NewFC1/NewFC2 + NewFC[which(NewFC>1)]=1/NewFC[which(NewFC>1)] + + ###FC Ratio + FCRatio=NewFC/OldPPFC + FCRatio[which(OldPPFC<NewFC)]=1/FCRatio[which(OldPPFC<NewFC)] + + FilterFCR=names(FCRatio)[which(FCRatio<Threshold_FCRatio)] + + ###Results format Setting + Gene_status=rep("EE",dim(GeneMat)[1]) + names(Gene_status)=rownames(GeneMat) + Gene_status[DEfound]="DE" + NoTest_genes=rownames(GeneMat)[!(rownames(GeneMat)%in%rownames(PP))] + Gene_status[NoTest_genes]="Filtered: Low Expression" + ###Filtering + Filtered_DEfound=setdiff(DEfound, union(FilterFC, FilterFCR)) + + PPMatWith0=EBPrelim$PPMatWith0 + NAGenes=union(NoTest_genes,union(FilterFC, FilterFCR)) + PPMatWith0[NAGenes,]=c(NA,NA) + + Gene_status[FilterFC]="Filtered: Fold Change" + Gene_status[FilterFCR]="Filtered: Fold Change Ratio" + + return(list(DEfound=Filtered_DEfound,PPMat=PPMatWith0,Status=Gene_status)) +}} diff --git a/R/MedianNorm.R b/R/MedianNorm.R index afd7b37..caa9a36 100644 --- a/R/MedianNorm.R +++ b/R/MedianNorm.R @@ -1,5 +1,17 @@ -MedianNorm=function(Data){ +MedianNorm <- function(Data, alternative=FALSE){ if(ncol(Data)==1)stop("Only 1 sample!") - geomeans <- exp(rowMeans(log(Data))) - apply(Data, 2, function(cnts) median((cnts/geomeans)[geomeans > 0])) + if(!alternative){ + geomeans <- exp(rowMeans(log(Data))) + out <- apply(Data, 2, function(cnts) median((cnts/geomeans)[geomeans > 0]))} + if(alternative){ + DataMatO <- Data + N <- ncol(DataMatO) + DataList0 <- sapply(1:N,function(i)DataMatO[,i]/DataMatO,simplify=F) + DataEachMed0 <- sapply(1:N,function(i)apply(DataList0[[i]],2,function(j)median(j[which(j>0 + & j<Inf)]))) + DataColgeo <- sapply(1:N,function(i)exp(mean(log(DataEachMed0[-i,i])))) + out <- DataColgeo + } + + out } diff --git a/R/PlotPostVsRawFC.R b/R/PlotPostVsRawFC.R index 64ba2b3..519bf6d 100644 --- a/R/PlotPostVsRawFC.R +++ b/R/PlotPostVsRawFC.R @@ -1,5 +1,5 @@ PlotPostVsRawFC<-function(EBOut,FCOut){ -library(gplots) +#library(gplots) par(fig=c(0,.8,0,1), new=F) RainbowColor=rev(redgreen(length(FCOut$PostFC))) diff --git a/README.md b/README.md new file mode 100644 index 0000000..f876425 --- /dev/null +++ b/README.md @@ -0,0 +1,126 @@ +# EBSeq Q & A + + +## ReadIn data + +csv file: + +``` +In=read.csv("FileName", stringsAsFactors=F, row.names=1, header=T) + +Data=data.matrix(In) +``` + +txt file: +``` +In=read.table("FileName", stringsAsFactors=F, row.names=1, header=T) + +Data=data.matrix(In) +``` +check str(Data) and make sure it is a matrix instead of data frame. You may need to play around with the row.names and header option depends on how the input file was generated. + + + +## GetDEResults() function not found + +You may on an earlier version of EBSeq. The GetDEResults function +was introduced since version 1.7.1. +The latest release version could be found at: +http://www.bioconductor.org/packages/devel/bioc/html/EBSeq.html +And you may check your package version by typing packageVersion("EBSeq") + + +## Visualizing DE genes/isoforms + +To generate a heatmap, you may consider the heatmap.2 function in gplots package. +For example, you may run +``` +heatmap.2(NormalizedMatrix[GenesOfInterest,], scale="row", trace="none", Colv=F) +``` +The normalized matrix may be obtained from GetNormalizedMat() function. + + +## My favorite gene/isoform has NA in PP (status "NoTest") + +The NoTest status comes from two sources + +1) Using the default parameter settings of EBMultiTest(), the function +will not test on genes with more than 75% values < 10 to ensure better +model fitting. To disable this filter, you may set Qtrm=1 and +QtrmCut=0. + +2) numerical over/underflow in R. That happens when the within +condition variance is extremely large or small. I did implemented a numerical +approximation step to calculate the approximated PP for these genes +with over/underflow. Here I use 10^-10 to approximate the parameter p +in the NB distribution for these genes (I set it to a small value +since I want to cover more over/underflow genes with low +within-condition variation). You may try to tune this value (to a larger value) in the +approximation by setting ApproxVal in EBTest() or EBMultiTest() function. + +## Can I run more than 5 iterations when running EBSeq via RSEM wrapper? + +Yes you may modify the script rsem-for-ebseq-find-DE under RSEM/EBSeq +change line 36 +``` +EBOut <- EBTest(Data = DataMat, NgVector = ngvector, Conditions = +conditions, sizeFactors = Sizes, maxround = 5) +``` +to +``` +EBOut <- EBTest(Data = DataMat, NgVector = ngvector, Conditions = +conditions, sizeFactors = Sizes, maxround = 10) +``` +If you are running multiple condition analysis, you will need to change line 53: +``` +MultiOut <- EBMultiTest(Data = DataMat, NgVector = ngvector, +Conditions = conditions, AllParti = patterns, sizeFactors = Sizes, +maxround = 5) +``` +``` +MultiOut <- EBMultiTest(Data = DataMat, NgVector = ngvector, +Conditions = conditions, AllParti = patterns, sizeFactors = Sizes, +maxround = 10) +``` +You will need to redo make after you make the changes. + +## I saw a gene has significant FC but is not called as DE by EBSeq, why does that happen? + +EBSeq calls a gene as DE (assign high PPDE) if the across-condition variability is significantly larger than the within-condition +variability. In the cases that a gene has large within-condition variation, although the FC across two conditions is large (small), +the across-condition difference could still be explained by biological variation within condition. In these cases the gene/isoform +will have a moderate PPDE. + +## Can I look at TPMs/RPKMs/FPKMs across samples? + +In general, it is not appropriate to perform cross sample comparisons using TPM, FPKM or RPKM without further normalization. +Instead, you may use normalized counts (It can be generated by GetNormalizedMat() function from raw count, +note EBSeq testing functions takes raw counts and library size factors) + +Here is an example: + +Suppose there are 2 samples S1 and S2 from different conditions. Each has 5 genes. For simplicity, we assume +each of 5 genes contains only one isoform and all genes have the same length. + +Assume only gene 5 is DE and the gene expressions of these 5 genes are: + + + +|Sample|g1|g2|g3|g4|g5| +|---|---|---|---|---|---| +|S1|10|10|10|10|10| +|S2| 20 | 20 | 20 | 20 | 100 | + +Then the TPM/FPKM/RPKM will be (note sum TPM/FPKM/RPKM of all genes should be 10^6 ): + +|Sample|g1|g2|g3|g4|g5| +|---|---|---|---|---|---| +| S1 | 2x10^5 | 2x10^5 | 2x10^5 | 2x10^5 | 2x10^5 | +| S2 | 1.1x10^5| 1.1x10^5| 1.1x10^5| 1.1x10^5| 5.6x10^5| + +Based on TPM/FPKM/RPKM, an investigator may conclude that the first 4 genes are down-regulated and the 5th gene is up-regulated. +Then we will get 4 false positive calls. + +Cross-sample TPM/FPKM/RPKM comparisons will be feasible only when no hypothetical DE genes present across samples +(Or when assuming the DE genes are sort of 'symmetric' regarding up and down regulation). + diff --git a/build/vignette.rds b/build/vignette.rds index 6c7e94f..489e54e 100644 Binary files a/build/vignette.rds and b/build/vignette.rds differ diff --git a/demo/EBSeq.R b/demo/EBSeq.R index 3d5024d..8a264b4 100644 --- a/demo/EBSeq.R +++ b/demo/EBSeq.R @@ -5,11 +5,8 @@ str(GeneMat) Sizes=MedianNorm(GeneMat) EBOut=EBTest(Data=GeneMat, Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=Sizes, maxround=5) -PP=GetPPMat(EBOut) -str(PP) -head(PP) -DEfound=rownames(PP)[which(PP[,"PPDE"]>=.95)] -str(DEfound) +DEOut=GetDEResults(EBOut) +str(DEOut) #3.2 data(IsoList) str(IsoList) @@ -23,10 +20,7 @@ IsoNgTrun=NgList$IsoformNgTrun IsoNgTrun[c(1:3,201:203,601:603)] IsoEBOut=EBTest(Data=IsoMat, NgVector=IsoNgTrun, Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=IsoSizes, maxround=5) -IsoPP=GetPPMat(IsoEBOut) -str(IsoPP) -head(IsoPP) -IsoDE=rownames(IsoPP)[which(IsoPP[,"PPDE"]>=.95)] +IsoDE=GetDEResults(IsoEBOut) str(IsoDE) #3.3 data(MultiGeneMat) @@ -73,11 +67,7 @@ str(GeneMat) Sizes=MedianNorm(GeneMat) EBOut=EBTest(Data=GeneMat, Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=Sizes, maxround=5) -PP=GetPPMat(EBOut) -str(PP) -head(PP) -DEfound=rownames(PP)[which(PP[,"PPDE"]>=.95)] -str(DEfound) +DEOut=GetDEResults(EBOut) EBOut$Alpha EBOut$Beta EBOut$P @@ -102,9 +92,7 @@ IsoNgTrun=NgList$IsoformNgTrun IsoNgTrun[c(1:3,201:203,601:603)] IsoEBOut=EBTest(Data=IsoMat, NgVector=IsoNgTrun, Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=IsoSizes, maxround=5) -IsoPP=GetPPMat(IsoEBOut) -str(IsoPP) -IsoDE=rownames(IsoPP)[which(IsoPP[,"PPDE"]>=.95)] +IsoDE=GetDEResults(IsoEBOut) str(IsoDE) IsoEBOut$Alpha IsoEBOut$Beta @@ -194,8 +182,7 @@ GeneMat.norep=GeneMat[,c(1,6)] Sizes.norep=MedianNorm(GeneMat.norep) EBOut.norep=EBTest(Data=GeneMat.norep, Conditions=as.factor(rep(c("C1","C2"))),sizeFactors=Sizes.norep, maxround=5) -PP.norep=GetPPMat(EBOut.norep) -DEfound.norep=rownames(PP.norep)[which(PP.norep[,"PPDE"]>=.95)] +DE.norep=GetDEResults(EBOut.norep) GeneFC.norep=PostFC(EBOut.norep) @@ -210,8 +197,7 @@ IsoMat.norep=IsoMat[,c(1,6)] IsoSizes.norep=MedianNorm(IsoMat.norep) IsoEBOut.norep=EBTest(Data=IsoMat.norep, NgVector=IsoNgTrun, Conditions=as.factor(c("C1","C2")),sizeFactors=IsoSizes.norep, maxround=5) -IsoPP.norep=GetPPMat(IsoEBOut.norep) -IsoDE.norep=rownames(IsoPP.norep)[which(IsoPP.norep[,"PPDE"]>=.95)] +IsoDE.norep=GetDEResults(IsoEBOut.norep) IsoFC.norep=PostFC(IsoEBOut.norep) diff --git a/inst/doc/EBSeq_Vignette.R b/inst/doc/EBSeq_Vignette.R index 1595b03..372d896 100644 --- a/inst/doc/EBSeq_Vignette.R +++ b/inst/doc/EBSeq_Vignette.R @@ -27,22 +27,16 @@ Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=Sizes, maxround=5) ################################################### -### code chunk number 5: EBSeq_Vignette.Rnw:241-244 +### code chunk number 5: EBSeq_Vignette.Rnw:240-244 ################################################### -PP=GetPPMat(EBOut) -str(PP) -head(PP) +EBDERes=GetDEResults(EBOut, FDR=0.05) +str(EBDERes$DEfound) +head(EBDERes$PPMat) +str(EBDERes$Status) ################################################### -### code chunk number 6: EBSeq_Vignette.Rnw:249-251 -################################################### -DEfound=rownames(PP)[which(PP[,"PPDE"]>=.95)] -str(DEfound) - - -################################################### -### code chunk number 7: EBSeq_Vignette.Rnw:285-291 +### code chunk number 6: EBSeq_Vignette.Rnw:289-295 ################################################### data(IsoList) str(IsoList) @@ -53,13 +47,13 @@ IsosGeneNames=IsoList$IsosGeneNames ################################################### -### code chunk number 8: EBSeq_Vignette.Rnw:298-299 +### code chunk number 7: EBSeq_Vignette.Rnw:302-303 ################################################### IsoSizes=MedianNorm(IsoMat) ################################################### -### code chunk number 9: EBSeq_Vignette.Rnw:320-323 +### code chunk number 8: EBSeq_Vignette.Rnw:324-327 ################################################### NgList=GetNg(IsoNames, IsosGeneNames) IsoNgTrun=NgList$IsoformNgTrun @@ -67,26 +61,25 @@ IsoNgTrun[c(1:3,201:203,601:603)] ################################################### -### code chunk number 10: EBSeq_Vignette.Rnw:335-342 +### code chunk number 9: EBSeq_Vignette.Rnw:339-345 ################################################### IsoEBOut=EBTest(Data=IsoMat, NgVector=IsoNgTrun, Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=IsoSizes, maxround=5) -IsoPP=GetPPMat(IsoEBOut) -str(IsoPP) -head(IsoPP) -IsoDE=rownames(IsoPP)[which(IsoPP[,"PPDE"]>=.95)] -str(IsoDE) +IsoEBDERes=GetDEResults(IsoEBOut, FDR=0.05) +str(IsoEBDERes$DEfound) +head(IsoEBDERes$PPMat) +str(IsoEBDERes$Status) ################################################### -### code chunk number 11: EBSeq_Vignette.Rnw:353-355 +### code chunk number 10: EBSeq_Vignette.Rnw:368-370 ################################################### data(MultiGeneMat) str(MultiGeneMat) ################################################### -### code chunk number 12: EBSeq_Vignette.Rnw:363-366 +### code chunk number 11: EBSeq_Vignette.Rnw:378-381 ################################################### Conditions=c("C1","C1","C2","C2","C3","C3") PosParti=GetPatterns(Conditions) @@ -94,14 +87,14 @@ PosParti ################################################### -### code chunk number 13: EBSeq_Vignette.Rnw:374-376 +### code chunk number 12: EBSeq_Vignette.Rnw:389-391 ################################################### Parti=PosParti[-3,] Parti ################################################### -### code chunk number 14: EBSeq_Vignette.Rnw:381-384 +### code chunk number 13: EBSeq_Vignette.Rnw:396-399 ################################################### MultiSize=MedianNorm(MultiGeneMat) MultiOut=EBMultiTest(MultiGeneMat,NgVector=NULL,Conditions=Conditions, @@ -109,7 +102,7 @@ AllParti=Parti, sizeFactors=MultiSize, maxround=5) ################################################### -### code chunk number 15: EBSeq_Vignette.Rnw:388-393 +### code chunk number 14: EBSeq_Vignette.Rnw:403-408 ################################################### MultiPP=GetMultiPP(MultiOut) names(MultiPP) @@ -119,7 +112,7 @@ MultiPP$Patterns ################################################### -### code chunk number 16: EBSeq_Vignette.Rnw:412-420 +### code chunk number 15: EBSeq_Vignette.Rnw:427-435 ################################################### data(IsoMultiList) IsoMultiMat=IsoMultiList[[1]] @@ -132,21 +125,21 @@ Conditions=c("C1","C1","C2","C2","C3","C3","C4","C4") ################################################### -### code chunk number 17: EBSeq_Vignette.Rnw:426-428 +### code chunk number 16: EBSeq_Vignette.Rnw:441-443 ################################################### PosParti.4Cond=GetPatterns(Conditions) PosParti.4Cond ################################################### -### code chunk number 18: EBSeq_Vignette.Rnw:433-435 +### code chunk number 17: EBSeq_Vignette.Rnw:448-450 ################################################### Parti.4Cond=PosParti.4Cond[c(1,2,3,8,15),] Parti.4Cond ################################################### -### code chunk number 19: EBSeq_Vignette.Rnw:440-444 +### code chunk number 18: EBSeq_Vignette.Rnw:455-459 ################################################### IsoMultiOut=EBMultiTest(IsoMultiMat, NgVector=IsoNgTrun.Multi,Conditions=Conditions, @@ -155,7 +148,7 @@ maxround=5) ################################################### -### code chunk number 20: EBSeq_Vignette.Rnw:448-453 +### code chunk number 19: EBSeq_Vignette.Rnw:463-468 ################################################### IsoMultiPP=GetMultiPP(IsoMultiOut) names(MultiPP) @@ -165,26 +158,26 @@ IsoMultiPP$Patterns ################################################### -### code chunk number 21: EBSeq_Vignette.Rnw:470-475 (eval = FALSE) +### code chunk number 20: EBSeq_Vignette.Rnw:485-490 (eval = FALSE) ################################################### ## data(GeneMat) ## Sizes=MedianNorm(GeneMat) ## EBOut=EBTest(Data=GeneMat, ## Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=Sizes, maxround=5) -## PP=GetPPMat(EBOut) +## EBDERes=GetDEResults(EBOut, FDR=0.05) ################################################### -### code chunk number 22: EBSeq_Vignette.Rnw:477-481 +### code chunk number 21: EBSeq_Vignette.Rnw:492-496 ################################################### -str(PP) -head(PP) -DEfound=rownames(PP)[which(PP[,"PPDE"]>=.95)] -str(DEfound) +EBDERes=GetDEResults(EBOut, FDR=0.05) +str(EBDERes$DEfound) +head(EBDERes$PPMat) +str(EBDERes$Status) ################################################### -### code chunk number 23: EBSeq_Vignette.Rnw:491-494 +### code chunk number 22: EBSeq_Vignette.Rnw:506-509 ################################################### GeneFC=PostFC(EBOut) str(GeneFC) @@ -192,7 +185,7 @@ PlotPostVsRawFC(EBOut,GeneFC) ################################################### -### code chunk number 24: EBSeq_Vignette.Rnw:515-518 +### code chunk number 23: EBSeq_Vignette.Rnw:530-533 ################################################### EBOut$Alpha EBOut$Beta @@ -200,21 +193,21 @@ EBOut$P ################################################### -### code chunk number 25: EBSeq_Vignette.Rnw:537-539 +### code chunk number 24: EBSeq_Vignette.Rnw:552-554 ################################################### par(mfrow=c(1,2)) QQP(EBOut) ################################################### -### code chunk number 26: EBSeq_Vignette.Rnw:555-557 +### code chunk number 25: EBSeq_Vignette.Rnw:570-572 ################################################### par(mfrow=c(1,2)) DenNHist(EBOut) ################################################### -### code chunk number 27: EBSeq_Vignette.Rnw:578-583 (eval = FALSE) +### code chunk number 26: EBSeq_Vignette.Rnw:593-598 (eval = FALSE) ################################################### ## data(IsoList) ## IsoMat=IsoList$IsoMat @@ -224,7 +217,7 @@ DenNHist(EBOut) ################################################### -### code chunk number 28: EBSeq_Vignette.Rnw:585-588 +### code chunk number 27: EBSeq_Vignette.Rnw:600-603 ################################################### names(NgList) IsoNgTrun=NgList$IsoformNgTrun @@ -232,36 +225,35 @@ IsoNgTrun[c(1:3,201:203,601:603)] ################################################### -### code chunk number 29: EBSeq_Vignette.Rnw:619-620 (eval = FALSE) +### code chunk number 28: EBSeq_Vignette.Rnw:634-635 (eval = FALSE) ################################################### ## IsoNgTrun = scan(file="output_name.ngvec", what=0, sep="\n") ################################################### -### code chunk number 30: EBSeq_Vignette.Rnw:633-638 (eval = FALSE) +### code chunk number 29: EBSeq_Vignette.Rnw:648-652 (eval = FALSE) ################################################### ## IsoSizes=MedianNorm(IsoMat) ## IsoEBOut=EBTest(Data=IsoMat, NgVector=IsoNgTrun, ## Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=IsoSizes, maxround=5) -## IsoPP=GetPPMat(IsoEBOut) -## IsoDE=rownames(IsoPP)[which(IsoPP[,"PPDE"]>=.95)] +## IsoEBDERes=GetDEResults(IsoEBOut, FDR=0.05) ################################################### -### code chunk number 31: EBSeq_Vignette.Rnw:640-641 +### code chunk number 30: EBSeq_Vignette.Rnw:654-655 ################################################### -str(IsoDE) +str(IsoEBDERes) ################################################### -### code chunk number 32: EBSeq_Vignette.Rnw:646-648 +### code chunk number 31: EBSeq_Vignette.Rnw:660-662 ################################################### IsoFC=PostFC(IsoEBOut) str(IsoFC) ################################################### -### code chunk number 33: EBSeq_Vignette.Rnw:659-662 +### code chunk number 32: EBSeq_Vignette.Rnw:673-676 ################################################### IsoEBOut$Alpha IsoEBOut$Beta @@ -269,7 +261,7 @@ IsoEBOut$P ################################################### -### code chunk number 34: EBSeq_Vignette.Rnw:681-686 +### code chunk number 33: EBSeq_Vignette.Rnw:695-700 ################################################### par(mfrow=c(2,2)) PolyFitValue=vector("list",3) @@ -279,7 +271,7 @@ for(i in 1:3) ################################################### -### code chunk number 35: EBSeq_Vignette.Rnw:699-708 +### code chunk number 34: EBSeq_Vignette.Rnw:713-722 ################################################### PolyAll=PolyFitPlot(unlist(IsoEBOut$C1Mean), unlist(IsoEBOut$C1EstVar),5) lines(log10(IsoEBOut$C1Mean[[1]][PolyFitValue[[1]]$sort]), @@ -293,21 +285,21 @@ col=c("red","yellow","pink","green"),lty=1,lwd=3,box.lwd=2) ################################################### -### code chunk number 36: EBSeq_Vignette.Rnw:721-723 +### code chunk number 35: EBSeq_Vignette.Rnw:735-737 ################################################### par(mfrow=c(2,3)) QQP(IsoEBOut) ################################################### -### code chunk number 37: EBSeq_Vignette.Rnw:735-737 +### code chunk number 36: EBSeq_Vignette.Rnw:749-751 ################################################### par(mfrow=c(2,3)) DenNHist(IsoEBOut) ################################################### -### code chunk number 38: EBSeq_Vignette.Rnw:754-758 +### code chunk number 37: EBSeq_Vignette.Rnw:768-772 ################################################### Conditions=c("C1","C1","C2","C2","C3","C3") PosParti=GetPatterns(Conditions) @@ -316,14 +308,14 @@ PlotPattern(PosParti) ################################################### -### code chunk number 39: EBSeq_Vignette.Rnw:765-767 +### code chunk number 38: EBSeq_Vignette.Rnw:779-781 ################################################### Parti=PosParti[-3,] Parti ################################################### -### code chunk number 40: EBSeq_Vignette.Rnw:773-779 (eval = FALSE) +### code chunk number 39: EBSeq_Vignette.Rnw:787-793 (eval = FALSE) ################################################### ## data(MultiGeneMat) ## MultiSize=MedianNorm(MultiGeneMat) @@ -334,7 +326,7 @@ Parti ################################################### -### code chunk number 41: EBSeq_Vignette.Rnw:783-788 +### code chunk number 40: EBSeq_Vignette.Rnw:797-802 ################################################### MultiPP=GetMultiPP(MultiOut) names(MultiPP) @@ -344,28 +336,28 @@ MultiPP$Patterns ################################################### -### code chunk number 42: EBSeq_Vignette.Rnw:795-797 +### code chunk number 41: EBSeq_Vignette.Rnw:809-811 ################################################### MultiFC=GetMultiFC(MultiOut) str(MultiFC) ################################################### -### code chunk number 43: EBSeq_Vignette.Rnw:806-808 +### code chunk number 42: EBSeq_Vignette.Rnw:820-822 ################################################### par(mfrow=c(2,2)) QQP(MultiOut) ################################################### -### code chunk number 44: EBSeq_Vignette.Rnw:816-818 +### code chunk number 43: EBSeq_Vignette.Rnw:830-832 ################################################### par(mfrow=c(2,2)) DenNHist(MultiOut) ################################################### -### code chunk number 45: EBSeq_Vignette.Rnw:833-836 +### code chunk number 44: EBSeq_Vignette.Rnw:847-850 ################################################### Conditions=c("C1","C1","C2","C2","C3","C3","C4","C4") PosParti.4Cond=GetPatterns(Conditions) @@ -373,7 +365,7 @@ PosParti.4Cond ################################################### -### code chunk number 46: EBSeq_Vignette.Rnw:841-844 +### code chunk number 45: EBSeq_Vignette.Rnw:855-858 ################################################### PlotPattern(PosParti.4Cond) Parti.4Cond=PosParti.4Cond[c(1,2,3,8,15),] @@ -381,7 +373,7 @@ Parti.4Cond ################################################### -### code chunk number 47: EBSeq_Vignette.Rnw:851-862 (eval = FALSE) +### code chunk number 46: EBSeq_Vignette.Rnw:865-876 (eval = FALSE) ################################################### ## data(IsoMultiList) ## IsoMultiMat=IsoMultiList[[1]] @@ -397,7 +389,7 @@ Parti.4Cond ################################################### -### code chunk number 48: EBSeq_Vignette.Rnw:864-869 +### code chunk number 47: EBSeq_Vignette.Rnw:878-883 ################################################### names(MultiPP) IsoMultiPP$PP[1:10,] @@ -407,7 +399,7 @@ IsoMultiFC=GetMultiFC(IsoMultiOut) ################################################### -### code chunk number 49: EBSeq_Vignette.Rnw:880-883 +### code chunk number 48: EBSeq_Vignette.Rnw:894-897 ################################################### par(mfrow=c(3,4)) QQP(IsoMultiOut) @@ -415,14 +407,14 @@ QQP(IsoMultiOut) ################################################### -### code chunk number 50: EBSeq_Vignette.Rnw:893-895 +### code chunk number 49: EBSeq_Vignette.Rnw:907-909 ################################################### par(mfrow=c(3,4)) DenNHist(IsoMultiOut) ################################################### -### code chunk number 51: EBSeq_Vignette.Rnw:927-936 +### code chunk number 50: EBSeq_Vignette.Rnw:941-949 ################################################### data(GeneMat) GeneMat.norep=GeneMat[,c(1,6)] @@ -430,13 +422,12 @@ Sizes.norep=MedianNorm(GeneMat.norep) EBOut.norep=EBTest(Data=GeneMat.norep, Conditions=as.factor(rep(c("C1","C2"))), sizeFactors=Sizes.norep, maxround=5) -PP.norep=GetPPMat(EBOut.norep) -DEfound.norep=rownames(PP.norep)[which(PP.norep[,"PPDE"]>=.95)] +EBDERes.norep=GetDEResults(EBOut.norep) GeneFC.norep=PostFC(EBOut.norep) ################################################### -### code chunk number 52: EBSeq_Vignette.Rnw:946-960 +### code chunk number 51: EBSeq_Vignette.Rnw:959-972 ################################################### data(IsoList) IsoMat=IsoList$IsoMat @@ -449,13 +440,12 @@ IsoSizes.norep=MedianNorm(IsoMat.norep) IsoEBOut.norep=EBTest(Data=IsoMat.norep, NgVector=IsoNgTrun, Conditions=as.factor(c("C1","C2")), sizeFactors=IsoSizes.norep, maxround=5) -IsoPP.norep=GetPPMat(IsoEBOut.norep) -IsoDE.norep=rownames(IsoPP.norep)[which(IsoPP.norep[,"PPDE"]>=.95)] +IsoEBDERes.norep=GetDEResults(IsoEBOut.norep) IsoFC.norep=PostFC(IsoEBOut.norep) ################################################### -### code chunk number 53: EBSeq_Vignette.Rnw:969-981 +### code chunk number 52: EBSeq_Vignette.Rnw:981-993 ################################################### data(MultiGeneMat) MultiGeneMat.norep=MultiGeneMat[,c(1,3,5)] @@ -472,7 +462,7 @@ MultiFC.norep=GetMultiFC(MultiOut.norep) ################################################### -### code chunk number 54: EBSeq_Vignette.Rnw:993-1012 +### code chunk number 53: EBSeq_Vignette.Rnw:1005-1024 ################################################### data(IsoMultiList) IsoMultiMat=IsoMultiList[[1]] diff --git a/inst/doc/EBSeq_Vignette.Rnw b/inst/doc/EBSeq_Vignette.Rnw index b384e0c..bed7960 100644 --- a/inst/doc/EBSeq_Vignette.Rnw +++ b/inst/doc/EBSeq_Vignette.Rnw @@ -1,4 +1,4 @@ -%\VignetteIndexEntry{EBSeq} +%\VignetteIndexEntry{EBSeq Vignette} \documentclass{article} \usepackage{fullpage} @@ -236,21 +236,25 @@ Please note this may take several minutes: EBOut=EBTest(Data=GeneMat, Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=Sizes, maxround=5) @ -\noindent The posterior probabilities of being DE are obtained as follows, where \verb+PP+ is a matrix containing the posterior probabilities of -being EE or DE for each of the 1,000 simulated genes: +\noindent The list of DE genes and the posterior probabilities of being DE are obtained as follows <<>>= -PP=GetPPMat(EBOut) -str(PP) -head(PP) +EBDERes=GetDEResults(EBOut, FDR=0.05) +str(EBDERes$DEfound) +head(EBDERes$PPMat) +str(EBDERes$Status) @ -\noindent The matrix \verb+PP+ contains two columns \verb+PPEE+ and \verb+PPDE+, +\noindent \verb+EBDERes$DEfound+ is a list of genes identified with 5\% FDR. EBSeq found +95 genes. The matrix \verb+EBDERes$PPMat+ contains two columns \verb+PPEE+ and \verb+PPDE+, corresponding to the posterior probabilities of being EE or DE for each gene. -\verb+PP+ may be used to form an FDR-controlled list of DE genes with a target FDR of 0.05 as follows: -<<>>= -DEfound=rownames(PP)[which(PP[,"PPDE"]>=.95)] -str(DEfound) -@ -\noindent EBSeq found 98 DE genes in total with target FDR 0.05. +\verb+EBDERes$Status+ contains each gene's status called by EBSeq. + +\noindent Note the \verb+GetDEResults()+ was incorporated in EBSeq since version 1.7.1. +By using the default settings, the number of genes identified in any given analysis may +differ slightly from the previous version. The updated algorithm is more robust to outliers +and transcripts with low variance. To obtain results that are comparable +to results from earlier versions of EBSeq ($\le$ 1.7.0), a user may set +\verb+Method="classic"+ in \verb+GetDEResults()+ function, or use the \verb+GetPPMat()+ function. + \subsection{Isoform level DE analysis (two conditions)} \label{sec:startisode} @@ -335,13 +339,24 @@ required in practice (see Section \ref{sec:detailedisodeconverge} for details). <<>>= IsoEBOut=EBTest(Data=IsoMat, NgVector=IsoNgTrun, Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=IsoSizes, maxround=5) -IsoPP=GetPPMat(IsoEBOut) -str(IsoPP) -head(IsoPP) -IsoDE=rownames(IsoPP)[which(IsoPP[,"PPDE"]>=.95)] -str(IsoDE) +IsoEBDERes=GetDEResults(IsoEBOut, FDR=0.05) +str(IsoEBDERes$DEfound) +head(IsoEBDERes$PPMat) +str(IsoEBDERes$Status) @ -\noindent We see that EBSeq found 105 DE isoforms at the target FDR of 0.05. +\noindent We see that EBSeq found 104 DE isoforms at the target FDR of 0.05. + +\noindent Note the \verb+GetDEResults()+ was incorporated in EBSeq since version 1.7.1. +By using the default settings, the number of transcripts identified in any given analysis may +differ slightly from the previous version. The updated algorithm is more robust to outliers +and transcripts with low variance. To obtain results that are comparable +to results from earlier versions of EBSeq ($\le$ 1.7.0), a user may set +\verb+Method="classic"+ in \verb+GetDEResults()+ function, or use the \verb+GetPPMat()+ function. + + + + + \subsection{Gene level DE analysis (more than two conditions)} \label{sec:startmulticond} @@ -472,15 +487,15 @@ data(GeneMat) Sizes=MedianNorm(GeneMat) EBOut=EBTest(Data=GeneMat, Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=Sizes, maxround=5) -PP=GetPPMat(EBOut) +EBDERes=GetDEResults(EBOut, FDR=0.05) @ <<>>= -str(PP) -head(PP) -DEfound=rownames(PP)[which(PP[,"PPDE"]>=.95)] -str(DEfound) +EBDERes=GetDEResults(EBOut, FDR=0.05) +str(EBDERes$DEfound) +head(EBDERes$PPMat) +str(EBDERes$Status) @ -\noindent EBSeq found 98 DE genes at a target FDR of 0.05.\\ +\noindent EBSeq found 95 DE genes at a target FDR of 0.05.\\ \subsubsection{Calculating FC} \label{sec:detailedgenedefc} @@ -634,13 +649,12 @@ EBSeq can be applied as described in Section \ref{sec:startisoderun}. IsoSizes=MedianNorm(IsoMat) IsoEBOut=EBTest(Data=IsoMat, NgVector=IsoNgTrun, Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=IsoSizes, maxround=5) -IsoPP=GetPPMat(IsoEBOut) -IsoDE=rownames(IsoPP)[which(IsoPP[,"PPDE"]>=.95)] +IsoEBDERes=GetDEResults(IsoEBOut, FDR=0.05) @ <<>>= -str(IsoDE) +str(IsoEBDERes) @ -\noindent We see that EBSeq found 105 DE isoforms at a target FDR of 0.05. +\noindent We see that EBSeq found 104 DE isoforms at a target FDR of 0.05. The function \verb+PostFC+ could also be used here to calculate the Fold Change (FC) as well as the posterior FC on the normalization factor adjusted data. <<>>= @@ -922,7 +936,7 @@ This approach works well when there are no more than 50\% DE genes in the data s To generate a data set with no replicates, we take the first sample of each condition. For example, using the data from Section \ref{sec:detailedgenede}, we take sample 1 from condition 1 and -sample 6 from condition 2. Functions \verb+MedianNorm+, \verb+GetPPMat+ and +sample 6 from condition 2. Functions \verb+MedianNorm+, \verb+GetDEResults+ and \verb+PostFC+ may be used on data without replicates. <<>>= data(GeneMat) @@ -931,8 +945,7 @@ Sizes.norep=MedianNorm(GeneMat.norep) EBOut.norep=EBTest(Data=GeneMat.norep, Conditions=as.factor(rep(c("C1","C2"))), sizeFactors=Sizes.norep, maxround=5) -PP.norep=GetPPMat(EBOut.norep) -DEfound.norep=rownames(PP.norep)[which(PP.norep[,"PPDE"]>=.95)] +EBDERes.norep=GetDEResults(EBOut.norep) GeneFC.norep=PostFC(EBOut.norep) @ @@ -955,8 +968,7 @@ IsoSizes.norep=MedianNorm(IsoMat.norep) IsoEBOut.norep=EBTest(Data=IsoMat.norep, NgVector=IsoNgTrun, Conditions=as.factor(c("C1","C2")), sizeFactors=IsoSizes.norep, maxround=5) -IsoPP.norep=GetPPMat(IsoEBOut.norep) -IsoDE.norep=rownames(IsoPP.norep)[which(IsoPP.norep[,"PPDE"]>=.95)] +IsoEBDERes.norep=GetDEResults(IsoEBOut.norep) IsoFC.norep=PostFC(IsoEBOut.norep) @ @@ -1040,12 +1052,95 @@ proofreading the vignette. low expressed genes (genes whose 75th quantile of normalized counts is less than 10) before identifying DE genes. These two thresholds can be changed in EBTest function. -We found that low expressed genes are more easily to be affected by noises. -Removing these genes prior to downstream analyses can improve the -model fitting and reduce impacts of noisy genes (e.g. genes with outliers). +Because low expressed genes are disproportionately noisy, +removing these genes prior to downstream analyses can improve model fitting and increase robustness +(e.g. by removing outliers). 2014-5-22: In EBSeq 1.5.2, numerical approximations are implemented to deal with underflow. The underflow is likely due to large number of samples. + +2015-1-29: In EBSeq 1.7.1, EBSeq incorporates a new function +GetDEResults() which may be used to obtain a list of transcripts under a target FDR in a two-condition experiment. +The results obtained by applying this function with its default setting will be +more robust to transcripts with low variance and potential outliers. +By using the default settings in this function, +the number of genes identified in any given analysis may +differ slightly from the previous version (1.7.0 or order). +To obtain results that are comparable +to results from earlier versions of EBSeq (1.7.0 or older), a user may set +Method="classic" in GetDEResults() function, or use the original GetPPMat() function. +The GeneDEResults() function also allows a user to modify thresholds to +target genes/isoforms with a pre-specified posterior fold change. + +Also, in EBSeq 1.7.1, the default settings in EBTest() and EBMultiTest() function +will only remove transcripts with all 0's (instead of removing transcripts with +75th quantile less than 10 in version 1.3.3-1.7.0). +To obtain a list of transcripts comparable to the results generated by EBSeq version 1.3.3-1.7.0, a user may change Qtrm = 0.75 and QtrmCut = 10 +when applying EBTest() or EBMultiTest() function. + +\section{Common Q and A} +\subsection{Read in data} + +csv file: + +\verb+In=read.csv("FileName", stringsAsFactors=F, row.names=1, header=T)+ + +\verb+Data=data.matrix(In)+ + +\noindent txt file: + +\verb+In=read.table("FileName", stringsAsFactors=F, row.names=1, header=T)+ + +\verb+Data=data.matrix(In)+ + +\noindent Check \verb+str(Data)+ and make sure it is a matrix instead of data frame. You may need to play around with the \verb+row.names+ +and \verb+header+ option depends on how the input file was generated. + + + +\subsection{GetDEResults() function not found} + +You may on an earlier version of EBSeq. The GetDEResults function +was introduced since version 1.7.1. +The latest release version could be found at: + +\url{http://www.bioconductor.org/packages/release/bioc/html/EBSeq.html} + +\noindent The latest devel version: + +\url{http://www.bioconductor.org/packages/devel/bioc/html/EBSeq.html} + +\noindent And you may check your package version by typing \verb+packageVersion("EBSeq")+. + + +\subsection{Visualizing DE genes/isoforms} + +To generate a heatmap, you may consider the heatmap.2 function in gplots package. +For example, you may run + +\verb+heatmap.2(NormalizedMatrix[GenesOfInterest,], scale="row", trace="none", Colv=F)+ + +The normalized matrix may be obtained from \verb+GetNormalizedMat()+ function. + + +\subsection{My favorite gene/isoform has NA in PP (status "NoTest")} + +\indent The NoTest status comes from two sources: + +1) In version 1.3.3-1.7.0, using the default parameter settings of EBMultiTest(), the function +will not test on genes with more than 75\% values $\le$ 10 to ensure better +model fitting. To disable this filter, you may set Qtrm=1 and +QtrmCut=0. + +2) numerical over/underflow in R. That happens when the within +condition variance is extremely large or small. we did implemented a numerical +approximation step to calculate the approximated PP for these genes +with over/underflow. Here we use $10^{-10}$ to approximate the parameter p +in the NB distribution for these genes (we set it to a small value +since we want to cover more over/underflow genes with low +within-condition variation). You may try to tune this value (to a larger value) in the +approximation by setting \verb+ApproxVal+ in \verb+EBTest()+ or \verb+EBMultiTest()+ function. + \pagebreak \bibliographystyle{plain} \bibliography{lengetal} diff --git a/inst/doc/EBSeq_Vignette.pdf b/inst/doc/EBSeq_Vignette.pdf index 106d468..2ba03da 100644 Binary files a/inst/doc/EBSeq_Vignette.pdf and b/inst/doc/EBSeq_Vignette.pdf differ diff --git a/man/EBMultiTest.Rd b/man/EBMultiTest.Rd index 28a5c34..7b93b9d 100644 --- a/man/EBMultiTest.Rd +++ b/man/EBMultiTest.Rd @@ -10,7 +10,7 @@ of interested patterns in a multiple condition study \usage{ EBMultiTest(Data, NgVector = NULL, Conditions, AllParti = NULL, sizeFactors, maxround, Pool = F, NumBin = 1000, - ApproxVal=10^-10, PoolLower=.25, PoolUpper = .75, Print=T,Qtrm=.75,QtrmCut=10) + ApproxVal=10^-10, PoolLower=.25, PoolUpper = .75, Print=T,Qtrm=1,QtrmCut=0) } \arguments{ @@ -45,8 +45,8 @@ We use the cross condition variance estimations for the candidate genes and the \item{Print}{Whether print the elapsed-time while running the test.} \item{Qtrm, QtrmCut}{ -Transcripts with Qtrm th quantile < = QtrmCut will be removed before testing. The default value is Qtrm = 0.75 and QtrmCut=10. -By default setting, transcripts that have >75\% of the samples with expression less than 10 +Transcripts with Qtrm th quantile < = QtrmCut will be removed before testing. The default value is Qtrm = 1 and QtrmCut=0. +By default setting, transcripts with all 0's won't be tested. } } diff --git a/man/EBTest.Rd b/man/EBTest.Rd index 97e4d39..28f23f4 100644 --- a/man/EBTest.Rd +++ b/man/EBTest.Rd @@ -11,7 +11,7 @@ Base on the assumption of NB-Beta Empirical Bayes model, the EM algorithm is use EBTest(Data, NgVector = NULL, Conditions, sizeFactors, maxround, Pool = F, NumBin = 1000, ApproxVal = 10^-10, Alpha = NULL, Beta = NULL, PInput = NULL, RInput = NULL, - PoolLower = .25, PoolUpper = .75, Print = T, Qtrm = .75,QtrmCut=10) + PoolLower = .25, PoolUpper = .75, Print = T, Qtrm = 1,QtrmCut=0) } \arguments{ @@ -36,8 +36,8 @@ We use the cross condition variance estimations for the candidate genes and the \item{Alpha, Beta, PInput, RInput}{If the parameters are known and the user doesn't want to estimate them from the data, user could specify them here.} \item{Print}{Whether print the elapsed-time while running the test.} \item{Qtrm, QtrmCut}{ -Transcripts with Qtrm th quantile < = QtrmCut will be removed before testing. The default value is Qtrm = 0.75 and QtrmCut=10. -By default setting, transcripts that have >75\% of the samples with expression less than 10 +Transcripts with Qtrm th quantile < = QtrmCut will be removed before testing. The default value is Qtrm = 1 and QtrmCut=0. +By default setting, transcripts with all 0's won't be tested. } } @@ -77,6 +77,7 @@ Transcripts with expression 0 for all samples are shown as PP(EE) = PP(DE) = NA The transcript order is exactly the same as the order of the input data.} \item{ConditionOrder}{The condition assignment for C1Mean, C2Mean, etc.} \item{Conditions}{The input conditions.} +\item{DataNorm}{Normalized expression matrix.} } \references{ Ning Leng, John A. Dawson, James A. Thomson, Victor Ruotti, Anna I. Rissman, Bart M.G. Smits, Jill D. Haag, Michael N. Gould, Ron M. Stewart, and Christina Kendziorski. EBSeq: An empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics (2013) diff --git a/man/GetDEResults.Rd b/man/GetDEResults.Rd new file mode 100644 index 0000000..34c51f8 --- /dev/null +++ b/man/GetDEResults.Rd @@ -0,0 +1,104 @@ +\name{GetDEResults} +\alias{GetDEResults} +\title{ +Obtain Differential Expression Analysis Results in a Two-condition Test +} +\description{ +Obtain DE analysis results in a two-condition test using the output of EBTest() +} +\usage{ +GetDEResults(EBPrelim, FDR=0.05, Method="robust", + FDRMethod="hard", Threshold_FC=0.7, + Threshold_FCRatio=0.3, SmallNum=0.01) +} +\arguments{ + \item{EBPrelim}{Output from the function EBTest().} + \item{FDR}{Target FDR, defaut is 0.05.} + \item{FDRMethod}{"hard" or "soft". + Giving a target FDR alpha, either hard threshold and soft + threshold may be used. If the hard threshold is preferred, DE transcripts are + defined as the the transcripts with PP(DE) greater than + (1-alpha). Using the hard threshold, any DE transcript in the list + has FDR <= alpha. + + If the soft threshold is preferred, the DE transcripts are defined as the + transcripts with PP(DE) greater than crit_fun(PPEE, alpha). Using + the soft threshold, the list of DE transcripts has average FDR + alpha. + + Based on results from our simulation studies, hard thresholds provide a better-controlled + empirical FDR when sample size is relatively small(Less than 10 samples in each condition). + User may consider the soft threshold when sample size is large to improve power.} + \item{Method}{"robust" or "classic". + Using the "robust" option, EBSeq is more robust to genes with outliers and + genes with extremely small variances. + Using the "classic" option, the results will be more comparable to those obtained + by using the GetPPMat() function from earlier version (<= 1.7.0) of EBSeq. + Default is "robust".} + \item{Threshold_FC}{Threshold for the fold change (FC) statistics. + The default is 0.7. The FC statistics are calculated as follows. + First the posterior FC estimates are calculated using PostFC() function. + The FC statistics is defined as exp(-|log posterior FC|) and therefore is always less than + or equal to 1. + The default threshold was selected as the optimal threshold learned from our simulation studies. By setting the + threshold as 0.7, the expected FC for a DE transcript is less than 0.7 + (or greater than 1/0.7=1.4). + User may specify their own threshold here. A higher (less conservative) threshold + may be used here when sample size is large. Our simulation results + indicated that when there are more than or equal to 5 samples in each condition, + a less conservative threshold will improve the power when the FDR is still well-controlled. + The parameter will be ignored if Method is set as "classic".} + \item{Threshold_FCRatio}{Threshold for the fold change ratio (FCRatio) statistics. + The default is 0.3. The FCRatio statistics are calculated as follows. + First we get another revised fold change + statistic called Median-FC statistic for each transcript. + For each transcript, we calculate the median of + normalized expression values within each condition. + The MedianFC is defined as exp(-|log((C1Median+SmallNum)/(C2Median+SmallNum))|). + Note a small number is added to avoid Inf and NA. See SmallNum for more details. + The FCRatio is calculated as exp(-|log(FCstatistics/MedianFC)|). + Therefore it is always less than or equal to 1. + The default threshold was selected as the optimal threshold learned from our simulation studies. + By setting the threshold as 0.3, the FCRatio for a DE transcript is + expected to be larger than 0.3. + } + \item{SmallNum}{When calculating the FCRatio (or Median-FC), a small number is added for each transcript in each + condition to avoid Inf and NA. Default is 0.01.} +} +\details{ +GetDEResults() function takes output from EBTest() function and output a list of +DE transcripts under a target FDR. It also provides posterior probability estimates for each +transcript. +} +\value{ + \item{DEfound}{A list of DE transcripts.} + \item{PPMat}{Posterior probability matrix. Transcripts are following the same order as + in the input matrix. + Transcripts that were filtered by magnitude (in EBTest function), FC, or FCR + are assigned with NA for both PPDE and PPEE.} + \item{Status}{Each transcript will be assigned with one of the following + values: "DE", "EE", "Filtered: Low Expression", + "Filtered: Fold Change" and "Filtered: Fold Change Ratio". + Transcripts are following the same order as in the input matrix.} +} +\references{ +Ning Leng, John A. Dawson, James A. Thomson, Victor Ruotti, Anna I. Rissman, Bart M.G. Smits, Jill D. Haag, Michael N. Gould, Ron M. Stewart, and Christina Kendziorski. EBSeq: An empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics (2013) +} +\author{ +Ning Leng, Yuan Li +} +\seealso{ +EBTest +} +\examples{ +data(GeneMat) +str(GeneMat) +GeneMat.small = GeneMat[c(1:10,511:550),] +Sizes = MedianNorm(GeneMat.small) +EBOut = EBTest(Data = GeneMat.small, + Conditions = as.factor(rep(c("C1","C2"), each = 5)), + sizeFactors = Sizes, maxround = 5) +Out = GetDEResults(EBOut) +} +\keyword{ DE } +\keyword{ Two condition } diff --git a/man/MedianNorm.Rd b/man/MedianNorm.Rd index 9fc47ff..afc1475 100644 --- a/man/MedianNorm.Rd +++ b/man/MedianNorm.Rd @@ -7,10 +7,11 @@ Median Normalization 'MedianNorm' specifies the median normalization function from Anders et. al., 2010. } \usage{ -MedianNorm(Data) +MedianNorm(Data, alternative = FALSE) } \arguments{ \item{Data}{The data matrix with transcripts in rows and lanes in columns.} + \item{alternative}{if alternative = TRUE, the alternative version of median normalization will be applied.} } \value{The function will return a vector contains the normalization factor for each lane.} diff --git a/vignettes/EBSeq_Vignette.Rnw b/vignettes/EBSeq_Vignette.Rnw index b384e0c..bed7960 100644 --- a/vignettes/EBSeq_Vignette.Rnw +++ b/vignettes/EBSeq_Vignette.Rnw @@ -1,4 +1,4 @@ -%\VignetteIndexEntry{EBSeq} +%\VignetteIndexEntry{EBSeq Vignette} \documentclass{article} \usepackage{fullpage} @@ -236,21 +236,25 @@ Please note this may take several minutes: EBOut=EBTest(Data=GeneMat, Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=Sizes, maxround=5) @ -\noindent The posterior probabilities of being DE are obtained as follows, where \verb+PP+ is a matrix containing the posterior probabilities of -being EE or DE for each of the 1,000 simulated genes: +\noindent The list of DE genes and the posterior probabilities of being DE are obtained as follows <<>>= -PP=GetPPMat(EBOut) -str(PP) -head(PP) +EBDERes=GetDEResults(EBOut, FDR=0.05) +str(EBDERes$DEfound) +head(EBDERes$PPMat) +str(EBDERes$Status) @ -\noindent The matrix \verb+PP+ contains two columns \verb+PPEE+ and \verb+PPDE+, +\noindent \verb+EBDERes$DEfound+ is a list of genes identified with 5\% FDR. EBSeq found +95 genes. The matrix \verb+EBDERes$PPMat+ contains two columns \verb+PPEE+ and \verb+PPDE+, corresponding to the posterior probabilities of being EE or DE for each gene. -\verb+PP+ may be used to form an FDR-controlled list of DE genes with a target FDR of 0.05 as follows: -<<>>= -DEfound=rownames(PP)[which(PP[,"PPDE"]>=.95)] -str(DEfound) -@ -\noindent EBSeq found 98 DE genes in total with target FDR 0.05. +\verb+EBDERes$Status+ contains each gene's status called by EBSeq. + +\noindent Note the \verb+GetDEResults()+ was incorporated in EBSeq since version 1.7.1. +By using the default settings, the number of genes identified in any given analysis may +differ slightly from the previous version. The updated algorithm is more robust to outliers +and transcripts with low variance. To obtain results that are comparable +to results from earlier versions of EBSeq ($\le$ 1.7.0), a user may set +\verb+Method="classic"+ in \verb+GetDEResults()+ function, or use the \verb+GetPPMat()+ function. + \subsection{Isoform level DE analysis (two conditions)} \label{sec:startisode} @@ -335,13 +339,24 @@ required in practice (see Section \ref{sec:detailedisodeconverge} for details). <<>>= IsoEBOut=EBTest(Data=IsoMat, NgVector=IsoNgTrun, Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=IsoSizes, maxround=5) -IsoPP=GetPPMat(IsoEBOut) -str(IsoPP) -head(IsoPP) -IsoDE=rownames(IsoPP)[which(IsoPP[,"PPDE"]>=.95)] -str(IsoDE) +IsoEBDERes=GetDEResults(IsoEBOut, FDR=0.05) +str(IsoEBDERes$DEfound) +head(IsoEBDERes$PPMat) +str(IsoEBDERes$Status) @ -\noindent We see that EBSeq found 105 DE isoforms at the target FDR of 0.05. +\noindent We see that EBSeq found 104 DE isoforms at the target FDR of 0.05. + +\noindent Note the \verb+GetDEResults()+ was incorporated in EBSeq since version 1.7.1. +By using the default settings, the number of transcripts identified in any given analysis may +differ slightly from the previous version. The updated algorithm is more robust to outliers +and transcripts with low variance. To obtain results that are comparable +to results from earlier versions of EBSeq ($\le$ 1.7.0), a user may set +\verb+Method="classic"+ in \verb+GetDEResults()+ function, or use the \verb+GetPPMat()+ function. + + + + + \subsection{Gene level DE analysis (more than two conditions)} \label{sec:startmulticond} @@ -472,15 +487,15 @@ data(GeneMat) Sizes=MedianNorm(GeneMat) EBOut=EBTest(Data=GeneMat, Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=Sizes, maxround=5) -PP=GetPPMat(EBOut) +EBDERes=GetDEResults(EBOut, FDR=0.05) @ <<>>= -str(PP) -head(PP) -DEfound=rownames(PP)[which(PP[,"PPDE"]>=.95)] -str(DEfound) +EBDERes=GetDEResults(EBOut, FDR=0.05) +str(EBDERes$DEfound) +head(EBDERes$PPMat) +str(EBDERes$Status) @ -\noindent EBSeq found 98 DE genes at a target FDR of 0.05.\\ +\noindent EBSeq found 95 DE genes at a target FDR of 0.05.\\ \subsubsection{Calculating FC} \label{sec:detailedgenedefc} @@ -634,13 +649,12 @@ EBSeq can be applied as described in Section \ref{sec:startisoderun}. IsoSizes=MedianNorm(IsoMat) IsoEBOut=EBTest(Data=IsoMat, NgVector=IsoNgTrun, Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=IsoSizes, maxround=5) -IsoPP=GetPPMat(IsoEBOut) -IsoDE=rownames(IsoPP)[which(IsoPP[,"PPDE"]>=.95)] +IsoEBDERes=GetDEResults(IsoEBOut, FDR=0.05) @ <<>>= -str(IsoDE) +str(IsoEBDERes) @ -\noindent We see that EBSeq found 105 DE isoforms at a target FDR of 0.05. +\noindent We see that EBSeq found 104 DE isoforms at a target FDR of 0.05. The function \verb+PostFC+ could also be used here to calculate the Fold Change (FC) as well as the posterior FC on the normalization factor adjusted data. <<>>= @@ -922,7 +936,7 @@ This approach works well when there are no more than 50\% DE genes in the data s To generate a data set with no replicates, we take the first sample of each condition. For example, using the data from Section \ref{sec:detailedgenede}, we take sample 1 from condition 1 and -sample 6 from condition 2. Functions \verb+MedianNorm+, \verb+GetPPMat+ and +sample 6 from condition 2. Functions \verb+MedianNorm+, \verb+GetDEResults+ and \verb+PostFC+ may be used on data without replicates. <<>>= data(GeneMat) @@ -931,8 +945,7 @@ Sizes.norep=MedianNorm(GeneMat.norep) EBOut.norep=EBTest(Data=GeneMat.norep, Conditions=as.factor(rep(c("C1","C2"))), sizeFactors=Sizes.norep, maxround=5) -PP.norep=GetPPMat(EBOut.norep) -DEfound.norep=rownames(PP.norep)[which(PP.norep[,"PPDE"]>=.95)] +EBDERes.norep=GetDEResults(EBOut.norep) GeneFC.norep=PostFC(EBOut.norep) @ @@ -955,8 +968,7 @@ IsoSizes.norep=MedianNorm(IsoMat.norep) IsoEBOut.norep=EBTest(Data=IsoMat.norep, NgVector=IsoNgTrun, Conditions=as.factor(c("C1","C2")), sizeFactors=IsoSizes.norep, maxround=5) -IsoPP.norep=GetPPMat(IsoEBOut.norep) -IsoDE.norep=rownames(IsoPP.norep)[which(IsoPP.norep[,"PPDE"]>=.95)] +IsoEBDERes.norep=GetDEResults(IsoEBOut.norep) IsoFC.norep=PostFC(IsoEBOut.norep) @ @@ -1040,12 +1052,95 @@ proofreading the vignette. low expressed genes (genes whose 75th quantile of normalized counts is less than 10) before identifying DE genes. These two thresholds can be changed in EBTest function. -We found that low expressed genes are more easily to be affected by noises. -Removing these genes prior to downstream analyses can improve the -model fitting and reduce impacts of noisy genes (e.g. genes with outliers). +Because low expressed genes are disproportionately noisy, +removing these genes prior to downstream analyses can improve model fitting and increase robustness +(e.g. by removing outliers). 2014-5-22: In EBSeq 1.5.2, numerical approximations are implemented to deal with underflow. The underflow is likely due to large number of samples. + +2015-1-29: In EBSeq 1.7.1, EBSeq incorporates a new function +GetDEResults() which may be used to obtain a list of transcripts under a target FDR in a two-condition experiment. +The results obtained by applying this function with its default setting will be +more robust to transcripts with low variance and potential outliers. +By using the default settings in this function, +the number of genes identified in any given analysis may +differ slightly from the previous version (1.7.0 or order). +To obtain results that are comparable +to results from earlier versions of EBSeq (1.7.0 or older), a user may set +Method="classic" in GetDEResults() function, or use the original GetPPMat() function. +The GeneDEResults() function also allows a user to modify thresholds to +target genes/isoforms with a pre-specified posterior fold change. + +Also, in EBSeq 1.7.1, the default settings in EBTest() and EBMultiTest() function +will only remove transcripts with all 0's (instead of removing transcripts with +75th quantile less than 10 in version 1.3.3-1.7.0). +To obtain a list of transcripts comparable to the results generated by EBSeq version 1.3.3-1.7.0, a user may change Qtrm = 0.75 and QtrmCut = 10 +when applying EBTest() or EBMultiTest() function. + +\section{Common Q and A} +\subsection{Read in data} + +csv file: + +\verb+In=read.csv("FileName", stringsAsFactors=F, row.names=1, header=T)+ + +\verb+Data=data.matrix(In)+ + +\noindent txt file: + +\verb+In=read.table("FileName", stringsAsFactors=F, row.names=1, header=T)+ + +\verb+Data=data.matrix(In)+ + +\noindent Check \verb+str(Data)+ and make sure it is a matrix instead of data frame. You may need to play around with the \verb+row.names+ +and \verb+header+ option depends on how the input file was generated. + + + +\subsection{GetDEResults() function not found} + +You may on an earlier version of EBSeq. The GetDEResults function +was introduced since version 1.7.1. +The latest release version could be found at: + +\url{http://www.bioconductor.org/packages/release/bioc/html/EBSeq.html} + +\noindent The latest devel version: + +\url{http://www.bioconductor.org/packages/devel/bioc/html/EBSeq.html} + +\noindent And you may check your package version by typing \verb+packageVersion("EBSeq")+. + + +\subsection{Visualizing DE genes/isoforms} + +To generate a heatmap, you may consider the heatmap.2 function in gplots package. +For example, you may run + +\verb+heatmap.2(NormalizedMatrix[GenesOfInterest,], scale="row", trace="none", Colv=F)+ + +The normalized matrix may be obtained from \verb+GetNormalizedMat()+ function. + + +\subsection{My favorite gene/isoform has NA in PP (status "NoTest")} + +\indent The NoTest status comes from two sources: + +1) In version 1.3.3-1.7.0, using the default parameter settings of EBMultiTest(), the function +will not test on genes with more than 75\% values $\le$ 10 to ensure better +model fitting. To disable this filter, you may set Qtrm=1 and +QtrmCut=0. + +2) numerical over/underflow in R. That happens when the within +condition variance is extremely large or small. we did implemented a numerical +approximation step to calculate the approximated PP for these genes +with over/underflow. Here we use $10^{-10}$ to approximate the parameter p +in the NB distribution for these genes (we set it to a small value +since we want to cover more over/underflow genes with low +within-condition variation). You may try to tune this value (to a larger value) in the +approximation by setting \verb+ApproxVal+ in \verb+EBTest()+ or \verb+EBMultiTest()+ function. + \pagebreak \bibliographystyle{plain} \bibliography{lengetal} -- Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/r-bioc-ebseq.git _______________________________________________ debian-med-commit mailing list [email protected] http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/debian-med-commit
