Hi Jerome,

I decided not to put these datasets up on my personal website, as the data are already distributed with the package. The files I used for the analysis in the edgeR Users Guide were originally from GEO, but a search today for those sample IDs returned no results (which is a concern, but I'm no expert on GEO).
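As an aside, one quick way to check which datasets an installed package actually ships is to query utils::data(). The snippet below is only a rough sketch: bundled_datasets() is a hypothetical helper name, not an edgeR function, and it is demonstrated on the base "datasets" package so that it runs without Bioconductor. Whether NC1, NC2, Tu98 and Tu102 show up for edgeR depends on the version you have installed.

```r
# Hypothetical helper (not part of edgeR): list the datasets a package ships.
bundled_datasets <- function(pkg) {
  # utils::data(package = ...) returns an object whose $results matrix
  # has one row per dataset; the "Item" column holds the dataset names.
  info <- utils::data(package = pkg)$results
  unname(info[, "Item"])
}

# Demonstrated on the base "datasets" package so it runs without Bioconductor.
head(bundled_datasets("datasets"))

# Assumes edgeR is installed; whether the SAGE datasets appear depends on
# the installed edgeR version.
# bundled_datasets("edgeR")
```

If the names you expect are missing from that list, data() will not be able to load them and the counts would have to be read from files instead.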
In any case, the data are distributed with the edgeR package, so the only trick is manipulating the data into a form suitable for following the analysis in the Users Guide. Example code for how to do this from a fresh R session is shown below. I actually just hacked some of the code from readDGE() to do this.

This should work generally, but I must point out that you have not provided the output of sessionInfo() or told us what versions of R/Bioconductor/edgeR you are using, so any advice that anyone (including me) provides cannot be as precise as it might be if we had all of the information. For instance, I don't know where any of those .txt files are on your system, so I can't really diagnose why readDGE() didn't work for you in this circumstance. readDGE() relies on in-built functions like read.delim(), so it might be worth boning up on how such R functions work to help you trouble-shoot such problems with importing data.

Best wishes
Davis

Example code
==================
library(edgeR)
data(NC1, NC2, Tu102, Tu98)

# Sample information: one row per library
x <- list()
sets <- c("NC1","NC2","Tu102","Tu98")
x$samples <- data.frame(files=as.character(sets),stringsAsFactors=FALSE)
x$samples$group <- factor(rep(c("Normal","Tumour"),each=2))

# Collect the tag sequences from each dataset
d <- taglist <- list()
d[[1]] <- NC1
d[[2]] <- NC2
d[[3]] <- Tu102
d[[4]] <- Tu98
for(i in 1:4) {
    taglist[[i]] <- as.character(d[[i]][,1])
    if(any(duplicated(taglist[[i]]))) {
        stop(paste("Repeated tag sequences in",sets[i]))
    }
}

# Build a count matrix over the union of all tags (0 where a tag is absent)
tags <- unique(unlist(taglist))
ntags <- length(tags)
nfiles <- length(sets)
x$counts <- matrix(0,ntags,nfiles)
rownames(x$counts) <- tags
colnames(x$counts) <- sets
for (i in 1:nfiles) {
    aa <- match(taglist[[i]],tags)
    x$counts[aa,i] <- d[[i]][,2]
}

# Library sizes and default normalization factors
x$samples$lib.size <- colSums(x$counts)
x$samples$norm.factors <- 1
row.names(x$samples) <- colnames(x$counts)
x$genes <- NULL

d <- new("DGEList",x)
d <- calcNormFactors(d)
d$samples
===================

On 12 August 2011 04:17, Jérôme Laroche <jerome.laro...@ibis.ulaval.ca> wrote:
> Hi,
>
> I try to replicate the analysis "Case study of SAGE data" presented on page 9
> of edgeR document. I wonder if the mentioned datasets of Zhang et al. 1997
> are available somewhere? The datasets are: GSM728.txt, GSM729.txt,
> GSM755.txt, GSM756.txt and particularly Targets.txt.
> I looked at the page http://sites.google.com/site/davismcc/useful-documents,
> but they do not seem to be there.
>
> I tried to work with the files that accompany the package (NC1.txt, NC2.txt,
> and Tu98.txt Tu102.txt) but I get an error message when I run the command:
>> d <- calcNormFactors (d)
> (Error in calcNormFactors (d) 'data matrix' Need to Be a matrix).
>
> All the files are in the form:
>
> Tag_Sequence    Count
> AAAAAAAAAA      17
> AAAAAAAAGA      1
> AAAAAAACCC      1
> AAAAAAAGCA      1
> AAAAAAATCA      4
>
> and the Targets.txt file is:
> files       group   description
> NC1.txt     NC      Normal colon
> NC2.txt     NC      Normal colon
> Tu98.txt    Tu      Primary colonrectal tumour
> Tu102.txt   Tu      Primary colonrectal tumour
>
>
> In fact, after running the commands:
>> targets <-read.delim (file = "Targets.txt" stringsAsFactors = FALSE)
>> d <- readDGE (targets, skip = 5, comment.char ="!")
> I do not get a column showing the normalization factors (1 for all files) as
> shown in the document.
>
> Also, when I run the command
>> dim(d)
> I get "NULL" as a result.
>
>
> Thank you for your help.
>
> Jerome
> Universite Laval, Quebec, Canada
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>

--
---------------------------------------------------------------------------
Davis J McCarthy
Research Technician
Bioinformatics Division
Walter and Eliza Hall Institute of Medical Research
1G Royal Parade, Parkville, Vic 3052, Australia
dmccar...@wehi.edu.au
http://www.wehi.edu.au