Hi

> str(read.table("test.txt", header=T))
'data.frame': 9 obs. of 12 variables:
 $ snp : Factor w/ 9 levels "rs1113188","rs1113397",..: 9 5 7 8 3 4 6 1 2
 $ gene : Factor w/ 1 level "TRP2": 1 1 1 1 1 1 1 1 1
 $ chromosome : int 3 3 3 3 3 3 3 3 3

Reading files into R can sometimes be tricky. If read.delim fails, I would recommend trying read.table, which makes fewer assumptions, and setting its parameters (header, sep, dec, ...) explicitly until your file is read correctly.
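Before setting read.table()'s parameters by hand, it can help to look at the separators R will actually split on. A minimal sketch, assuming a small hypothetical file (its name and three lines are made up so the example runs on its own): count.fields() reports how many tab-separated fields each line has, and readLines() plus gsub() prints the raw lines with every tab made visible.

```r
# Hypothetical tab-separated file: the data rows end with a trailing tab,
# which is invisible in an editor or in an email.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "snp_id\tgene\tchromosome",
  "rs2129081\tRAPT2\t3\t",
  "rs1202698\tRAPT2\t3\t"
), tmp)

# Fields per line, splitting on tabs only (header first, then each data row).
n_fields <- count.fields(tmp, sep = "\t", quote = "\"")
print(n_fields)

# The raw lines, with each tab replaced by a visible marker.
shown <- gsub("\t", "<TAB>", readLines(tmp), fixed = TRUE)
writeLines(shown)
```

Here the header has 3 fields and each data row has 4; a mismatch like this is worth fixing (or working around) before fiddling with header, sep or dec.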
Regards
Petr

r-help-boun...@r-project.org wrote on 14.07.2009 11:11:10:

> Hi,
> I have uploaded a copy of the file here:
> - http://pastebin.com/fd0edfab
>
> The file has also been passed through the Unix command-line tool unexpand,
> but that doesn't solve the problem.
>
> Using head=TRUE instead of head=T has the same effect.
>
> The output of print(names) is:
>
> > print(names(ngly), quote=TRUE)
>  [1] "snp"                       "gene"
>  [3] "chromosome"                "distance_from_gene_center"
>  [5] "position"                  "ame"
>  [7] "csasia"                    "easia"
>  [9] "eur"                       "mena"
> [11] "oce"                       "ssafr"
> [13] "X"                         "X.1"
> [15] "X.2"
>
> Thank you to all the people who replied to me directly, but I haven't been
> able to solve the problem yet.
>
>
> On Tue, Jul 14, 2009 at 12:36 AM, jim holtman <jholt...@gmail.com> wrote:
>
> > Can you send your file as an attachment? It is impossible to see
> > where the separator characters are.
> >
> > On Mon, Jul 13, 2009 at 1:27 PM, Giovanni Marco
> > Dall'Olio <dalloli...@gmail.com> wrote:
> > > Hi people,
> > > I have a text file like this one:
> > >
> > > snp_id gene chromosome distance_from_gene_center position pop1 pop2 pop3 pop4 pop5 pop6 pop7
> > > rs2129081 RAPT2 3 -129993 "upstream" 0.439009 1.169210 NA 0.233020 0.093042 NA -0.902596
> > > rs1202698 RAPT2 3 -128695 "upstream" NA 1.815000 NA 0.399079 1.814270 1.382950 NA
> > > rs1163207 RAPT2 3 -128224 "upstream" NA NA NA NA NA NA NA
> > > rs1834127 RAPT2 3 -128106 "upstream" NA NA NA NA NA NA 2.180670
> > > rs2114211 RAPT2 3 -126738 "upstream" -0.468279 -1.447620 NA 0.010616 -0.414581 NA 0.550447
> > > rs2113151 RAPT2 3 -124620 "upstream" -0.897660 -1.971020 NA -0.920327 -0.764658 NA 0.337127
> > > rs2524130 RAPT2 3 -123029 "upstream" -0.109795 -0.004646 -0.412059 1.116740 0.667567 -0.924529 0.962841
> > > rs1381318 RAPT2 3 -12818 "upstream" -0.911662 -1.791580 NA -0.945716 -1.239640 NA 0.004876
> > > rs2113319 RAPT2 3 -122028 "upstream" -0.911662 -1.738610 NA -0.945716 -1.240950 NA -0.005318
> > >
> > > When I use read.delim (or any other read function) on it, R skips the
> > > first column, and I don't understand why.
> > >
> > > For example:
> > > $: R
> > >> data = read.delim('snp_file.txt', head=T, sep='\t')
> > >
> > > Now, I would expect data$snp_id to contain SNP ids and data$gene to
> > > contain gene names, but that is not what happens:
> > >
> > >> data$snp_id
> > > [1] RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2
> > > Levels: RAPT2
> > >> data$gene
> > > [1] 3 3 3 3 3 3 3 3 3
> > >
> > >> summary(data)
> > >   snp_id       gene     chromosome   distance_from_gene_center
> > >  RAPT2:9   Min.   :3   Min.   :-129993   upstream:9
> > >            1st Qu.:3   1st Qu.:-128224
> > >            Median :3   Median :-126738
> > >            Mean   :3   Mean   :-113806
> > >            3rd Qu.:3   3rd Qu.:-123029
> > >            Max.   :3   Max.   : -12818
> > >  ....
> > >
> > >> data$pop7
> > > [1] NA NA NA NA NA NA NA NA NA
> > >
> > >
> > > Notice that it did use snp_id as the header for the first column, but it
> > > completely skips all the data from that column, and all the fields are
> > > shifted, so the last column is filled with NA values.
> > >
> > > What am I doing wrong? Could it be a problem with my data files? I have
> > > tried modifying them a bit (adding new columns, etc.), but it didn't help.
> > >
> > > I am running R on an Ubuntu system:
> > >> sessionInfo()
> > > R version 2.9.1 (2009-06-26)
> > > i486-pc-linux-gnu
> > >
> > > locale:
> > > LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C
> > >
> > > attached base packages:
> > > [1] stats     graphics  grDevices utils     datasets  methods   base
> > >
> > > --
> > > Giovanni Dall'Olio, phd student
> > > Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain)
> > >
> > > My blog on bioinformatics: http://bioinfoblog.it
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> > --
> > Jim Holtman
> > Cincinnati, OH
> > +1 513 646 9390
> >
> > What is the problem that you are trying to solve?
>
> --
> Giovanni Dall'Olio, phd student
> Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain)
>
> My blog on bioinformatics: http://bioinfoblog.it
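The symptom in the original post (the first column "skipped", every other value shifted one column to the left, the last column all NA) matches a rule documented in ?read.table: if there is a header and its first row has one fewer field than the data rows, the first column of the input is used for the row names. One common way to get that mismatch is a trailing tab at the end of every data row; whether that is what happened in this file is an assumption, since the separators are not visible in the post. A minimal sketch with a hypothetical four-column file reproduces the shift and shows one possible workaround (reading the header and the data separately), not necessarily the fix the posters ended up using:

```r
# Hypothetical file: 4 header fields, but each data row ends in a trailing
# tab, so it has 5 fields (the last one empty).
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "snp_id\tgene\tchromosome\tpop1",
  "rs2129081\tRAPT2\t3\t0.439009\t",
  "rs1202698\tRAPT2\t3\tNA\t"
), tmp)

# Header has one field fewer than the data, so read.delim() turns the
# first column into row names and every column name shifts by one.
bad <- read.delim(tmp)
print(rownames(bad))   # the SNP ids ended up as row names
print(bad$snp_id)      # holds the gene names instead
print(bad$pop1)        # all NA: filled from the empty trailing field

# Possible workaround: read the header line on its own, read the data with
# header = FALSE, then keep only as many columns as there are names.
hdr  <- scan(tmp, what = "", sep = "\t", nlines = 1, quiet = TRUE)
good <- read.delim(tmp, header = FALSE, skip = 1)
good <- good[, seq_along(hdr)]
names(good) <- hdr
str(good)
```

If instead the header line itself carries extra trailing tabs, read.table() turns the empty names into X, X.1, X.2 (through make.names(..., unique = TRUE)), which would be consistent with the names(ngly) output in the follow-up message.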