Hi,

I have uploaded a copy of the file here:
http://pastebin.com/fd0edfab

The file has also been passed through the Unix command-line tool unexpand, but that doesn't solve the problem. Using head=TRUE instead of head=T has the same effect.

The output of print(names) is:

> print(names(ngly), quote=TRUE)
 [1] "snp"                       "gene"
 [3] "chromosome"                "distance_from_gene_center"
 [5] "position"                  "ame"
 [7] "csasia"                    "easia"
 [9] "eur"                       "mena"
[11] "oce"                       "ssafr"
[13] "X"                         "X.1"
[15] "X.2"

Thank you to everyone who replied to my email address, but I haven't been able to solve the problem yet.

On Tue, Jul 14, 2009 at 12:36 AM, jim holtman <jholt...@gmail.com> wrote:
> Can you send your file as an attachment since it is impossible to see
> where the separator characters are.
>
> On Mon, Jul 13, 2009 at 1:27 PM, Giovanni Marco Dall'Olio <dalloli...@gmail.com> wrote:
> > Hi people,
> > I have a text file like the one posted here:
> >
> > snp_id gene chromosome distance_from_gene_center position pop1 pop2 pop3 pop4 pop5 pop6 pop7
> > rs2129081 RAPT2 3 -129993 "upstream" 0.439009 1.169210 NA 0.233020 0.093042 NA -0.902596
> > rs1202698 RAPT2 3 -128695 "upstream" NA 1.815000 NA 0.399079 1.814270 1.382950 NA
> > rs1163207 RAPT2 3 -128224 "upstream" NA NA NA NA NA NA NA
> > rs1834127 RAPT2 3 -128106 "upstream" NA NA NA NA NA NA 2.180670
> > rs2114211 RAPT2 3 -126738 "upstream" -0.468279 -1.447620 NA 0.010616 -0.414581 NA 0.550447
> > rs2113151 RAPT2 3 -124620 "upstream" -0.897660 -1.971020 NA -0.920327 -0.764658 NA 0.337127
> > rs2524130 RAPT2 3 -123029 "upstream" -0.109795 -0.004646 -0.412059 1.116740 0.667567 -0.924529 0.962841
> > rs1381318 RAPT2 3 -12818 "upstream" -0.911662 -1.791580 NA -0.945716 -1.239640 NA 0.004876
> > rs2113319 RAPT2 3 -122028 "upstream" -0.911662 -1.738610 NA -0.945716 -1.240950 NA -0.005318
> >
> > When I use read.delim (or any read function) on it, R skips the first
> > column, and I don't understand why.
> >
> > For example:
> > $: R
> >> data = read.delim('snp_file.txt', head=T, sep='\t')
> >
> > Now, I would expect data$snp_id to contain snp ids, and data$gene to
> > contain gene names; but it is not like this:
> >
> >> data$snp_id
> > [1] RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2
> > Levels: RAPT2
> >> data$gene
> > [1] 3 3 3 3 3 3 3 3 3
> >
> >> summary(data)
> >    snp_id        gene       chromosome        distance_from_gene_center
> >  RAPT2:9    Min.   :3    Min.   :-129993    upstream:9
> >             1st Qu.:3    1st Qu.:-128224
> >             Median :3    Median :-126738
> >             Mean   :3    Mean   :-113806
> >             3rd Qu.:3    3rd Qu.:-123029
> >             Max.   :3    Max.   : -12818
> > ....
> >
> >> data$pop7
> > [1] NA NA NA NA NA NA NA NA NA
> >
> > Notice that it did use snp_id as the header for the first column, but it
> > completely skips all the data from that column, and all the fields are
> > shifted, so the last column is filled with NA values.
> >
> > What am I doing wrong? Could it be a problem with my data files? I have
> > tried to modify them a bit (adding new columns, etc.), but it didn't work.
> >
> > I am running R from an Ubuntu system:
> >> sessionInfo()
> > R version 2.9.1 (2009-06-26)
> > i486-pc-linux-gnu
> >
> > locale:
> > LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base
> >
> > --
> > Giovanni Dall'Olio, phd student
> > Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain)
> >
> > My blog on bioinformatics: http://bioinfoblog.it
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?

--
Giovanni Dall'Olio, phd student
Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain)

My blog on bioinformatics: http://bioinfoblog.it
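
A possible way to check where the shift comes from (a sketch, not a confirmed diagnosis of this file): the read.table documentation says that when the header line has one fewer field than the data rows, the first column is used as row names. The X, X.1 and X.2 names in the output above suggest trailing tabs in the header, so the data rows may carry one field more than the header. The example below builds a small hypothetical file with that layout (tmp and its contents are made up for illustration), counts the fields on each line with count.fields(), and shows one possible workaround that reads the header and the data separately.

```r
# Hypothetical stand-in for the real file: each data row ends in a trailing
# tab, so it has one field more than the header line.
tmp <- tempfile(fileext = ".txt")
writeLines(c("snp_id\tgene\tchromosome",
             "rs2129081\tRAPT2\t3\t",
             "rs1202698\tRAPT2\t3\t"), tmp)

# Count the tab-separated fields on every line, header included.
n_fields <- count.fields(tmp, sep = "\t")
print(n_fields)

# With the defaults, read.delim() takes the first column as row names,
# so each header ends up labelling the data column to its right.
shifted <- read.delim(tmp, header = TRUE, sep = "\t")
print(rownames(shifted))
print(shifted$snp_id)

# One possible workaround: read the header and the data separately, then
# assign the header names to the leading columns.
hdr <- scan(tmp, what = "", sep = "\t", nlines = 1, quiet = TRUE)
fixed <- read.delim(tmp, header = FALSE, skip = 1, sep = "\t")
names(fixed)[seq_along(hdr)] <- hdr
print(fixed$snp_id)
```

If count.fields() on the real file reports the same number of fields for the header and for every data row, this explanation does not apply and the cause lies elsewhere.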