Re: [R] read.delim skips first column (why?)

Giovanni Marco Dall'Olio Tue, 14 Jul 2009 02:14:20 -0700

Hi,
I have uploaded a copy of the file here:
- http://pastebin.com/fd0edfab


The file has also been passed through the Unix command-line tool unexpand, but
that doesn't solve the problem.

Using head=TRUE instead of head=T has the same effect.

The output of print(names(ngly)) is:
> print(names(ngly), quote=TRUE)
 [1] "snp"                       "gene"
 [3] "chromosome"                "distance_from_gene_center"
 [5] "position"                  "ame"
 [7] "csasia"                    "easia"
 [9] "eur"                       "mena"
[11] "oce"                       "ssafr"
[13] "X"                         "X.1"
[15] "X.2"

Thank you to everyone who replied to my email address, but I haven't been
able to solve the problem yet.
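
One possible cause, offered only as a guess since nothing in this thread
confirms it: read.table() (which read.delim() calls) uses the first column as
row names when the header line has exactly one field fewer than the data rows.
Stray trailing tabs, which the X, X.1, X.2 names above may hint at, can produce
such a mismatch. The sketch below uses a made-up three-line file (not the real
data) to show the effect, with count.fields() as one way to check for it.

```r
# Hypothetical miniature file: the header has 3 fields, but every data row
# has 4, so the header is exactly one field short.
path <- tempfile(fileext = ".txt")
writeLines(c("snp_id\tgene\tchromosome",
             "rs1\tRAPT2\t3\t0.5",
             "rs2\tRAPT2\t3\t0.7"), path)

# Count the tab-separated fields on each line to expose the mismatch.
n_fields <- count.fields(path, sep = "\t")

# Default call: the first data column is silently used as row names,
# so every remaining column shifts one position to the left.
shifted <- read.delim(path, header = TRUE, sep = "\t")

# row.names = NULL keeps that column as data, under the name "row.names".
kept <- read.delim(path, header = TRUE, sep = "\t", row.names = NULL)

print(n_fields)
print(rownames(shifted))
print(names(kept))
```

If rownames(ngly) turns out to hold the snp ids, the same thing is probably
happening with the real file. Note that with row.names = NULL the first column
is kept, but the header names still sit one column to the right and would need
to be reassigned by hand.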


On Tue, Jul 14, 2009 at 12:36 AM, jim holtman <jholt...@gmail.com> wrote:

> Can you send your file as an attachment since it is impossible to see
> where the separator characters are.
>
> On Mon, Jul 13, 2009 at 1:27 PM, Giovanni Marco Dall'Olio <dalloli...@gmail.com> wrote:
> > Hi people,
> > I have a text file like this one posted:
> >
> > snp_id  gene  chromosome  distance_from_gene_center  position  pop1  pop2  pop3  pop4  pop5  pop6  pop7
> > rs2129081  RAPT2  3  -129993  "upstream"  0.439009  1.169210  NA  0.233020  0.093042  NA  -0.902596
> > rs1202698  RAPT2  3  -128695  "upstream"  NA  1.815000  NA  0.399079  1.814270  1.382950  NA
> > rs1163207  RAPT2  3  -128224  "upstream"  NA  NA  NA  NA  NA  NA  NA
> > rs1834127  RAPT2  3  -128106  "upstream"  NA  NA  NA  NA  NA  NA  2.180670
> > rs2114211  RAPT2  3  -126738  "upstream"  -0.468279  -1.447620  NA  0.010616  -0.414581  NA  0.550447
> > rs2113151  RAPT2  3  -124620  "upstream"  -0.897660  -1.971020  NA  -0.920327  -0.764658  NA  0.337127
> > rs2524130  RAPT2  3  -123029  "upstream"  -0.109795  -0.004646  -0.412059  1.116740  0.667567  -0.924529  0.962841
> > rs1381318  RAPT2  3  -12818  "upstream"  -0.911662  -1.791580  NA  -0.945716  -1.239640  NA  0.004876
> > rs2113319  RAPT2  3  -122028  "upstream"  -0.911662  -1.738610  NA  -0.945716  -1.240950  NA  -0.005318
> >
> > When I use read.delim (or any read function) on it, R skips the first
> > column, and I don't understand why.
> >
> > For example:
> > $: R
> >> data = read.delim('snp_file.txt', head=T, sep='\t')
> >
> > Now, I would expect data$snp_id to contain snp ids, and data$gene to contain
> > gene names; but it is not like this:
> >
> >> data$snp_id
> > [1] RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2
> > Levels: RAPT2
> >> data$gene
> > [1] 3 3 3 3 3 3 3 3 3
> >
> >> summary(data)
> >  snp_id       gene     chromosome      distance_from_gene_center
> >  RAPT2:9   Min.   :3   Min.   :-129993   upstream:9
> >           1st Qu.:3   1st Qu.:-128224
> >           Median :3   Median :-126738
> >           Mean   :3   Mean   :-113806
> >           3rd Qu.:3   3rd Qu.:-123029
> >           Max.   :3   Max.   : -12818
> > ....
> >
> >> data$pop7
> > [1] NA NA NA NA NA NA NA NA NA
> >
> >
> > Notice that it did use snp_id as the header for the first column, but it
> > completely skips all the data from that column, and all the fields are
> > shifted, so the last column is filled with NA values.
> >
> > What am I doing wrong? Could it be a problem with my data files? I have tried to
> > modify them a bit (add new columns, etc.) but it didn't work.
> >
> > I am running R from an Ubuntu system:
> >> sessionInfo()
> > R version 2.9.1 (2009-06-26)
> > i486-pc-linux-gnu
> >
> > locale:
> > LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> >
> >
> >
> > --
> > Giovanni Dall'Olio, phd student
> > Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain)
> >
> > My blog on bioinformatics: http://bioinfoblog.it
> >
> >        [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>



-- 
Giovanni Dall'Olio, phd student
Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain)

My blog on bioinformatics: http://bioinfoblog.it

