[R] Data separated by spaces, getting data into R using field lengths
I have a text file similar to this (separated by spaces):

x <- "DF12 This is an example 1 This
DF12 This is an 1232 This is
DF14 This is 12334 This is an
DF15 This 23 This is an example"

and I know the field lengths of each variable (there are 5 variables in this data set), which are:

varlength <- c(2, 2, 18, 5, 18)

How can I import this kind of data into R, using the varlength variable as a field separator indicator?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Data separated by spaces, getting data into R using field lengths
On Tue, Sep 8, 2009 at 12:53 PM, Lauri Nikkinen <lauri.nikki...@iki.fi> wrote:
> How can I import this kind of data into R, using the varlength variable as a field separator indicator?

?read.fwf

Read Fixed Width Format Files

Description:
Read a table of *f*ixed *w*idth *f*ormatted data into a 'data.frame'.

Usage:
read.fwf(file, widths, header = FALSE, sep = "\t", skip = 0, row.names, col.names, n = -1, buffersize = 2000, ...)
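For reference, read.fwf() does what the question asks when the file really is fixed-width, i.e. every field is padded to its full declared width. A minimal sketch, assuming a hypothetical padded version of the example rows written to a temporary file with sprintf() (this is not the file from the thread, which turns out not to be padded):

```r
# Hypothetical padded version of the example data: every field is padded
# with spaces to its declared width, so the column boundaries are fixed.
varlength <- c(2, 2, 18, 5, 18)
rows <- sprintf("%-2s%02d%-18s%5d%-18s",
                "DF",
                c(12L, 12L, 14L, 15L),
                c("This is an example", "This is an", "This is", "This"),
                c(1L, 1232L, 12334L, 23L),
                c("This", "This is", "This is an", "This is an example"))
f <- tempfile(fileext = ".txt")
writeLines(rows, f)

# strip.white is passed on to read.table() and removes the padding from
# the character fields (numeric fields are stripped in any case).
padded <- read.fwf(f, widths = varlength, strip.white = TRUE,
                   stringsAsFactors = FALSE)
print(padded)
```

The padding is the key assumption: read.fwf() only counts characters, so it cannot recover fields whose width varies from line to line.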
Re: [R] Data separated by spaces, getting data into R using field lengths
On 9/8/2009 7:53 AM, Lauri Nikkinen wrote:
> How can I import this kind of data into R, using the varlength variable as a field separator indicator?

See ?read.fwf.

Duncan Murdoch
Re: [R] Data separated by spaces, getting data into R using field lengths
Thanks, I tried it but I got

varlength <- c(2, 2, 18, 5, 18)
read.fwf("c:temppi.txt", widths=varlength)
  V1 V2                 V3    V4   V5
1 DF 12  This is an exampl e 1 T  his
2 DF 12  This is an 1232 T his i    s
3 DF 14  This is 12334 Thi s is   an
4 DF 15  This 23 This is a n exa mple

Which is not the way I want it.

structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = "DF", class = "factor"), V2 = c(12L, 12L, 14L, 15L), V3 = structure(c(4L, 3L, 2L, 1L), .Label = c(" This 23 This is a", " This is 12334 Thi", " This is an 1232 T", " This is an exampl"), class = "factor"), V4 = structure(c(1L, 2L, 4L, 3L), .Label = c("e 1 T", "his i", "n exa", "s is "), class = "factor"), V5 = structure(c(2L, 4L, 1L, 3L), .Label = c("an ", "his", "mple", "s"), class = "factor")), .Names = c("V1", "V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA, -4L))

Any ideas?

-L
Re: [R] Data separated by spaces, getting data into R using field lengths
Can you post how you would like it?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
Re: [R] Data separated by spaces, getting data into R using field lengths
On 9/8/2009 8:07 AM, Lauri Nikkinen wrote:
> Which is not the way I want it.

It looks as though that's because you don't have fixed width data. " This is an example" is 19 chars, including the leading space. You told R it was 18. " This is an " is only 12 characters.

I would say you have two fixed width fields and three varying fields, with no delimiters. If the middle one of the three always contains digits and the others don't, you can probably extract them using sub(), but you can't use any of the read.* functions to do this: your format is too strange.

Duncan Murdoch
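Duncan's regular-expression idea could be sketched as follows: capture the two fixed 2-character fields, then the text before the first run of digits, the digits, and the remainder of the line. This is only a sketch under his stated assumption that field 3 and field 5 never contain digits, and it also assumes single spaces between the variable-length fields; the vector lines stands in for readLines() on the real file.

```r
# The four example rows from the question, one string per line
# (in practice these would come from readLines() on the real file).
lines <- c("DF12 This is an example 1 This",
           "DF12 This is an 1232 This is",
           "DF14 This is 12334 This is an",
           "DF15 This 23 This is an example")

# 2 chars, 2 chars, non-digit text, a run of digits, the remainder.
pattern <- "^(.{2})(.{2}) ([^0-9]*) ([0-9]+) (.*)$"
m <- regmatches(lines, regexec(pattern, lines))

# Every line must match (full match + 5 groups); otherwise rbind()
# would silently misalign the rows.
stopifnot(all(lengths(m) == 6))
parts <- do.call(rbind, m)

result <- data.frame(V1 = parts[, 2],
                     V2 = as.integer(parts[, 3]),
                     V3 = parts[, 4],
                     V4 = as.integer(parts[, 5]),
                     V5 = parts[, 6],
                     stringsAsFactors = FALSE)
print(result)
```

regexec() is used instead of sub() here only because it returns all capture groups in one pass; repeated sub() calls with back-references would work equally well.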
Re: [R] Data separated by spaces, getting data into R using field lengths
Sure, here you go:

structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = "DF", class = "factor"), V2 = c(12L, 12L, 14L, 15L), V3 = structure(c(4L, 3L, 2L, 1L), .Label = c("This", "This is", "This is an", "This is an example "), class = "factor"), V4 = c(1L, 1232L, 12334L, 23L), V5 = structure(1:4, .Label = c("This", "This is", "This is an", "This is an example"), class = "factor")), .Names = c("V1", "V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA, -4L))

2009/9/8 jim holtman <jholt...@gmail.com>:
> Can you post how you would like it?
Re: [R] Data separated by spaces, getting data into R using field lengths
I don't think you described your problem precisely. You implied that you wanted the field lengths to be (2, 2, 18, 5, 18), which is what you got with read.fwf, but it looks like what you meant is something more like:

field 1: first two characters
field 2: characters 3-4
field 3: all alphabetic characters up to the next numeric value (not more than 18)
field 4: all digits up to the next whitespace (not more than 5)
field 5: all alphabetic characters to end of line (not more than 18)

Is that correct? (i.e., perhaps your field lengths were MAXIMUM lengths?)

At the moment all I can think of is using read.fwf with field lengths 2, 2, 41 and as.is=TRUE (to preserve the last field as character), then using some combination of gsub, grep, strsplit and paste to pull apart the last three fields ...
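The two-step approach suggested in this message could look roughly like the sketch below. It assumes the same single-space layout as the example rows (written to a temporary file for illustration) and that field 3 contains no digits; sub() with back-references is just one of the gsub/strsplit combinations mentioned.

```r
# Example rows from the question, written to a temporary file.
f <- tempfile(fileext = ".txt")
writeLines(c("DF12 This is an example 1 This",
             "DF12 This is an 1232 This is",
             "DF14 This is 12334 This is an",
             "DF15 This 23 This is an example"), f)

# Step 1: read the two genuinely fixed fields, plus everything else as
# one character field (as.is keeps it as character, strip.white trims it).
d <- read.fwf(f, widths = c(2, 2, 41), as.is = TRUE, strip.white = TRUE)

# Step 2: split the remainder around the first run of digits.
rest <- d$V3
d$V3 <- sub("^([^0-9]*) [0-9]+ .*$", "\\1", rest)
d$V4 <- as.integer(sub("^[^0-9]* ([0-9]+) .*$", "\\1", rest))
d$V5 <- sub("^[^0-9]* [0-9]+ (.*)$", "\\1", rest)
print(d)
```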
Re: [R] Data separated by spaces, getting data into R using field lengths
This data is from a database, and the maximum length of each field is defined. I mean that every column has a maximum length and I want to use this maximum length as a separator. So if one cell in a column is shorter than the maximum, the cell should be padded with white space or something like that. This seems to be hard to explain.

Regards,
L

2009/9/8 Duncan Murdoch <murd...@stats.uwo.ca>:
> It looks as though that's because you don't have fixed width data.
Re: [R] Data separated by spaces, getting data into R using field lengths
This bears no relationship to what you were first asking. It looks like you want to split the leading 4 characters into two groups of two and then split the remaining data into three parts based on the numerics in the middle. Is this correct?

On Tue, Sep 8, 2009 at 8:15 AM, Lauri Nikkinen <lauri.nikki...@iki.fi> wrote:
> Sure, here you go
Re: [R] Data separated by spaces, getting data into R using field lengths
On 9/8/2009 8:21 AM, Lauri Nikkinen wrote:
> This data is from a database, and the maximum length of each field is defined. I mean that every column has a maximum length and I want to use this maximum length as a separator.

Your problem is the intermediate file. Why not get R to read directly from the database, using RODBC?

Duncan Murdoch
Re: [R] Data separated by spaces, getting data into R using field lengths
On Tue, Sep 08, 2009 at 02:53:11PM +0300, Lauri Nikkinen wrote:
> How can I import this kind of data into R, using the varlength variable as a field separator indicator?

I am not totally sure what exactly the expected result is. From your description I got the impression that your data file uses a mixture of separator characters and fixed-width formatting. Maybe I misinterpreted your example. Have a look at read.fwf(), and if that does not solve your problem, maybe explain the structure and expected result a little further.

cu
Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/
Re: [R] Data separated by spaces, getting data into R using field lengths
Thanks for the suggestion, but I don't have access to this database; I just got this messy file.

-L

2009/9/8 Duncan Murdoch <murd...@stats.uwo.ca>:
> Your problem is the intermediate file. Why not get R to read directly from the database, using RODBC?
Re: [R] Data separated by spaces, getting data into R using field lengths
Hi

What about reading each line with readLines and then splitting it into the desired portions?

x <- paste(letters, collapse = "")
substring(x, c(1, 3, 5), c(2, 4, 15))

Regards
Petr

r-help-boun...@r-project.org wrote on 08.09.2009 14:21:53:
> I mean that every column has a maximum length and I want to use this maximum length as a separator.
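Petr's substring() idea generalises to any vector of field widths: the stop positions are the cumulative sums of the widths, and each start position is the stop position minus the width plus one. A small sketch; the helper name split_fixed is hypothetical (not from the thread), and it only yields the intended fields if the lines really are padded to these widths.

```r
# Hypothetical helper: cut each line into fields of the given widths.
split_fixed <- function(lines, widths) {
  stop_pos  <- cumsum(widths)           # last character of each field
  start_pos <- stop_pos - widths + 1    # first character of each field
  # substring() is vectorised over first/last, giving one row per line
  do.call(rbind, lapply(lines, substring, start_pos, stop_pos))
}

x <- paste(letters, collapse = "")
substring(x, c(1, 3, 5), c(2, 4, 15))   # Petr's example
split_fixed(x, c(2, 2, 11))             # same cut points, from widths
```

With the widths from the question, c(2, 2, 18, 5, 18), the start positions come out as 1, 3, 5, 23 and 28.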
Re: [R] Data separated by spaces, getting data into R using field lengths
On Tue, Sep 08, 2009 at 03:21:53PM +0300, Lauri Nikkinen wrote:
> I mean that every column has a maximum length and I want to use this maximum length as a separator.

OK, now I get it. RODBC has already been suggested. If for some reason that is impossible, you could try to dump the data using a proper delimiter (e.g. tab). Without a real delimiter it is certainly hard to parse the data, and it may even be impossible depending on what characters are allowed in your free-text fields.

cu
Philipp
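If the data can be re-exported with a real delimiter, no width information is needed at all. A minimal sketch, assuming a tab-separated export (simulated here with a temporary file holding the four example rows):

```r
# Stand-in for a tab-delimited re-export of the same four rows.
f <- tempfile(fileext = ".txt")
writeLines(c("DF\t12\tThis is an example\t1\tThis",
             "DF\t12\tThis is an\t1232\tThis is",
             "DF\t14\tThis is\t12334\tThis is an",
             "DF\t15\tThis\t23\tThis is an example"), f)

# With sep = "\t", the spaces inside the free-text fields stay in place;
# quote = "" stops stray quote characters in free text from being parsed.
d <- read.table(f, sep = "\t", quote = "", stringsAsFactors = FALSE)
print(d)
```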
Re: [R] Data separated by spaces, getting data into R using field lengths
Thanks Petr, I tried something like this:

con <- file("C:\\temp\\pi.txt", "r", blocking = FALSE)
g <- readLines(con)
close(con)
sta <- c(1, 3, 5, 19)
sto <- c(2, 4, 18, 100)
do.call(rbind, lapply(g, function(x) substring(x, sta, sto)))
     [,1] [,2] [,3]             [,4]
[1,] "DF" "12" " This is an ex" "ample 1 This"
[2,] "DF" "12" " This is an 12" "32 This is"
[3,] "DF" "14" " This is 12334" " This is an "
[4,] "DF" "15" " This 23 This " "is an example"

But this is not the solution I was looking for. Thanks.
-L

2009/9/8 Petr PIKAL <petr.pi...@precheza.cz>:
> Hi
>
> What about reading each line with readLines() and then splitting it into the desired portions?
>
> x <- paste(letters, collapse = "")
> substring(x, c(1, 3, 5), c(2, 4, 15))
>
> Regards
> Petr
>
> r-help-boun...@r-project.org wrote on 08.09.2009 14:21:53:
>
>> This data is from a database and the maximum length of a field is defined. I mean that every column has a maximum length and I want to use this maximum length as a separator. So if one cell in that column is shorter than the maximum, the cell should be padded with white space or something like that. This seems to be hard to explain.
>>
>> Regards,
>> L
>>
>> 2009/9/8 Duncan Murdoch <murd...@stats.uwo.ca>:
>>> On 9/8/2009 8:07 AM, Lauri Nikkinen wrote:
>>>> Thanks, I tried it but I got
>>>>
>>>> varlength <- c(2, 2, 18, 5, 18)
>>>> read.fwf("c:\\temp\\pi.txt", widths = varlength)
>>>>   V1 V2                 V3    V4   V5
>>>> 1 DF 12  This is an exampl e 1 T  his
>>>> 2 DF 12  This is an 1232 T his i    s
>>>> 3 DF 14  This is 12334 Thi s is   an
>>>> 4 DF 15  This 23 This is a n exa mple
>>>>
>>>> Which is not the way I want it.
>>>
>>> It looks as though that's because you don't have fixed width data. "This is an example" is 19 chars, including the leading space. You told R it was 18. "This is an" is only 12 characters.
>>>
>>> I would say you have two fixed width fields, and three varying fields, with no delimiters. If the middle one of the three always contains digits and the others don't, you can probably extract them using sub(), but you can't use any of the read.* functions to do this: your format is too strange.
>>>
>>> Duncan Murdoch
>>>
>>>> structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = "DF", class = "factor"),
>>>>     V2 = c(12L, 12L, 14L, 15L),
>>>>     V3 = structure(c(4L, 3L, 2L, 1L), .Label = c(" This 23 This is a",
>>>>         " This is 12334 Thi", " This is an 1232 T", " This is an exampl"), class = "factor"),
>>>>     V4 = structure(c(1L, 2L, 4L, 3L), .Label = c("e 1 T", "his i", "n exa", "s is "), class = "factor"),
>>>>     V5 = structure(c(2L, 4L, 1L, 3L), .Label = c("an ", "his", "mple", "s"), class = "factor")),
>>>>     .Names = c("V1", "V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA, -4L))
>>>>
>>>> Any ideas?
>>>> -L
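Petr's `substring()` idea can be driven directly from `varlength`, so the start and stop positions need not be typed by hand. In this sketch (the helper name `split_fixed` and the typed-in sample lines are assumptions), cumulative sums turn the widths into positions, and comparing `nchar()` with `sum(varlength)` makes Duncan's point concrete: a truly fixed-width record would be 45 characters long, while these lines are only 28 to 31, so any fixed cut reproduces the same misaligned split that `read.fwf()` gave.

```r
varlength <- c(2, 2, 18, 5, 18)

# Convert field widths into start/stop character positions.
stops  <- cumsum(varlength)
starts <- stops - varlength + 1

# Hypothetical helper: cut every line at the same positions.
split_fixed <- function(lines, starts, stops) {
  do.call(rbind, lapply(lines, function(x) substring(x, starts, stops)))
}

lines <- c("DF12 This is an example 1 This",
           "DF12 This is an 1232 This is",
           "DF14 This is 12334 This is an",
           "DF15 This 23 This is an example")

rbind(start = starts, stop = stops)
# A genuinely fixed-width record would be sum(varlength) characters long.
sum(varlength)
nchar(lines)
split_fixed(lines, starts, stops)
```

If the widths really matched the file, `split_fixed()` would give the same columns as `read.fwf(file, widths = varlength)`; here it simply shows why they do not.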
Re: [R] Data separated by spaces, getting data into R using field lengths
On Tue, Sep 8, 2009 at 1:52 PM, Lauri Nikkinen <lauri.nikki...@iki.fi> wrote:
> But this is not the solution I was looking for. Thanks.

I think the only way you'll get the solution you are looking for is if you can let us have a copy of the original input file, or at least the first few lines - and not pasted into an email, because special characters like spaces and tabs get smushed up and confuse things.
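A common way to follow Barry's request without the mail client mangling whitespace is to post the `dput()` of a few lines read with `readLines()`. The sketch below first writes a throwaway temporary file containing a tab and a trailing space, only to show how such characters appear in the output; with real data you would point `readLines()` at your own file.

```r
# Write a small sample file that hides a tab and a trailing space.
tmp <- tempfile(fileext = ".txt")
writeLines(c("DF12 This is an example 1 This",
             "DF12\tThis is an 1232 This is "), tmp)

raw_lines <- readLines(tmp, n = 5)  # first few lines only
dput(raw_lines)    # tabs print as \t; trailing spaces stay visible inside quotes
nchar(raw_lines)   # per-line character counts
```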
Re: [R] Data separated by spaces, getting data into R using field lengths
Ok, I think that I have to give up and try to get this data separated by some character. It seems pretty much impossible to separate those fields. Thanks for your help and efforts.
-L

2009/9/8 Lauri Nikkinen <lauri.nikki...@iki.fi>:
> This is the file (see the attachment) that represents the problem I'm facing with the original file. I'm looking for some generic way to solve this problem. Thank you for your time.
> -L
>
> 2009/9/8 Barry Rowlingson <b.rowling...@lancaster.ac.uk>:
>> On Tue, Sep 8, 2009 at 1:52 PM, Lauri Nikkinen <lauri.nikki...@iki.fi> wrote:
>>> But this is not the solution I was looking for. Thanks.
>>
>> I think the only way you'll get the solution you are looking for is if you can let us have a copy of the original input file, or at least the first few lines - and not pasted into an email, because special characters like spaces and tabs get smushed up and confuse things.
Re: [R] Data separated by spaces, getting data into R using field lengths
On Sep 8, 2009, at 12:00 PM, Lauri Nikkinen wrote:
> Ok, I think that I have to give up and try to get this data separated by some character. It seems pretty much impossible to separate those fields. Thanks for your help and efforts.

The solution that Henrique offered seems to be a complete one:

read.table(textConnection(gsub("([0-9]+)", ";\\1;",
"DF12 This is an example 1 This
DF12 This is an 1232 This is
DF14 This is 12334 This is an
DF15 This 23 This is an example
")), sep = ";")
  V1 V2                   V3    V4                  V5
1 DF 12  This is an example      1                This
2 DF 12          This is an   1232             This is
3 DF 14             This is  12334          This is an
4 DF 15                This     23  This is an example

Versus what you wanted...

structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = "DF", class = "factor"),
    V2 = c(12L, 12L, 14L, 15L),
    V3 = structure(c(4L, 3L, 2L, 1L), .Label = c("This", "This is",
        "This is an", "This is an example"), class = "factor"),
    V4 = c(1L, 1232L, 12334L, 23L),
    V5 = structure(1:4, .Label = c("This", "This is", "This is an",
        "This is an example"), class = "factor")),
    .Names = c("V1", "V2", "V3", "V4", "V5"), class = "data.frame",
    row.names = c(NA, -4L))
  V1 V2                 V3    V4                 V5
1 DF 12 This is an example     1               This
2 DF 12         This is an  1232            This is
3 DF 14            This is 12334         This is an
4 DF 15               This    23 This is an example

Unless you can be any clearer ... than you have been to this hour.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
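Duncan's earlier remark that the fields could be extracted with `sub()` can be sketched as a variant of Henrique's `gsub()` approach: one anchored pattern describes a whole record, and each field is pulled out through its back-reference. The pattern and the helper `field()` are assumptions for this example. It only works if the numeric field is the first space-delimited run of digits after the ID; a number inside the first text field would still be split in the wrong place, and `grepl()` is used to flag lines the pattern cannot parse at all.

```r
lines <- c("DF12 This is an example 1 This",
           "DF12 This is an 1232 This is",
           "DF14 This is 12334 This is an",
           "DF15 This 23 This is an example")

# Assumed layout: 2-char code, 2-char number, text, a space-delimited
# integer, text. The lazy (.*?) stops the first text field at the first
# " <digits> " run; trailing whitespace is dropped from the last field.
pattern <- "^(.{2})(.{2}) (.*?) ([0-9]+) (.*?)\\s*$"

# Hypothetical helper: return capture group k for every line.
field <- function(k) sub(pattern, paste0("\\", k), lines, perl = TRUE)

ok <- grepl(pattern, lines, perl = TRUE)  # flag lines the pattern cannot parse
df <- data.frame(V1 = field(1), V2 = as.integer(field(2)),
                 V3 = field(3), V4 = as.integer(field(4)),
                 V5 = field(5), stringsAsFactors = FALSE)
df[ok, ]
```

Unlike the `gsub()` version, the text fields come out without leading or trailing spaces, because the spaces around the numeric field are part of the pattern rather than of the captured groups.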
Re: [R] Data separated by spaces, getting data into R using field lengths
This is the file (see the attachment) that represents the problem I'm facing with the original file. I'm looking for some generic way to solve this problem. Thank you for your time.
-L

2009/9/8 Barry Rowlingson <b.rowling...@lancaster.ac.uk>:
> On Tue, Sep 8, 2009 at 1:52 PM, Lauri Nikkinen <lauri.nikki...@iki.fi> wrote:
>> But this is not the solution I was looking for. Thanks.
>
> I think the only way you'll get the solution you are looking for is if you can let us have a copy of the original input file, or at least the first few lines - and not pasted into an email, because special characters like spaces and tabs get smushed up and confuse things.

DF12 This is an example 1 This
DF12 This is an 1232 This is
DF14 This is 12334 This is an
DF15 This 23 This is an example