[R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread Lauri Nikkinen
I have a text file similar to this (separated by spaces):

x <- "DF12 This is an example 1 This
DF12 This is an 1232 This is
DF14 This is 12334 This is an
DF15 This 23 This is an example
"

and I know the field lengths of each variable (there are 5 variables in
this data set), which are:

varlength <- c(2, 2, 18, 5, 18)

How can I import this kind of data into R, using the varlength
variable as a field separator indicator?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread Barry Rowlingson
On Tue, Sep 8, 2009 at 12:53 PM, Lauri Nikkinen <lauri.nikki...@iki.fi> wrote:
> How can I import this kind of data into R, using the varlength
> variable as a field separator indicator?

?read.fwf

Read Fixed Width Format Files

Description:

 Read a table of *f*ixed *w*idth *f*ormatted data into a
 'data.frame'.

Usage:

 read.fwf(file, widths, header = FALSE, sep = "\t",
  skip = 0, row.names, col.names, n = -1,
  buffersize = 2000, ...)
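
A minimal sketch of how read.fwf() behaves when every field really is padded to its full width. The padded lines are invented for illustration (they are not the posted file), and `rows`, `pad` and `padded` are hypothetical names.

```r
# Hypothetical, fully padded data: each field fills its declared width.
varlength <- c(2, 2, 18, 5, 18)

rows <- list(
  c("DF", "12", "This is an example", "1",     "This"),
  c("DF", "14", "This is",            "12334", "This is an")
)

# Left-justify each field and pad it with spaces up to its width
pad <- function(fields) {
  paste0(fields, strrep(" ", varlength - nchar(fields)), collapse = "")
}
padded <- vapply(rows, pad, character(1))

tmp <- tempfile(fileext = ".txt")
writeLines(padded, tmp)

# as.is = TRUE keeps text as character; strip.white = TRUE drops the padding
dat <- read.fwf(tmp, widths = varlength, as.is = TRUE, strip.white = TRUE)
unlink(tmp)
print(dat)
```

With input like this each line is exactly sum(varlength) = 45 characters long, which is the condition read.fwf() relies on.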



Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread Duncan Murdoch

On 9/8/2009 7:53 AM, Lauri Nikkinen wrote:

> How can I import this kind of data into R, using the varlength
> variable as a field separator indicator?


See ?read.fwf.

Duncan Murdoch



Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread Lauri Nikkinen
Thanks, I tried it but I got

> varlength <- c(2, 2, 18, 5, 18)
> read.fwf("c:temppi.txt", widths=varlength)
  V1 V2                 V3    V4   V5
1 DF 12  This is an exampl e 1 T  his
2 DF 12  This is an 1232 T his i    s
3 DF 14  This is 12334 Thi s is   an
4 DF 15  This 23 This is a n exa mple

Which is not the way I want it.

structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = "DF", class
= "factor"),
    V2 = c(12L, 12L, 14L, 15L), V3 = structure(c(4L, 3L, 2L,
    1L), .Label = c(" This 23 This is a", " This is 12334 Thi",
    " This is an 1232 T", " This is an exampl"), class = "factor"),
    V4 = structure(c(1L, 2L, 4L, 3L), .Label = c("e 1 T", "his i",
    "n exa", "s is "), class = "factor"), V5 = structure(c(2L,
    4L, 1L, 3L), .Label = c("an ", "his", "mple", "s"), class =
"factor")), .Names = c("V1",
"V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA,
-4L))

Any ideas?
-L



Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread jim holtman
Can you post how you would like it to look?


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread Duncan Murdoch

On 9/8/2009 8:07 AM, Lauri Nikkinen wrote:


> Which is not the way I want it.


It looks as though that's because you don't have fixed width data.
" This is an example" is 19 chars, including the leading space.  You told
R it was 18.  " This is an " is only 12 characters.


I would say you have two fixed width fields, and three varying fields, 
with no delimiters.  If the middle one of the three always contains 
digits and the others don't, you can probably extract them using sub(), 
but you can't use any of the read.* functions to do this:  your format 
is too strange.
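
A rough sketch of the sub() idea, assuming (as stated above) that only the middle varying field contains digits. The pattern `pat` and the column names are illustrative choices, not something given in the thread.

```r
# The four lines as posted (no padding between fields)
lines <- c(
  "DF12 This is an example 1 This",
  "DF12 This is an 1232 This is",
  "DF14 This is 12334 This is an",
  "DF15 This 23 This is an example"
)

# Assumed layout: 2 chars, 2 chars, non-digit text, a run of digits, the rest.
# This breaks down if either text field can itself contain digits.
pat <- "^(.{2})(.{2})\\s*([^0-9]*?)\\s*([0-9]+)\\s*(.*?)\\s*$"

dat <- data.frame(
  V1 = sub(pat, "\\1", lines, perl = TRUE),
  V2 = as.integer(sub(pat, "\\2", lines, perl = TRUE)),
  V3 = sub(pat, "\\3", lines, perl = TRUE),
  V4 = as.integer(sub(pat, "\\4", lines, perl = TRUE)),
  V5 = sub(pat, "\\5", lines, perl = TRUE),
  stringsAsFactors = FALSE
)
print(dat)
```

Each sub() call replaces the whole line with one captured group, so a line that does not match the pattern comes back unchanged; checking the result for such lines would be a sensible safeguard.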


Duncan Murdoch





Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread Lauri Nikkinen
Sure, here you go

structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = "DF", class
= "factor"),
    V2 = c(12L, 12L, 14L, 15L), V3 = structure(c(4L, 3L, 2L,
    1L), .Label = c("This", "This is", "This is an", "This is an example"
    ), class = "factor"), V4 = c(1L, 1232L, 12334L, 23L), V5 =
structure(1:4, .Label = c("This",
    "This is", "This is an", "This is an example"), class =
"factor")), .Names = c("V1",
"V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA,
-4L))




Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread Ben Bolker


  I don't think you described your problem precisely.
You implied that you wanted the field lengths to be
(2,2,18,5,18) -- which is what you got with read.fwf -- 
but it looks like what you meant is something more like:

field 1: first two characters
field 2: characters 3-4
field 3: all alphabetic characters up to the next numeric value
   (not more than 18)
field 4: all numeric values up to the next whitespace
   (not more than 5)
field 5: all alphabetic characters to end of line 
   (not more than 18)

  is that correct?  (i.e., perhaps your field lengths
were MAXIMUM lengths?)

  at the moment all I can think of is using read.fwf
with field lengths 2,2, 41 and as.is=TRUE (to preserve
the last field as character), then use some combination
of gsub, grep, strsplit, paste to pull apart the last three fields ...
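
A sketch of that two-step idea, assuming the text fields never contain digits and that neither text field is empty. The posted lines are written to a temporary file to stand in for the real one; `tail_field`, `texts` and `number` are illustrative names.

```r
lines <- c(
  "DF12 This is an example 1 This",
  "DF12 This is an 1232 This is",
  "DF14 This is 12334 This is an",
  "DF15 This 23 This is an example"
)
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Step 1: two genuinely fixed-width fields, then the remainder (up to 41 chars)
raw <- read.fwf(tmp, widths = c(2, 2, 41), as.is = TRUE)
unlink(tmp)

# Step 2: pull the remainder apart around the run of digits
tail_field <- trimws(raw$V3)
texts  <- do.call(rbind, strsplit(tail_field, "[[:space:]]*[0-9]+[[:space:]]*"))
number <- as.integer(gsub("[^0-9]", "", tail_field))

dat <- data.frame(
  V1 = raw$V1, V2 = raw$V2,
  V3 = texts[, 1], V4 = number, V5 = texts[, 2],
  stringsAsFactors = FALSE
)
print(dat)
```

If a record had an empty trailing text field, strsplit() would return only one piece for it and the rbind() step would need extra handling.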




Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread Lauri Nikkinen
This data is from a database and the maximum length of a field is
defined. I mean that every column has a maximum length and I want to
use this maximum length as a separator. So if one cell in that
column is shorter than the maximum, the cell should be padded with white
spaces or something like that. This seems to be hard to explain.
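
One quick way to see whether a file is actually padded like that is to compare each line's length with the sum of the field widths. A small sketch using the four lines as posted (`expected`, `actual` and `is_fixed_width` are illustrative names):

```r
varlength <- c(2, 2, 18, 5, 18)

lines <- c(
  "DF12 This is an example 1 This",
  "DF12 This is an 1232 This is",
  "DF14 This is 12334 This is an",
  "DF15 This 23 This is an example"
)

expected <- sum(varlength)  # 45 characters per line if every field is padded
actual   <- nchar(lines)    # 30, 28, 29 and 31 for the posted lines

print(data.frame(line = seq_along(lines), actual = actual, expected = expected))

# FALSE here: the posted lines are not padded to the declared widths
is_fixed_width <- all(actual == expected)
print(is_fixed_width)
```

For a real file, `lines` would come from readLines() on that file instead of being typed in.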

Regards,
L



Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread jim holtman
This bears no relationship to what you were first asking.  It looks
like you want to split the leading 4 characters into two groups of two
and then split the remaining data into three parts based on numerics
in the middle.  Is this correct?


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread Duncan Murdoch

On 9/8/2009 8:21 AM, Lauri Nikkinen wrote:

> This data is from a database and the maximum length of a field is
> defined. I mean that every column has a maximum length and I want to
> use this maximum length as a separator. So if one cell in that
> column is shorter than the maximum, the cell should be padded with white
> spaces or something like that. This seems to be hard to explain.


Your problem is the intermediate file.  Why not get R to read directly 
from the database, using RODBC?


Duncan Murdoch





Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread Philipp Pagel
On Tue, Sep 08, 2009 at 02:53:11PM +0300, Lauri Nikkinen wrote:
> How can I import this kind of data into R, using the varlength
> variable as a field separator indicator?

I am not totally sure what exactly the expected result is. From your
description I got the impression that your data file uses a mixture of
separation characters and fixed-width formatting. Maybe I
misinterpreted your example. Have a look at read.fwf() and if that does
not solve your problem, maybe explain the structure and expected result
a little further.

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/



Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread Lauri Nikkinen
Thanks for the suggestion, but I don't have access to this
database; I just got this messy file.

-L



Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread Petr PIKAL
Hi

what about reading each line with readLines() and then splitting it into
the desired portions?

x <- paste(letters, collapse = "")
substring(x, c(1,3,5),c(2,4,15))
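
To apply the same substring() idea to a whole file, the start and stop positions can be derived from the field widths. A minimal sketch, assuming the lines are padded to the full widths; the two padded lines are made up for illustration, and `start_pos`, `stop_pos` and `mat` are hypothetical names.

```r
varlength <- c(2, 2, 18, 5, 18)
stop_pos  <- cumsum(varlength)           # 2 4 22 27 45
start_pos <- stop_pos - varlength + 1    # 1 3 5 23 28

# Hypothetical padded lines standing in for readLines() output
rows <- list(
  c("DF", "12", "This is an", "1232", "This is"),
  c("DF", "15", "This",       "23",   "This is an example")
)
padded <- vapply(
  rows,
  function(f) paste0(f, strrep(" ", varlength - nchar(f)), collapse = ""),
  character(1)
)

# Cut every line at the same positions and trim the padding
mat <- do.call(rbind, lapply(padded, function(x) {
  trimws(substring(x, start_pos, stop_pos))
}))
print(mat)
```

This only gives sensible fields when each line really is sum(varlength) characters long; on unpadded lines the cuts land in the middle of words.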

Regards
Petr




Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread Philipp Pagel
On Tue, Sep 08, 2009 at 03:21:53PM +0300, Lauri Nikkinen wrote:
> This data is from a database and the maximum length of a field is
> defined. I mean that every column has a maximum length and I want to
> use this maximum length as a separator. So if one cell in that
> column is shorter than the maximum, the cell should be padded with white
> spaces or something like that. This seems to be hard to explain.

OK - now I got it. RODBC has already been suggested. If for some reason
that is impossible, you could try to dump the data using a proper
delimiter (e.g. tab). Without a real delimiter it is certainly hard to
parse the data - and it may even be impossible, depending on which
characters are allowed in your free-text fields.
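
As a rough illustration of the delimiter suggestion: if the database could export the same four records with tabs between fields, base R would read them directly. The tab-separated text and the column names below are made up for this sketch; they are not taken from the real export.

```r
# Hypothetical tab-delimited export of the four sample records.
txt <- paste(
  "DF\t12\tThis is an example\t1\tThis",
  "DF\t12\tThis is an\t1232\tThis is",
  "DF\t14\tThis is\t12334\tThis is an",
  "DF\t15\tThis\t23\tThis is an example",
  sep = "\n"
)

# read.delim() splits on tabs by default; quote = "" keeps any quote
# characters in the free-text fields from being interpreted.
# The column names are illustrative only.
dat <- read.delim(text = txt, header = FALSE, quote = "",
                  stringsAsFactors = FALSE,
                  col.names = c("code", "id", "text1", "num", "text2"))
str(dat)
```

With a real delimiter the free-text columns can be of any length, which is exactly what the space-separated dump cannot express.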

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/



Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread Lauri Nikkinen
Thanks Petr, I tried something like this:

> con <- file("C:temppi.txt", "r", blocking = FALSE)
> g <- readLines(con)
> close(con)
>
> sta <- c(1, 3, 5, 19)
> sto <- c(2, 4, 18, 100)
> do.call(rbind, lapply(g, function(x) substring(x, sta, sto)))
     [,1] [,2] [,3]             [,4]
[1,] "DF" "12" " This is an ex" "ample 1 This"
[2,] "DF" "12" " This is an 12" "32 This is"
[3,] "DF" "14" " This is 12334" " This is an "
[4,] "DF" "15" " This 23 This " "is an example"


But this is not the solution I was looking for. Thanks.
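
For reference, the start and stop positions can be derived from the field widths rather than typed by hand. The sketch below assumes each record really is padded to the stated widths, which the sample file is not; the record `rec` is a made-up example built with sprintf() to satisfy that assumption.

```r
# Field widths as given in the original question.
varlength <- c(2, 2, 18, 5, 18)

# Stop positions are the running totals; start positions follow from them.
sto <- cumsum(varlength)
sta <- sto - varlength + 1

# Hypothetical record padded to exactly those widths (45 characters).
rec <- sprintf("%-2s%-2s%-18s%5s%-18s",
               "DF", "12", "This is an example", "1", "This")

# Cut the record at the computed positions and drop the padding.
fields <- trimws(substring(rec, sta, sto))
fields
```

This only works when every record has the same layout; with unpadded fields the cut points land in the middle of the text, as in the output above.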

-L

2009/9/8 Petr PIKAL petr.pi...@precheza.cz:
 Hi

 What about reading each line with readLines() and then splitting it
 into the desired portions?

 x <- paste(letters, collapse = "")
 substring(x, c(1, 3, 5), c(2, 4, 15))

 Regards
 Petr


 r-help-boun...@r-project.org wrote on 08.09.2009 14:21:53:

 This data is from database and the maximum length of a field is
 defined. I mean that every column has a maximum length and I want to
 use this maximum length as a separator. So if one cell in that
 column is shorter than the maximum, cell should be padded with white
 spaces or something like that. This seems to be hard to explain.

 Regards,
 L

 2009/9/8 Duncan Murdoch murd...@stats.uwo.ca:
  On 9/8/2009 8:07 AM, Lauri Nikkinen wrote:
 
  Thanks, I tried it but I got
 
  varlength <- c(2, 2, 18, 5, 18)
  read.fwf("c:temppi.txt", widths = varlength)
 
   V1 V2                 V3    V4   V5
  1 DF 12  This is an exampl e 1 T  his
  2 DF 12  This is an 1232 T his i    s
  3 DF 14  This is 12334 Thi s is   an
  4 DF 15  This 23 This is a n exa mple
 
  Which is not the way I want it.
 
  It looks as though that's because you don't have fixed-width data.
  " This is an example" is 19 chars, including the leading space.  You told
  R it was 18.  " This is an " is only 12 characters.

  I would say you have two fixed-width fields and three varying fields, with
  no delimiters.  If the middle one of the three always contains digits and
  the others don't, you can probably extract them using sub(), but you can't
  use any of the read.* functions to do this: your format is too strange.
 
  Duncan Murdoch
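
A minimal sketch of the sub() idea, under the assumption stated above: the first two fields are two characters each, the fourth field is the only run of digits after them, and the free-text fields never contain digits. The pattern and the variable names are illustrative, not something posted in the thread.

```r
lines <- c("DF12 This is an example 1 This",
           "DF12 This is an 1232 This is",
           "DF14 This is 12334 This is an ",
           "DF15 This 23 This is an example")

# Groups: 2-char code, 2-char id, text without digits, digits, remaining text.
pat <- "^(.{2})(.{2})([^0-9]*)([0-9]+)(.*)$"

# Pull out each group in turn with a backreference, then trim whitespace.
parts <- lapply(1:5, function(i) trimws(sub(pat, paste0("\\", i), lines)))

dat <- data.frame(V1 = parts[[1]],
                  V2 = as.integer(parts[[2]]),
                  V3 = parts[[3]],
                  V4 = as.integer(parts[[4]]),
                  V5 = parts[[5]],
                  stringsAsFactors = FALSE)
dat
```

If a free-text field can itself contain digits, the pattern becomes ambiguous and this approach breaks down.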
 
 



Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread Barry Rowlingson
On Tue, Sep 8, 2009 at 1:52 PM, Lauri Nikkinen <lauri.nikki...@iki.fi> wrote:

 But this is not the solution I was looking for. Thanks.

 I think the only way you'll get the solution you are looking for is
if you can let us have a copy of the original input file, or at least
the first few lines - and not pasted into an email because special
characters like spaces and tabs get smushed up and confuse things.
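
One way to check what whitespace a file actually contains, before mailing it, is to inspect the raw lines in R. This is only a sketch: the two-line stand-in file is invented (its second line deliberately contains a tab), and in practice `path` would point at the real export.

```r
# Write a small stand-in file; the second line contains a tab on purpose.
path <- tempfile(fileext = ".txt")
writeLines(c("DF12 This is an example 1 This",
             "DF12\tThis is an 1232 This is"), path)

raw_lines <- readLines(path)

# Equal line lengths would suggest padded fixed-width records.
nchar(raw_lines)

# TRUE for every line that contains a tab character.
has_tab <- grepl("\t", raw_lines, fixed = TRUE)
has_tab

# Raw bytes of the start of a line: 09 is a tab, 20 is a space.
charToRaw(raw_lines[2])[1:6]
```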



Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread Lauri Nikkinen
Ok, I think that I have to give up and try to get this data separated
by some character. It seems pretty much impossible to separate those fields.
Thanks for your help and efforts.

-L

2009/9/8 Lauri Nikkinen lauri.nikki...@iki.fi:
 This is the file (see the attachment) that represents the problem I'm
 facing with the original file. I'm looking for some generic way to
 solve this problem. Thank you for your time.

 -L





Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread David Winsemius


On Sep 8, 2009, at 12:00 PM, Lauri Nikkinen wrote:


Ok, I think that I have to give up and try to get this data separated
by some character. It seems pretty much impossible to separate those fields.
Thanks for your help and efforts.


The solution that Henrique offered seems to be a complete one:

> read.table(textConnection(gsub("([0-9]+)", ";\\1;", "DF12 This is an example 1 This
+ DF12 This is an 1232 This is
+ DF14 This is 12334 This is an
+ DF15 This 23 This is an example
+ ")), sep = ";")
  V1 V2                   V3    V4                  V5
1 DF 12  This is an example      1                This
2 DF 12          This is an   1232             This is
3 DF 14             This is  12334          This is an
4 DF 15                This     23  This is an example

Versus what you wanted:

> structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = "DF", class
+ = "factor"),
+    V2 = c(12L, 12L, 14L, 15L), V3 = structure(c(4L, 3L, 2L,
+    1L), .Label = c("This", "This is", "This is an", "This is an example"
+    ), class = "factor"), V4 = c(1L, 1232L, 12334L, 23L), V5 =
+ structure(1:4, .Label = c("This",
+    "This is", "This is an", "This is an example"), class =
+ "factor")), .Names = c("V1",
+ "V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA,
+ -4L))
  V1 V2                 V3    V4                 V5
1 DF 12 This is an example     1               This
2 DF 12         This is an  1232            This is
3 DF 14            This is 12334         This is an
4 DF 15               This    23 This is an example

Unless you can be any clearer ... than you have been to this hour.
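
The two results above appear to differ only in the whitespace kept around the text fields. A hedged variation on the same gsub() approach, using the `strip.white` and `text` arguments of read.table(), trims that whitespace on input. It assumes, as before, that digits occur only in the second and fourth fields and that ";" never occurs in the data.

```r
lines <- c("DF12 This is an example 1 This",
           "DF12 This is an 1232 This is",
           "DF14 This is 12334 This is an ",
           "DF15 This 23 This is an example")

# Wrap every run of digits in semicolons so it becomes its own field.
marked <- gsub("([0-9]+)", ";\\1;", lines)

# strip.white = TRUE removes leading and trailing blanks from each
# character field when sep is specified.
dat <- read.table(text = marked, sep = ";", strip.white = TRUE,
                  stringsAsFactors = FALSE)
dat
```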



-L



David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread Lauri Nikkinen
This is the file (see the attachment) that represents the problem I'm
facing with the original file. I'm looking for some generic way to
solve this problem. Thank you for your time.

-L

2009/9/8 Barry Rowlingson b.rowling...@lancaster.ac.uk:
 On Tue, Sep 8, 2009 at 1:52 PM, Lauri Nikkinen <lauri.nikki...@iki.fi> wrote:

 But this is not the solution I was looking for. Thanks.

  I think the only way you'll get the solution you are looking for is
 if you can let us have a copy of the original input file, or at least
 the first few lines - and not pasted into an email because special
 characters like spaces and tabs get smushed up and confuse things.

DF12 This is an example 1 This
DF12 This is an 1232 This is
DF14 This is 12334 This is an 
DF15 This 23 This is an example