Re: [R] regexpr

2007-07-03 Thread runner

Using lapply() is great. That helps me a lot.
Thanks.



-- 
View this message in context: 
http://www.nabble.com/regexpr-tf4000743.html#a11412603
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] regexpr

2007-06-29 Thread runner

Hi, 

I'd like to match each member of a list against a target string, e.g.
--
mylist=c("MN","NY","FL")
g=regexpr(mylist[1], "Those from MN:")
if (g>0)
{
"On list"
}
--
My question is:

How do I add an end-of-string symbol '$' to the pattern, so that 'M'
won't match?

Of course, "MN$" will work, but I want to use it in a loop; mylist[i] is
what I need. I tried mylist[1]$, but that didn't work. So why doesn't it
interpolate? How do I do it?

Thanks a lot!
-- 
View this message in context: 
http://www.nabble.com/regexpr-tf4000743.html#a11363041
Sent from the R help mailing list archive at Nabble.com.



Re: [R] regexpr

2007-06-29 Thread jim holtman
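mylist=c("MN","NY","FL")
g=regexpr(paste(mylist[1], "$", sep=""), "Those from MN:")
if (g>0)
{
"On list"
}

or in a loop

for (i in mylist){
# i is the element itself, so use it directly rather than mylist[i]
if (regexpr(paste(i, "$", sep=""), "Those from MN:") > 0){
# ...code for those from this state
}
}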




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



Re: [R] regexpr

2007-06-29 Thread Stephen Tucker
I think you are looking for paste().

And you can replace your for loop with lapply(), which will apply regexpr to
every element of 'mylist' (as the first argument, which is 'pattern'). 'text'
can be a vector also:

mylist <- c("MN","NY","FL")
lapply(paste(mylist,"$",sep=""),regexpr,text="Those from MN:")
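
A minimal sketch of the paste() idea, checked against several target strings at once. The strings in 'targets' are made up for illustration and are not from the original question. Note that "Those from MN:" ends in a colon, so the anchored pattern "MN$" does not match it.

```r
mylist <- c("MN", "NY", "FL")

# Hypothetical target strings, chosen for illustration only
targets <- c("Those from MN", "Those from MN:", "From NY")

# Build the anchored patterns "MN$", "NY$", "FL$"
patterns <- paste(mylist, "$", sep = "")

# regexpr() returns the match position, or -1 when there is no match.
# as.vector() drops the match.length attribute before the comparison.
hits <- sapply(patterns, function(p) as.vector(regexpr(p, targets)) > 0)

# Rows follow 'targets', columns follow 'patterns'
print(hits)
```

If only a yes/no answer is needed, grepl(p, targets) gives the same logical vector without the comparison against 0.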





[R] regexpr and parsing question

2007-01-30 Thread Kimpel, Mark William
The main problem I am trying to solve is this:

I am importing a tab delimited file whose first line contains only one
column, which is a descriptor of the form "col_1 col_2 col_3", i.e. the
colnames are not tab delimited but are separated by whitespace. I would
like to parse this first line so that it becomes the colnames
of the rest of the file, which I am reading into R using read.delim().
The file is so huge that I must do this in R.

My first question is this: What is the best way to accomplish what I
want to do?

My other questions revolve around some failed attempts on my part to
solve the problem on my own using regular expressions. I thought that
perhaps I could change the first line to c("col_1", "col_2", "col_3")
using gsub. I was having trouble figuring out how R uses the backslash
character because I know that sometimes the backslash one would use in
Perl needs to be a double backslash in R.

Here is a sample of what I tried and what I got:

> a <- "col_1 col_2 col_3"

> gsub("\\s", " ", a)

[1] "col_1 col_2 col_3"

> gsub("\\s", "\\s", a)

[1] "col_1scol_2scol_3"

As you can see, it looks like R is taking a regular expression for
pattern, but not taking it for replacement. Why is this?

Assuming that I did want to solve my original problem with gsub and then
turn the string into an R object, how would I get gsub to return
c("col_1", "col_2", "col_3") using my original string?

Finally, is there a way to declare a string as a regular expression so
that R sees it the same way other languages, such as Perl, do, i.e. make
the backslash be interpreted the same way? For someone who is just
learning regular expressions as I am, it is very frustrating to read
about them in references and then have to translate what I've learned
into R syntax. I was thinking that instead of enclosing the string in
"", one could use THIS.IS.A.REGULAR.EXPRESSION(), similar to the way we
use I() in formulae.

These are a bunch of questions, but obviously I have a lot to learn!

Thanks,

Mark

Mark W. Kimpel MD 

 

(317) 490-5129 Work,  Mobile

 

(317) 663-0513 Home (no voice mail please)

1-(317)-536-2730 FAX



Re: [R] regexpr and parsing question

2007-01-30 Thread Gabor Grothendieck
Both spaces and tabs are whitespace so this
should be good enough (unless you can
have empty fields):

read.table("myfile.dat", header = TRUE)

See the sep= argument in ?read.table .

Although I don't think you really need this, here are
some regular expressions for processing a header
into the form you asked for.  The first line places
quotes around the names, the second one inserts
commas and the last one adds c( and ).

s <- gsub('(\\S+)', '"\\1"', 'col1 col2 col3')
s <- gsub("(\\S+) ", "\\1, ", s)
sub("(.*)", "c(\\1)", s)




Re: [R] regexpr and parsing question

2007-01-30 Thread Marc Schwartz
On Tue, 2007-01-30 at 17:23 -0500, Kimpel, Mark William wrote:
> The main problem I am trying to solve is this:
>
> I am importing a tab delimited file whose first line contains only one
> column, which is a descriptor of the form "col_1 col_2 col_3", i.e. the
> colnames are not tab delimited but are separated by whitespace. I would
> like to parse this first line so that it becomes the colnames
> of the rest of the file, which I am reading into R using read.delim().
> The file is so huge that I must do this in R.
>
> My first question is this: What is the best way to accomplish what I
> want to do?

Mark,

The first thing that comes to mind is a two pass approach on the file:

First pass: (using example file with your first line)

# Get the first line into a vector to set the colnames for the DF
# during the second pass
ColNames <- unlist(read.table("test.txt", nrows = 1, as.is = TRUE))

> str(ColNames)
 Named chr [1:3] "col_1" "col_2" "col_3"
 - attr(*, "names")= chr [1:3] "V1" "V2" "V3"


Second pass:

# Now read the rest of the file, skipping the first line.
# header = FALSE keeps the first data row from being consumed as a header.
DF <- read.delim("test.txt", header = FALSE, skip = 1, col.names = ColNames)


I believe that should get you the full data set and set the colnames
based upon the first line. This should pretty much obviate the need for
everything below here.
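
A self-contained sketch of the same two-pass idea, using a temporary file so it can be run as-is. The file contents are invented for illustration, and scan() is used here for the header line as one possible alternative to read.table(); this is an assumption about what the real file looks like, not a description of it.

```r
# Build a small stand-in file (hypothetical contents): a space-separated
# header line followed by tab-delimited data rows
tmp <- tempfile(fileext = ".txt")
writeLines(c("col_1 col_2 col_3",
             "1\t2\t3",
             "4\t5\t6"), tmp)

# First pass: read only the header line and split it on whitespace
col_names <- scan(tmp, what = "", nlines = 1, quiet = TRUE)

# Second pass: skip the header line and read the tab-delimited body.
# header = FALSE so that the first data row is kept as data.
DF <- read.delim(tmp, header = FALSE, skip = 1, col.names = col_names)

unlink(tmp)
print(DF)
```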

> My other questions revolve around some failed attempts on my part to
> solve the problem on my own using regular expressions. I thought that
> perhaps I could change the first line to c("col_1", "col_2", "col_3")
> using gsub. I was having trouble figuring out how R uses the backslash
> character because I know that sometimes the backslash one would use in
> Perl needs to be a double backslash in R.

You would not want to change the first line as you have it above, as it
would not be parsed properly using read.table() family functions.

> Here is a sample of what I tried and what I got:
>
> > a <- "col_1 col_2 col_3"
>
> > gsub("\\s", " ", a)
>
> [1] "col_1 col_2 col_3"
>
> > gsub("\\s", "\\s", a)
>
> [1] "col_1scol_2scol_3"
>
> As you can see, it looks like R is taking a regular expression for
> pattern, but not taking it for replacement. Why is this?

There are various settings for how regex are interpreted by/within R.
See ?grep and note the various arguments to the functions there and how
they impact R's behavior here.
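
A short sketch of the pattern-versus-replacement difference asked about above, assuming R's default regex engine. The pattern argument is a regular expression; in the replacement argument the only special sequences are backreferences such as "\\1", so "\\s" there is not a whitespace class. The variable names r1 to r3 are illustrative only.

```r
a <- "col_1 col_2 col_3"

# In the pattern, "\\s" reaches the regex engine as \s (whitespace)
r1 <- gsub("\\s", "_", a)

# In the replacement, "\\s" is not a character class: the backslash just
# quotes the next character, so a literal "s" is inserted
r2 <- gsub("\\s", "\\s", a)

# Backreferences are what the replacement does understand:
# "\\1" stands for whatever the first parenthesised group matched
r3 <- gsub("(\\S+)", "<\\1>", a)

print(c(r1, r2, r3))
```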

Also, note that there is a difference (to further complicate your
life...) between the characters that R displays by default using print()
and how they are displayed using cat(). See below.

> a
[1] "col_1 col_2 col_3"

> gsub(" ", ", ", a)
[1] "col_1, col_2, col_3"

or to get you to your vector statement above:

Note the result here:

> paste("c(\"", gsub(" ", "\", \"", a), "\")", sep = "")
[1] "c(\"col_1\", \"col_2\", \"col_3\")"


Now see how it displays when the escaped double quote chars are
interpreted properly using cat():

> cat(paste("c(\"", gsub(" ", "\", \"", a), "\")", sep = ""), "\n")
c("col_1", "col_2", "col_3") 


> Assuming that I did want to solve my original problem with gsub and then
> turn the string into an R object, how would I get gsub to return
> c("col_1", "col_2", "col_3") using my original string?

Again, note the two pass solution above.  It's easier, unless you would
want to consider using awk/sed from a CLI, which I generally avoid at
all costs...

> Finally, is there a way to declare a string as a regular expression so
> that R sees it the same way other languages, such as Perl, do, i.e. make
> the backslash be interpreted the same way? For someone who is just
> learning regular expressions as I am, it is very frustrating to read
> about them in references and then have to translate what I've learned
> into R syntax. I was thinking that instead of enclosing the string in
> "", one could use THIS.IS.A.REGULAR.EXPRESSION(), similar to the way we
> use I() in formulae.

Part of the challenge is noting the different behaviors of regex within
R and how that behavior is affected by the aforementioned arguments.
Also, noting how the output is displayed within R relative to the
interpretation of escaped characters as is seen above.

> These are a bunch of questions, but obviously I have a lot to learn!
>
> Thanks,
>
> Mark


HTH,

Marc Schwartz



Re: [R] regexpr and parsing question

2007-01-30 Thread Gabor Grothendieck
And here is an alternative to the regular expressions (although again
I don't think you really need any of this):

> capture.output(dput(strsplit("col1 col2 col3", " ")[[1]]))
[1] "c(\"col1\", \"col2\", \"col3\")"



Re: [R] regexpr and portability issue

2005-08-03 Thread Prof Brian Ripley
On Tue, 2 Aug 2005, Marco Blanchette wrote:

> I am still forging my first arms with R and I am fighting with regexpr() as
> well as portability between unix and windoz. I need to extract barcodes from
> filenames (which are located between a double and single underscore) as well
> as the directory where the filename is residing. Here is the solution I came
> to:
>
> aFileName <-
> "/Users/marco/Desktop/diagnosticAnalysis/test/MA__251329410021_S01_A01.txt"
> t <- regexpr("__\\d*_", aFileName, perl=T)
> t.dir <- regexpr("^.*/", aFileName, perl=T)
> base.name <- substr(aFileName, t+2, t-2 + attr(t,"match.length"))
> base.dir <- substr(aFileName, t.dir, attr(t.dir,"match.length"))
>
> My questions are:
> 1) Is there a more elegant way to deal with regular expressions (read here:
> more easier, more like perl style).

Yes, use sub and backreferences.  An example from the R sources doing 
something similar:

  wfile <- sub("/chm/([^/]*)$", "", file)
  thispkg <- sub(".*/([^/]*)/chm/([^/]*)$", "\\1", file)

However, R does have functions basename() and dirname() to do this!

> 2) I have a portability problem when I extract the base.dir Windoz is using
> '\' instead of '/' to separate directories.

That is misinformation: Windows (sic) accepts either / or \ (see the 
rw-FAQ and the R FAQ).  Use chartr("\\", "/", path) to map \ to /.

The `portability problem' appears to be of your own making -- take heart 
that R itself manages to manipulate filepaths portably.
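
A minimal sketch pulling the suggestions above together: chartr() to normalise separators, dirname() and basename() for the path parts, and sub() with a backreference for the barcode. The backslash-separated path is a made-up variant of the poster's example, and the variable names are illustrative only.

```r
# Hypothetical Windows-style spelling of the path from the question
# (each backslash is doubled inside an R string literal)
win_path <- "\\Users\\marco\\Desktop\\diagnosticAnalysis\\test\\MA__251329410021_S01_A01.txt"

# Map every backslash to a forward slash
path <- chartr("\\", "/", win_path)

# Split into directory and file name without any regular expressions
base_dir  <- dirname(path)
file_name <- basename(path)

# The barcode is the run of digits between "__" and the next "_";
# "\\1" in the replacement refers to the parenthesised group
barcode <- sub(".*__([0-9]+)_.*", "\\1", file_name)

print(c(base_dir, file_name, barcode))
```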

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



[R] regexpr and portability issue

2005-08-02 Thread Marco Blanchette
Dear all--

I am still forging my first arms with R and I am fighting with regexpr() as
well as portability between unix and windoz. I need to extract barcodes from
filenames (which are located between a double and single underscore) as well
as the directory where the filename is residing. Here is the solution I came
to:

aFileName <-
"/Users/marco/Desktop/diagnosticAnalysis/test/MA__251329410021_S01_A01.txt"
t <- regexpr("__\\d*_", aFileName, perl=T)
t.dir <- regexpr("^.*/", aFileName, perl=T)
base.name <- substr(aFileName, t+2, t-2 + attr(t,"match.length"))
base.dir <- substr(aFileName, t.dir, attr(t.dir,"match.length"))

My questions are:
1) Is there a more elegant way to deal with regular expressions (read here:
more easier, more like perl style).
2) I have a portability problem when I extract the base.dir Windoz is using
'\' instead of '/' to separate directories.

Any suggestions/comments

Many Tx

Marco Blanchette, Ph.D.

[EMAIL PROTECTED]

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062



Re: [R] regexpr and portability issue

2005-08-02 Thread Gabor Grothendieck
Try this.  The regular expression says to match 
- anything 
- followed by a double underscore 
- followed by one or more digits
- followed by an underscore 
- followed by anything.  
The digits have been parenthesized so that they can be referred to in
the backreference "\\1".  Also use the R function dirname
rather than regular expressions.

base.name <- sub(".*__([[:digit:]]+)_.*", "\\1", aFileName)
base.dir <- dirname(aFileName)




Re: [R] Regexpr with .

2003-08-14 Thread Stephen C. Upton
Trevor,

The "." is a regex meta-character that matches any character. In order to look
specifically for a ".", you must escape it with a "\", and that "\" must
also be escaped, thus,

> regexpr("\\.", "Female.Alabama")
[1] 7
attr(,"match.length")
[1] 1
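
For comparison, a small sketch of three ways to find a literal dot, next to the unescaped form. The variable names are illustrative only; as.integer() is used just to drop the match.length attribute.

```r
x <- "Female.Alabama"

# Unescaped: "." matches any character, so the first match is position 1
p0 <- as.integer(regexpr(".", x))

# Three ways to match a literal dot instead
p1 <- as.integer(regexpr("\\.", x))              # escaped metacharacter
p2 <- as.integer(regexpr("[.]", x))              # inside a bracket expression
p3 <- as.integer(regexpr(".", x, fixed = TRUE))  # no regex interpretation

print(c(p0, p1, p2, p3))
```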


HTH
steve



Re: [R] Regexpr with .

2003-08-14 Thread Jeff Gentry
> I'm trying to use the regexpr function to locate the decimal in a character
> string.  Regardless of the position of the decimal, the function returns 1.

You need to escape it.

> gsub("\\.", ",", "Female.Alabama")
[1] "Female,Alabama"



Re: [R] Regexpr with .

2003-08-14 Thread Chuck Cleland
Thompson, Trevor wrote:
> I'm trying to use the regexpr function to locate the decimal in a character
> string.  Regardless of the position of the decimal, the function returns 1.
> For example,
> regexpr(".", "Female.Alabama")

You probably want backslashes to indicate that "." should not
be treated as a metacharacter; it should be taken literally.

> regexpr("\\.", "Female.Alabama")
[1] 7
attr(,"match.length")
[1] 1

hope this helps,

Chuck Cleland



[R] Regexpr with .

2003-08-14 Thread Thompson, Trevor
I'm trying to use the regexpr function to locate the decimal in a character
string.  Regardless of the position of the decimal, the function returns 1.
For example,

> regexpr(".", "Female.Alabama")
[1] 1
attr(,"match.length")
[1] 1

In trying to figure out what was going on here, I tried the below command:

> gsub(".", ",", "Female.Alabama")
[1] ",,,,,,,,,,,,,,"

It looks like R is treating every character in the string as if it were
a decimal.  I didn't see anything in the help file about "." being some kind
of special character.  Any idea why R is treating a decimal this way in
these functions?   Any suggestions how to get around this?

Thanks for any suggestions.

-Trevor
 




Re: [R] Regexpr with .

2003-08-14 Thread Ted Harding
On 13-Aug-03 Barry Rowlingson wrote:
> Thompson, Trevor wrote:
>> I didn't see anything in the help file about "." being some kind
>> of special character.  Any idea why R is treating a decimal this
>> way in these functions?   Any suggestions how to get around this?
>
> '.' is the regexpr character for matching any single character!
>  > regexpr("a.e", "Female.Alabama")
> [1] 4
> To actually search for a dot, you need to 'escape' it with a 
> backslash, but of course the backslash needs escaping itself, with 
> another backslash. Luckily that backslash doesn't need escaping, 
> otherwise we would quickly run out of patience.
>  > regexpr("\\.", "Female.Alabama")
> [1] 7

It's also worth remembering the use of "[]", normally used to enclose
a disjunctive list of characters to match (e.g. "[Aa]" matches either
"A" or "a") or a range (e.g. "[0-9]" matches any digit). Any metacharacter
occurring within will be interpreted literally, with the exceptions of "\"
and (for obvious reasons) "]", which must be escaped (in which case
the use of "[]" is redundant); however, "[" works!

  > regexpr("a.e", "Female.Alabama")
  [1] 4
  attr(,"match.length")
  [1] 3
  > regexpr("[.]", "Female.Alabama")
  [1] 7
  attr(,"match.length")
  [1] 1
  > regexpr("[[]", "Female[Alabama")
  [1] 7
  attr(,"match.length")
  [1] 1
  > regexpr("[\\]", "Female\\Alabama")
  [1] 7
  attr(,"match.length")
  [1] 1
  > regexpr("[\]]", "Female]Alabama")
  [1] 7
  attr(,"match.length")
  [1] 1

Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 167 1972
Date: 13-Aug-03   Time: 22:14:06
-- XFMail --



RE: [R] Regexpr with .

2003-08-14 Thread David Khabie-Zeitoune
Try regexpr("\\.", "Female.Alabama")



Re: [R] Regexpr with .

2003-08-14 Thread Barry Rowlingson
Thompson, Trevor wrote:

> It looks like R is treating every character in the string as if it were
> decimal.  I didn't see anything in the help file about "." being some kind
> of special character.  Any idea why R is treating a decimal this way in
> these functions?   Any suggestions how to get around this?

'.' is the regexpr character for matching any single character!

 > regexpr("a.e", "Female.Alabama")
[1] 4

To actually search for a dot, you need to 'escape' it with a 
backslash, but of course the backslash needs escaping itself, with 
another backslash. Luckily that backslash doesn't need escaping, 
otherwise we would quickly run out of patience.

 > regexpr("\\.", "Female.Alabama")
[1] 7

Baz



Re: [R] Regexpr with .

2003-08-14 Thread John Zhang
Try

regexpr("\\.", "Female.Alabama") and gsub("\\.", ",", "Female.Alabama")



Jianhua Zhang
Department of Biostatistics
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084



[R] Regexpr capturing in R?

2002-12-21 Thread Fredrik Karlsson

