Re: [R] regex - negate a word

2009-01-20 Thread Wacek Kusnierczyk
Prof Brian Ripley wrote:
 On Mon, 19 Jan 2009, Rolf Turner wrote:


 On 19/01/2009, at 10:44 AM, Gabor Grothendieck wrote:

 Well, that's why it was only provided when you insisted.  This is
 not what regexp's are good at.

 On Sun, Jan 18, 2009 at 4:35 PM, Rau, Roland r...@demogr.mpg.de wrote:
 Thanks! (I have to admit, though, that I expected something simple)

 It may not be what regexp's are good at, but the grep command in
 unix/linux
 does what is required *very* simply via the ``-v'' flag.  I
 conjecture that
 it would not be difficult to add an argument with similar impact to the
 grep() function in R.

 Indeed.  I have often wondered why grep() returned indices, when a
 logical vector would seem more natural in R (and !grep(...) would have
 been all that was needed).

 Looking at the code I see it does in fact compute a logical vector,
 just not return it.  So adding 'invert' (the long-form of -v is
 --invert) is a job of a very few lines and I have done so for 2.9.0.


in fact, it's simpler than that.  instead of redundantly distributing
the fix over four different lines in character.c, it's enough to ^= the
logical vector of matched/unmatched flags in just one place, on-the-fly,
close to the end of the loop over the vector of input strings.  see
attached patch.

for consistency, you might want to
- name the internal invert flag 'invert_opt' instead of 'invert';
- apply the same fix to agrep.

it's also trivial to add another argument to grep, say 'logical', which
will cause grep to return a logical vector of the same length as the
input strings vector.  see the attached patch.  note: i am a novice to r
internals, and i get some mystical warnings i haven't decoded yet while
using the extended grep, but otherwise the code compiles well and grep
works as intended; you'd need to fix the cause of the warnings.

if you want the 'logical' argument, you need to decide how it interacts
with 'value'.  in the patch, 'value' set to TRUE resets 'logical' to
FALSE, with a warning.

further suggestions:  the arguments 'value' and 'logical' could be
replaced with one argument, say 'output', which would take a value from
{'indices', 'values', 'logical'}.  it might make further extensions
easier to implement and maintain.

attached are patches to character.c, names.c, and grep.R; if you tell me
which other files need a patch to get rid of the warnings (see below),
i'll make one.

s = c("abc", "bcd", "cde")

grep("b", s)
# 1 2

grep("b", s, value=TRUE)
# "abc" "bcd"

grep("b", s, logical=TRUE)
# TRUE TRUE FALSE

s[grep("b", s, logical=TRUE)]
# "abc" "bcd"
# Warning: stack imbalance in 'grep', 9 then 10
# Warning: stack imbalance in '.Internal', 8 then 9
# Warning: stack imbalance in '{', 6 then 7

grep("b", s, invert=TRUE)
# 3

grep("b", s, invert=TRUE, value=TRUE)
# "cde"

s[!grep("b", s, logical=TRUE)]
# "cde"
# Warning: stack imbalance in 'grep', 15 then 16
# Warning: stack imbalance in '.Internal', 14 then 15
# Warning: stack imbalance in '{', 12 then 13
# Warning: stack imbalance in '!', 6 then 7
# Warning: stack imbalance in '[', 2 then 3



vQ
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] regex - negate a word

2009-01-20 Thread Wacek Kusnierczyk
Wacek Kusnierczyk wrote:

 attached are patches to character.c, names.c, and grep.R; if you tell me
   

forgot to add:  the patches are against the latest r-devel
(19.01.2009).  compiled and tested on 32b Ubuntu 8.04.


vQ



Re: [R] regex - negate a word

2009-01-19 Thread Wacek Kusnierczyk
Rolf Turner wrote:

 On 19/01/2009, at 10:44 AM, Gabor Grothendieck wrote:

 Well, that's why it was only provided when you insisted.  This is
 not what regexp's are good at.

 On Sun, Jan 18, 2009 at 4:35 PM, Rau, Roland r...@demogr.mpg.de wrote:
 Thanks! (I have to admit, though, that I expected something simple)

 It may not be what regexp's are good at, but the grep command in
 unix/linux
 does what is required *very* simply via the ``-v'' flag.  I conjecture
 that
 it would not be difficult to add an argument with similar impact to the
 grep() function in R.

something like grep(..., inverse=TRUE), perhaps.

vQ



Re: [R] regex - negate a word

2009-01-19 Thread Wacek Kusnierczyk
Stavros Macrakis wrote:
 On Sun, Jan 18, 2009 at 2:22 PM, Wacek Kusnierczyk
 waclaw.marcin.kusnierc...@idi.ntnu.no wrote:
   

 x[-grep("abc", x)]
 which unfortunately fails if none of the strings in x matches the pattern, 
 i.e., grep returns integer(0);
 

 Yes.

   
 arguably, x[integer(0)] should rather return all elements of x
 

 The meaning of x[V] (for an integer subscript vector V) is: 

what about numeric vectors?  r performs smart downcasting here:

x[1.1]
# same as x[1]

x[0.3]
# character(0)

 ignore 0
 entries, and then:
   

what if V=NULL? 

 a) if !(all(V>0) | all(V<0)) => ERROR

there is no error for x[V] with V=0, V=as.numeric(NA), or V=NaN.

 b) if all(V>0): length(x[V]) == length(V)
   


unfortunately, false if v contains a non-integer (so it goes beyond your
discussion, but may cause problems in practice):

x[c(1, 0.5)]
# one item (if x is non-empty)

 c) if all(V<0): length(x[V]) == length(x)-length(unique(V))
   

not true for cases like V=c(-1, -1.5), which again go beyond your
discussion, but may happen in practice.

interestingly, unique(c(NA, NA)) is just NA, rather than c(NA,NA).  i'd
think that if we have two non-available values, we can't be sure they're
in fact equal, but unique apparently is.  (you'd have to tell it not to
be with incomparables=NA.)

 When length(V)==0, the preconditions are true for both (b) and (c), so
   

interestingly, all(V>0) & all(V<0) is TRUE for V=c().

 the R design has made the decision that length(x[V]) == 0 in this
 case.  If you're going to have the negative indices means exclusion
 trick, this seems like a reasonable convention.
   

i didn't say this was unreasonable, just that x[integer(0)] should,
arguably, return x.  'empty index' is not as precise an expression to be
sure that it will be obvious to everyone that integer(0) is *not* an
empty index, and less so with NULL.  what is meant, i guess, is 'empty
index expression', i.e., no index rather than empty index, and i'd
humbly suggest (risking being charged with boring pedantry) to improve tfm.
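
The indexing cases discussed in this message can be checked directly. A small sketch, using a made-up three-element vector (not from the thread), of how a missing index differs from a zero-length one and how non-integer indices are truncated:

```r
x <- c("a", "b", "c")   # illustrative vector, not from the thread

x[]              # no index at all: every element is returned
x[integer(0)]    # zero-length index: character(0)
x[NULL]          # NULL behaves like a zero-length index: character(0)
x[-integer(0)]   # negating a zero-length index is still zero-length
x[c(1, 0.5)]     # 0.5 is truncated to 0 and dropped, leaving one element
x[c(-1, -1.5)]   # both truncate to -1, so only the first element is excluded
```

This is why x[-grep(pattern, x)] returns nothing rather than everything when grep() finds no match.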


vQ



Re: [R] regex - negate a word

2009-01-19 Thread Prof Brian Ripley

On Mon, 19 Jan 2009, Rolf Turner wrote:



On 19/01/2009, at 10:44 AM, Gabor Grothendieck wrote:


Well, that's why it was only provided when you insisted.  This is
not what regexp's are good at.

On Sun, Jan 18, 2009 at 4:35 PM, Rau, Roland r...@demogr.mpg.de wrote:

Thanks! (I have to admit, though, that I expected something simple)


It may not be what regexp's are good at, but the grep command in unix/linux
does what is required *very* simply via the ``-v'' flag.  I conjecture that
it would not be difficult to add an argument with similar impact to the
grep() function in R.


Indeed.  I have often wondered why grep() returned indices, when a 
logical vector would seem more natural in R (and !grep(...) would have 
been all that was needed).


Looking at the code I see it does in fact compute a logical vector, 
just not return it.  So adding 'invert' (the long-form of -v is 
--invert-match) is a job of a very few lines and I have done so for 2.9.0.
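
For readers on a version of R that has the 'invert' argument described above, a short sketch of its use on the example vector from the original question. grepl() is an addition of mine rather than something named in this message; it returns the logical vector directly, so it can be negated with '!'.

```r
x <- c("abcdef", "defabc", "qwerty")

grep("abc", x, invert = TRUE)                 # indices of the non-matching elements
grep("abc", x, invert = TRUE, value = TRUE)   # the non-matching elements themselves
x[!grepl("abc", x)]                           # same selection via a logical vector
grep("zzz", x, invert = TRUE, value = TRUE)   # nothing matches "zzz": all of x comes back
```

Both forms behave sensibly when nothing matches, which is the case that trips up x[-grep(...)].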


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



[R] regex - negate a word

2009-01-18 Thread Rau, Roland
Dear all,

let's assume I have a vector of character strings:

x <- c("abcdef", "defabc", "qwerty")

What I would like to find is the following: all elements where the word
'abc' does not appear (i.e. 3 in this case of 'x').

Since I am not really experienced with regular expressions, I started
slowly and thought I would first find all elements where 'abc' actually
does appear:

> grep(pattern="abc", x=x)
[1] 1 2

So far, so good. Now I read that ^ is the negation operator. But it can
also denote the beginning of a string as in:

> grep(pattern="^abc", x=x)
[1] 1

Of course, we need to put it inside square brackets to negate the
expression [1]:

> grep(pattern="[^abc]", x=x)
[1] 1 2 3

But this is not what I want either.
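
A bracket expression such as [^abc] negates a set of single characters, not the word 'abc': it matches any string containing at least one character other than a, b or c. A short sketch of the difference; the vector y is made up for illustration, and grepl() returns one TRUE/FALSE per element:

```r
y <- c("abc", "cab", "abcdef")   # illustrative strings, not from the thread

grepl("[^abc]", y)   # FALSE FALSE TRUE: only "abcdef" has a character outside {a, b, c}
grepl("^abc", y)     # TRUE FALSE TRUE: outside brackets, ^ anchors to the start
```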

I'd appreciate any help. I assume this is rather easy and
straightforward.

Thanks,
Roland


[1] http://www.zytrax.com/tech/web/regex.htm: The ^ (circumflex or
caret) inside square brackets negates the expression

--
This mail has been sent through the MPI for Demographic Research.  Should you 
receive a mail that is apparently from a MPI user without this text displayed, 
then the address has most likely been faked. If you are uncertain about the 
validity of this message, please check the mail header or ask your system 
administrator for assistance.



Re: [R] regex - negate a word

2009-01-18 Thread jim holtman
Just remove those elements that match:

> x <- c("abcdef", "defabc", "qwerty")
> x[-grep('abc',x)]
[1] "qwerty"







-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] regex - negate a word

2009-01-18 Thread Wacek Kusnierczyk
Rau, Roland wrote:
 Dear all,

 let's assume I have a vector of character strings:

 x - c(abcdef, defabc, qwerty)

 What I would like to find is the following: all elements where the word
 'abc' does not appear (i.e. 3 in this case of 'x').
   

a quick shot is:

x[-grep("abc", x)]

which unfortunately fails if none of the strings in x matches the
pattern, i.e., grep returns integer(0); arguably, x[integer(0)] should
rather return all elements of x:

"An empty index selects all values" (from ?'[')

but apparently integer(0) does not count as an empty index (and neither
does NULL).  so you may want something like:

strings = c("abcdef", "defabc", "qwerty")
pattern = "abc"
if (length(matching <- grep(pattern, strings))) x[-matching] else x

vQ



Re: [R] regex - negate a word

2009-01-18 Thread Gabor Grothendieck
Try this:

# indexes
setdiff(seq_along(x), grep("abc", x))

# values
setdiff(x, grep("abc", x, value = TRUE))

Another possibility is:

z <- "abc"
x0 <- c(x, z) # to handle no match case
x0[- grep(z, x0)] # values
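
One difference between the two setdiff() variants may matter in practice: setdiff() returns unique values, so the value-based form drops repeated strings, while the index-based form keeps them. A sketch with a made-up vector containing a duplicate:

```r
x <- c("abcdef", "qwerty", "qwerty")   # illustrative input with a repeated element

setdiff(x, grep("abc", x, value = TRUE))    # "qwerty" (duplicates collapsed)
x[setdiff(seq_along(x), grep("abc", x))]    # "qwerty" "qwerty"
```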






Re: [R] regex - negate a word

2009-01-18 Thread Eric Archer

Roland,

I think you were almost there with your first example.  How about using:

> x <- c("abcdef", "defabc", "qwerty")
> y <- grep(pattern="abc", x=x)
> z.char <- x[-y]
> z.index <- (1:length(x))[-y]

> z.char
[1] "qwerty"
> z.index
[1] 3

Cheers,
eric



--

Eric Archer, Ph.D.
Southwest Fisheries Science Center
8604 La Jolla Shores Dr.
La Jolla, CA 92037
858-546-7121 (work)
858-546-7003 (FAX)

ETP Cetacean Assessment Program: http://swfsc.noaa.gov/prd-etp.aspx
Population ID Program: http://swfsc.noaa.gov/prd-popid.aspx



Innocence about Science is the worst crime today.
   - Sir Charles Percy Snow

Lighthouses are more helpful than churches.
   - Benjamin Franklin

   ...but I'll take a GPS over either one.
   - John C. Craig George



Re: [R] regex - negate a word

2009-01-18 Thread Wacek Kusnierczyk
Jorge Ivan Velez wrote:
 Hi Wacek,
 I think you wanted to say strings instead of x in your last line  : )

   

of course, thanks.  the correct version is:

if (length(matching <- grep(pattern, strings))) {
   strings[-matching]
} else strings
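
The corrected snippet can also be wrapped in a small function. The name drop_matching is hypothetical, chosen here for illustration; the body is the same guard against grep() returning integer(0).

```r
# drop_matching is a hypothetical helper name, not part of R
drop_matching <- function(pattern, strings, ...) {
  matching <- grep(pattern, strings, ...)
  # with no matches, strings[-integer(0)] would be empty, so return everything
  if (length(matching)) strings[-matching] else strings
}

drop_matching("abc", c("abcdef", "defabc", "qwerty"))   # "qwerty"
drop_matching("zzz", c("abcdef", "defabc", "qwerty"))   # all three strings
```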

btw., and in relation to a recent post complaining about how the mailing
list is maintained, i must say that although the idea that posts could
be edited after they've been sent may not sound good in general, i
think it would be useful to be able to just fix such minor typos in
place instead of posting a correction.  after all, the list is intended
to serve as help to those who care not only to ask, but also to browse
the archives.  but this is a side comment, i take no sides and make no
recommendations.


vQ




 Best,

 Jorge




Re: [R] regex - negate a word

2009-01-18 Thread Rau, Roland
Thank you very much to all of you for your fast and excellent help.
Since the -grep(...) solution seems to be favored by most of the answers, I 
just wonder if there is really no regular expression which does the job?!?

Thanks again,
Roland





Re: [R] regex - negate a word

2009-01-18 Thread Gabor Grothendieck
Try this:

grep("^([^a]|a[^b]|ab[^c])*.{0,2}$", x, perl = TRUE)
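
Hand-built negative patterns of this kind are easy to get subtly wrong. By my reading, the pattern above also accepts a string such as "aabc", because the branch a[^b] consumes "aa" and the remaining "bc" is then matched one character at a time. A sketch comparing it with a negative lookahead, which Perl-compatible regular expressions support; the extra test string "aabc" is my own addition, not from the thread:

```r
x <- c("abcdef", "defabc", "qwerty", "aabc")   # "aabc" added as an edge case

hand_built <- "^([^a]|a[^b]|ab[^c])*.{0,2}$"
lookahead  <- "^(?!.*abc)"   # succeeds only if "abc" occurs nowhere in the string

grepl(hand_built, x, perl = TRUE)   # compare the result for "aabc"
grepl(lookahead,  x, perl = TRUE)   # FALSE FALSE TRUE FALSE
```

The lookahead form assumes single-line strings, since '.' does not match a newline by default.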




Re: [R] regex - negate a word

2009-01-18 Thread Wacek Kusnierczyk
Gabor Grothendieck wrote:
 Try this:


 # values
 setdiff(x, grep("abc", x, value = TRUE))

 Another possibility is:

 z <- "abc"
 x0 <- c(x, z) # to handle no match case
 x0[- grep(z, x0)] # values
   

on quick testing, these two and the if-based version have comparable
runtime, with a minor win for the last one, and if the input is moderate
this makes no real difference.

however, the second solution above is likely to fail if the pattern is
more complex, e.g., contains a character class or a wildcard:

strings = c("xyz")
pattern = "a[a-z]"
strings[-grep(pattern, c(strings, pattern))]
# character(0)


vQ



Re: [R] regex - negate a word

2009-01-18 Thread Gabor Grothendieck
In that case just add fixed = TRUE



Re: [R] regex - negate a word

2009-01-18 Thread Wacek Kusnierczyk
Gabor Grothendieck wrote:
 In that case just add fixed = TRUE
   

in general, if you want a complex pattern, you don't use 'fixed', and
then again you risk incorrect (well, correct for r, but not for the
problem) result in case no input string matches the pattern.
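
To make the trade-off concrete: with fixed = TRUE the appended pattern always matches itself, so the sentinel trick is safe for literal patterns, while the same pattern read as a regular expression need not match its own text. A sketch reusing the 'a[a-z]' pattern from earlier in the thread; the input vector is made up:

```r
x  <- c("xyz", "qwerty")    # illustrative input in which nothing matches
z  <- "a[a-z]"
x0 <- c(x, z)               # append the pattern as a sentinel

x0[-grep(z, x0, fixed = TRUE)]   # "xyz" "qwerty": the sentinel matches literally and is removed
x0[-grep(z, x0)]                 # character(0): as a regex, z does not match its own text
```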


vQ



Re: [R] regex - negate a word

2009-01-18 Thread Wacek Kusnierczyk
Gabor Grothendieck wrote:
 Try this:

 grep("^([^a]|a[^b]|ab[^c])*.{0,2}$", x, perl = TRUE)

   
... and see how cumbersome it becomes for a pattern as trivial as 'abc'. 

in perl, you typically don't invent such negative patterns, but rather
don't match positive patterns: instead of the match operator =~ and a
negative pattern, you use the no-match operator !~ and a positive pattern:

@strings = ("abc", "xyz");
@filtered = grep $_ !~ /abc/, @strings;

in r, one way to do the no-match is using -grep, but taking care of the
special case of no matches at all in the input vector.


 On Sun, Jan 18, 2009 at 2:37 PM, Rau, Roland r...@demogr.mpg.de wrote:
   
 Thank you very much to all of you for your fast and excellent help.
 Since the -grep(...) solution seems to be favored by most of the answers,
 I just wonder if there is really no regular expression which does the job?!?
 

in perl 5.10, you can try this:

@strings = ("abc", "xyz");
@filtered = grep $_ =~ /(abc)(*COMMIT)(*FAIL)|(*ACCEPT)/, @strings;

which works by making a string that matches the pattern fail, and any
other string succeed despite no match.

vQ



Re: [R] regex - negate a word

2009-01-18 Thread Wacek Kusnierczyk
Wacek Kusnierczyk wrote:

 On Sun, Jan 18, 2009 at 2:37 PM, Rau, Roland r...@demogr.mpg.de wrote:
   
 
 Thank you very much to all of you for your fast and excellent help.
 Since the -grep(...) solution seems to be favored by most of the answers,
 I just wonder if there is really no regular expression which does the job?!?
 
   

 in perl 5.10, you can try this:

 @strings = ("abc", "xyz");
 @filtered = grep $_ =~ /(abc)(*COMMIT)(*FAIL)|(*ACCEPT)/, @strings;

 which works by making a string that matches the pattern fail, and any
 other string succeed despite no match.
   

incidentally, recent pcre accepts such regexes:

# r code
ungrep = function(pattern, x, ...)
   grep(paste(pattern, "(*COMMIT)(*FAIL)|(*ACCEPT)", sep=""), x,
      perl=TRUE, ...)

strings = c("abc", "xyz")
pattern = "a[a-z]"
(filtered = strings[ungrep(pattern, strings)])
# xyz

vQ



Re: [R] regex - negate a word

2009-01-18 Thread Wacek Kusnierczyk
Wacek Kusnierczyk wrote:

 # r code
 ungrep = function(pattern, x, ...)
    grep(paste(pattern, "(*COMMIT)(*FAIL)|(*ACCEPT)", sep=""), x,
       perl=TRUE, ...)

 strings = c("abc", "xyz")
 pattern = "a[a-z]"
 (filtered = strings[ungrep(pattern, strings)])
 # xyz
   

this was a toy example, but if you need this sort of ungrep with
patterns involving alternations, you need a fix:

ungrep("a|x", strings, value=TRUE)
# abc
# NOT character(0)

# fix
ungrep = function(pattern, x, ...)
   grep(paste("(?:", pattern, ")(*COMMIT)(*FAIL)|(*ACCEPT)", sep=""),
      x, perl=TRUE, ...)

ungrep("a|x", strings, value=TRUE)
# character(0)
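
An alternative that does not rely on backtracking verbs is a negative lookahead anchored at the start of the string. The sketch below is not from the thread: ungrep_lookahead is a hypothetical name, and it assumes single-line strings, since '.' does not match a newline by default. It may also be worth testing the verb-based version on strings where the match does not begin at the first character, such as "defabc", because the (*ACCEPT) branch can be reached at the first position tried.

```r
# hypothetical helper: select the elements in which 'pattern' matches nowhere
ungrep_lookahead <- function(pattern, x, ...)
  grep(paste("^(?!.*(?:", pattern, "))", sep = ""), x, perl = TRUE, ...)

strings <- c("abc", "xyz", "defabc", "qwerty")   # illustrative input
ungrep_lookahead("a|x", strings, value = TRUE)   # "qwerty"
ungrep_lookahead("abc", strings, value = TRUE)   # "xyz" "qwerty"
```

The non-capturing group (?: ... ) plays the same role as in the fix above: it keeps an alternation in the pattern from splitting the surrounding expression.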


vQ



Re: [R] regex - negate a word

2009-01-18 Thread Rau, Roland
Thanks! (I have to admit, though, that I expected something simple)

Thanks,
Roland



-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
Sent: Sun 1/18/2009 8:54 PM
To: Rau, Roland
Cc: r-help@r-project.org
Subject: Re: [R] regex - negate a word
 
Try this:

grep(^([^a]|a[^b]|ab[^c])*.{0,2}$, x, perl = TRUE)


On Sun, Jan 18, 2009 at 2:37 PM, Rau, Roland r...@demogr.mpg.de wrote:
 Thank you very much to all of you for your fast and excellent help.
 Since the -grep(...) solution seems to be favored by most of the answers,
 I just wonder if there is really no regular expression which does the job?!?

 Thanks again,
 Roland



 -----Original Message-----
 From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
 Sent: Sun 1/18/2009 8:28 PM
 To: Rau, Roland
 Cc: r-help@r-project.org
 Subject: Re: [R] regex - negate a word

 Try this:

 # indexes
 setdiff(seq_along(x), grep("abc", x))

 # values
 setdiff(x, grep("abc", x, value = TRUE))

 Another possibility is:

 z <- "abc"
 x0 <- c(x, z) # to handle no match case
 x0[- grep(z, x0)] # values




 On Sun, Jan 18, 2009 at 1:35 PM, Rau, Roland r...@demogr.mpg.de wrote:
 Dear all,

 let's assume I have a vector of character strings:

 x <- c("abcdef", "defabc", "qwerty")

 What I would like to find is the following: all elements where the word
 'abc' does not appear (i.e. 3 in this case of 'x').

 Since I am not really experienced with regular expressions, I started
 slowly and thought I would first find all elements where 'abc' actually
 does appear:

 grep(pattern="abc", x=x)
 [1] 1 2

 So far, so good. Now I read that ^ is the negation operator. But it can
 also denote the beginning of a string as in:

 grep(pattern="^abc", x=x)
 [1] 1

 Of course, we need to put it inside square brackets to negate the
 expression [1]
 grep(pattern="[^abc]", x=x)
 [1] 1 2 3

 But this is not what I want either.

 I'd appreciate any help. I assume this is rather easy and
 straightforward.

 Thanks,
 Roland


 [1] http://www.zytrax.com/tech/web/regex.htm: The ^ (circumflex or
 caret) inside square brackets negates the expression



Re: [R] regex - negate a word

2009-01-18 Thread Gabor Grothendieck
Well, that's why it was only provided when you insisted.  This is
not what regexp's are good at.

On Sun, Jan 18, 2009 at 4:35 PM, Rau, Roland r...@demogr.mpg.de wrote:
 Thanks! (I have to admit, though, that I expected something simple)

 Thanks,
 Roland











Re: [R] regex - negate a word

2009-01-18 Thread Rolf Turner


On 19/01/2009, at 10:44 AM, Gabor Grothendieck wrote:

 Well, that's why it was only provided when you insisted.  This is
 not what regexp's are good at.

 On Sun, Jan 18, 2009 at 4:35 PM, Rau, Roland r...@demogr.mpg.de wrote:

 Thanks! (I have to admit, though, that I expected something simple)

It may not be what regexp's are good at, but the grep command in unix/linux
does what is required *very* simply via the ``-v'' flag.  I conjecture that
it would not be difficult to add an argument with similar impact to the
grep() function in R.
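
For readers on a later R: elsewhere in this thread an 'invert' argument
is reported as added to grep() for 2.9.0.  Assuming such a version, the
effect of the unix ``-v'' flag can be sketched as follows (the sample
vector is the one from the original question):

```r
x <- c("abcdef", "defabc", "qwerty")

# Indices of the elements that do NOT match the pattern.
grep("abc", x, invert = TRUE)                 # 3

# The non-matching elements themselves.
grep("abc", x, invert = TRUE, value = TRUE)   # "qwerty"

# When nothing matches, every element is returned.
grep("zzz", x, invert = TRUE)                 # 1 2 3
```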

cheers,

Rolf Turner



Re: [R] regex - negate a word

2009-01-18 Thread Gabor Grothendieck
That's an entirely different point from whether regular expressions
can do it: grep -v is just another way to do it without using a regular
expression to specify the entire job.

On Sun, Jan 18, 2009 at 5:02 PM, Rolf Turner r.tur...@auckland.ac.nz wrote:

 On 19/01/2009, at 10:44 AM, Gabor Grothendieck wrote:

 Well, that's why it was only provided when you insisted.  This is
 not what regexp's are good at.

 On Sun, Jan 18, 2009 at 4:35 PM, Rau, Roland r...@demogr.mpg.de wrote:

 Thanks! (I have to admit, though, that I expected something simple)

 It may not be what regexp's are good at, but the grep command in unix/linux
 does what is required *very* simply via the ``-v'' flag.  I conjecture that
 it would not be difficult to add an argument with similar impact to the
 grep() function in R.

cheers,

Rolf Turner



Re: [R] regex - negate a word

2009-01-18 Thread Stavros Macrakis
On Sun, Jan 18, 2009 at 2:22 PM, Wacek Kusnierczyk
waclaw.marcin.kusnierc...@idi.ntnu.no wrote:
 x <- c("abcdef", "defabc", "qwerty")
 ...[find] all elements where the word 'abc' does not appear (i.e. 3 in this
 case of 'x').

 x[-grep("abc", x)]
 which unfortunately fails if none of the strings in x matches the pattern, 
 i.e., grep returns integer(0);

Yes.

 arguably, x[integer(0)] should rather return all elements of x

The meaning of x[V] (for an integer subscript vector V) is: ignore 0
entries, and then:

a) if !(all(V>0) | all(V<0)) => ERROR
b) if all(V>0): length(x[V]) == length(V)
c) if all(V<0): length(x[V]) == length(x)-length(unique(V))

When length(V)==0, the preconditions are true for both (b) and (c), so
the R design has made the decision that length(x[V]) == 0 in this
case.  If you're going to have the "negative indices mean exclusion"
trick, this seems like a reasonable convention.

Of course, that means that you can't in general use x[-V] (where
all(V>0)) to mean all elements that are not in V.  However, there is
a workaround if you have an upper bound on length(x):

   x[ c(-2^30, -V) ]

This guarantees at least one negative number.
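
A small sketch of the empty-index case and the workaround, using the
vector from the original question.  The last two lines are alternatives
that avoid negative indexing altogether; grepl() is an assumption here,
since it needs a later R than the one discussed in this thread.

```r
x <- c("abcdef", "defabc", "qwerty")

hits <- grep("abc", x)   # 1 2
none <- grep("zzz", x)   # integer(0): no element matches

x[-hits]                 # "qwerty"
x[-none]                 # character(0), not all of x

# Workaround: an out-of-range negative index is ignored, but it forces
# the "exclusion" reading even when there are no hits.
x[c(-2^30, -none)]       # all three elements

# Alternatives that do not rely on negative indices.
x[setdiff(seq_along(x), none)]   # all three elements
x[!grepl("zzz", x)]              # all three elements
```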

   -s



Re: [R] regex - negate a word

2009-01-18 Thread Gabor Grothendieck
Note that the variation of this that I posted already handles that case.

On Sun, Jan 18, 2009 at 5:32 PM, Stavros Macrakis macra...@alum.mit.edu wrote:
 On Sun, Jan 18, 2009 at 2:22 PM, Wacek Kusnierczyk
 waclaw.marcin.kusnierc...@idi.ntnu.no wrote:
 x <- c("abcdef", "defabc", "qwerty")
 ...[find] all elements where the word 'abc' does not appear (i.e. 3 in this
 case of 'x').

 x[-grep("abc", x)]
 which unfortunately fails if none of the strings in x matches the pattern, 
 i.e., grep returns integer(0);

 Yes.

 arguably, x[integer(0)] should rather return all elements of x

 The meaning of x[V] (for an integer subscript vector V) is: ignore 0
 entries, and then:

 a) if !(all(V>0) | all(V<0)) => ERROR
 b) if all(V>0): length(x[V]) == length(V)
 c) if all(V<0): length(x[V]) == length(x)-length(unique(V))

 When length(V)==0, the preconditions are true for both (b) and (c), so
 the R design has made the decision that length(x[V]) == 0 in this
 case.  If you're going to have the "negative indices mean exclusion"
 trick, this seems like a reasonable convention.

 Of course, that means that you can't in general use x[-V] (where
 all(V>0)) to mean all elements that are not in V.  However, there is
 a workaround if you have an upper bound on length(x):

   x[ c(-2^30, -V) ]

 This guarantees at least one negative number.

   -s
