Dear R-helpers,

I have encountered the following unexpected behaviour of R-2.10.1 on both
RHEL 4 and Ubuntu Karmic (precompiled via synaptic or built from source).

I have a character vector from which I want to extract a pattern that is
surrounded by junk, as in:

> nn <- sprintf("junk_%02d_junk", 1:2)
> nn
[1] "junk_01_junk" "junk_02_junk"

> sub("^.*([[:digit:]]{2}).*$", "\\1", nn)
[1] "nk" "nk"
# oops? however:

> sub("^.*([[:digit:]]{2}).*$", "\\1", nn, perl = TRUE)
[1] "01" "02"

# as expected, and also

> Sys.setlocale("LC_ALL", "C")
> sub("^.*([[:digit:]]{2}).*$", "\\1", nn)
[1] "01" "02"

Is there something wrong with my regex syntax, or am I missing something?
Obviously I have at least two workarounds, but I'd like to report this since
it is breaking code that used to run in R-2.9.0.
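For code that has to keep running in the meantime, the C-locale workaround can be
limited to a single call instead of switching the locale for the whole session.
This is only a minimal sketch. It assumes that LC_CTYPE is the locale category
that controls how the regex engine handles multibyte text (my test above set
LC_ALL). `sub_in_c_locale()` is a hypothetical helper and not part of base R.

```r
# Hypothetical helper, not part of base R: call sub() with LC_CTYPE
# temporarily set to "C". ASSUMPTION: LC_CTYPE is the category that matters
# for this regex problem; the original test switched LC_ALL instead.
sub_in_c_locale <- function(pattern, replacement, x, ...) {
  old <- Sys.getlocale("LC_CTYPE")
  # restore the caller's LC_CTYPE even if sub() throws an error
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  Sys.setlocale("LC_CTYPE", "C")
  sub(pattern, replacement, x, ...)
}

nn <- sprintf("junk_%02d_junk", 1:2)
sub_in_c_locale("^.*([[:digit:]]{2}).*$", "\\1", nn)

# the other workaround: use the PCRE engine instead of the default one
sub("^.*([[:digit:]]{2}).*$", "\\1", nn, perl = TRUE)
```

Because `on.exit()` restores the previous LC_CTYPE, the rest of the session
keeps running in its UTF-8 locale.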

Thanks in advance for any help or insight,

 - axel

$ R --vanilla

R version 2.10.1 (2009-12-14)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> sessionInfo()
R version 2.10.1 (2009-12-14)

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

Axel Klenk
Research Informatician
Actelion Pharmaceuticals Ltd / Gewerbestrasse 16 / CH-4123 Allschwil /
