It is not likely to be the factor issue as such; gsub() handles a factor vector correctly:
> x <- factor(c("MT2342", "MT0982", "MT2874"))
> x
[1] MT2342 MT0982 MT2874
Levels: MT0982 MT2342 MT2874
> gsub("[^0-9]", "", x)
[1] "2342" "0982" "2874"
gsub() and friends already coerce non-character input to character internally:
> gsub
function (pattern, replacement, x, ignore.case = FALSE, extended = TRUE,
    perl = FALSE, fixed = FALSE, useBytes = FALSE)
{
    if (!is.character(x))
        x <- as.character(x)
    .Internal(gsub(as.character(pattern), as.character(replacement),
        x, ignore.case, extended, perl, fixed, useBytes))
}
<environment: namespace:base>
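A quick way to confirm this, sketched for current R (the `extended` argument shown in the 2008 listing above has since been removed from gsub(), but gsub() still coerces non-character input to character): applying gsub() to the factor gives the same result as applying it to as.character() of the factor, because as.character() on a factor returns the labels. The integer codes are only exposed by something like as.integer().

```r
# Same factor as in the example above
f <- factor(c("MT2342", "MT0982", "MT2874"))

# The integer codes are positions in the sorted levels
# (MT0982 = 1, MT2342 = 2, MT2874 = 3)
print(as.integer(f))

# gsub() coerces the factor with as.character(), which returns labels,
# so both calls strip the letters from the labels
via_factor <- gsub("[^0-9]", "", f)
via_char   <- gsub("[^0-9]", "", as.character(f))
stopifnot(identical(via_factor, via_char))
print(via_factor)
```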
More likely, 'PthwyGenes' is a single-row data frame:
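> # stringsAsFactors = TRUE was the default in 2008 (it is FALSE since R 4.0.0)
> x <- data.frame(A = "MT2342", B = "MT0982", C = "MT2874", stringsAsFactors = TRUE)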
> x
       A      B      C
1 MT2342 MT0982 MT2874
> str(x)
'data.frame': 1 obs. of 3 variables:
$ A: Factor w/ 1 level "MT2342": 1
$ B: Factor w/ 1 level "MT0982": 1
$ C: Factor w/ 1 level "MT2874": 1
Thus, when gsub() coerces the row to character, as documented, each column is a
length-one factor, and you get its integer level code converted to character
rather than its label:
> as.character(x[1, ])
[1] "1" "1" "1"
and then:
> gsub("[^0-9]", "", x[1, ])
[1] "1" "1" "1"
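This also explains why the original poster saw "6" "6" "8" rather than "1" "1" "1": her full data frame presumably has many distinct values per column, so each code is the position of that row's value among the column's sorted levels. The sketch below uses a small made-up data frame (not her data) with stringsAsFactors = TRUE, mimicking the pre-R-4.0.0 default, to show the codes versus the labels for one row.

```r
# Hypothetical multi-row data frame; columns are stored as factors
genes <- data.frame(
  A = c("MT2342", "MT0100", "MT0200"),
  B = c("MT0982", "MT0001", "MT5555"),
  stringsAsFactors = TRUE
)

row1 <- genes[1, ]   # still a data frame: a list of factor columns

# Integer codes: positions of this row's values in each column's
# sorted levels (MT2342 is 3rd in A, MT0982 is 2nd in B)
codes <- vapply(row1, as.integer, integer(1))

# Labels: what the codes stand for
labels <- vapply(row1, as.character, character(1))

print(codes)
print(labels)
print(gsub("[^0-9]", "", labels))
```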
Instead, use:
> sapply(x[1, ], function(col) gsub("[^0-9]", "", col))
     A      B      C
"2342" "0982" "2874"
or, if you just need a plain character vector without the column names:
> gsub("[^0-9]", "", unlist(x[1, ]))
[1] "2342" "0982" "2874"
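Another option, sketched below under the assumption that none of the columns need to remain factors, is to convert the factor columns to character once, up front. After that, extracting a row and unlist()-ing it yields character strings directly. (Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so newly created data frames usually have character columns in the first place.)

```r
# Reproduce the 2008 setup: character columns stored as factors
x <- data.frame(A = "MT2342", B = "MT0982", C = "MT2874",
                stringsAsFactors = TRUE)

# Convert every factor column to character; assigning into x[] keeps
# the data-frame structure (column names, row names) intact
x[] <- lapply(x, function(col) if (is.factor(col)) as.character(col) else col)
stopifnot(!any(vapply(x, is.factor, logical(1))))

# A row is still a one-row data frame, but unlist() now yields a
# named character vector, so gsub() works on the actual strings
digits <- gsub("[^0-9]", "", unlist(x[1, ]))
print(digits)
```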
The key thing to remember is that a single row extracted from a data frame
is itself a data frame (that is, a list of columns), not an atomic vector.
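The difference is easy to check directly. In this sketch (again using stringsAsFactors = TRUE to mimic the 2008 default), the extracted row is a data frame, i.e. a list of columns, whereas a single extracted column is an atomic vector (here a factor).

```r
x <- data.frame(A = "MT2342", B = "MT0982", C = "MT2874",
                stringsAsFactors = TRUE)

one_row <- x[1, ]   # one row: still a data frame
one_col <- x$A      # one column: a factor, which is an atomic vector

print(class(one_row))      # "data.frame"
print(is.list(one_row))    # TRUE: a data frame is a list of columns
print(is.atomic(one_row))  # FALSE
print(is.atomic(one_col))  # TRUE
print(length(one_row))     # 3: the number of columns
```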
HTH,
Marc Schwartz
on 07/02/2008 10:51 AM jim holtman wrote:
Seems to work fine for me:
x <- c("MT2342", "MT0982", "MT2874")
gsub("[^0-9]", "", x)
[1] "2342" "0982" "2874"
You might have factors, in which case you should use as.character() to convert
them to character strings:
gsub('[^0-9]','',as.character(PthwyGenes))
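This as.character() fix works when PthwyGenes is a factor vector but, as Marc's reply explains, not when it is a one-row data frame, because as.character() is then applied to a list of factors. A hypothetical helper covering both cases might look like the following; the name strip_letters is made up for illustration and is not part of base R.

```r
# Hypothetical helper: keep only the digits of each ID, whether the input
# is a character vector, a factor, or a (one-row) data frame
strip_letters <- function(v) {
  if (is.list(v)) {
    # Data frame: convert each column separately so factor labels,
    # not integer codes, are used; unlist() then gives one vector
    v <- unlist(lapply(v, as.character))
  }
  # as.character() turns factors into their labels and drops any names
  gsub("[^0-9]", "", as.character(v))
}

ids_chr    <- c("MT2342", "MT0982", "MT2874")
ids_factor <- factor(ids_chr)
ids_row    <- data.frame(A = "MT2342", B = "MT0982", C = "MT2874",
                         stringsAsFactors = TRUE)

print(strip_letters(ids_chr))
print(strip_letters(ids_factor))
print(strip_letters(ids_row))
```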
On Wed, Jul 2, 2008 at 10:24 AM, <[EMAIL PROTECTED]> wrote:
Hi,
I have a data frame with strings that have two letters and four digits. When I
store a whole row as a new vector and try to remove the leading letters using
the gsub command, it returns single-digit strings that have no relation
to the digits in each string. I also noticed that when I view the new vector
before using gsub, it includes the original headers from the data frame. For
example, the original row contains (I'm not showing the headers):
MT2342 MT0982 MT2874
and after I run gsub('[^0-9]','',PthwyGenes), I get:
"6" "6" "8"
and this result no longer has any headers.
Does anyone know why this happens and how I can fix it?
Thanks,
-Nina
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.