A plain factor is not likely the issue, since gsub() handles one correctly:

x <- factor(c("MT2342",    "MT0982",    "MT2874"))

> x
[1] MT2342 MT0982 MT2874
Levels: MT0982 MT2342 MT2874

> gsub("[^0-9]", "", x)
[1] "2342" "0982" "2874"


gsub() and friends coerce to character internally already:

> gsub
function (pattern, replacement, x, ignore.case = FALSE, extended = TRUE,
    perl = FALSE, fixed = FALSE, useBytes = FALSE)
{
    if (!is.character(x))
        x <- as.character(x)
    .Internal(gsub(as.character(pattern), as.character(replacement),
        x, ignore.case, extended, perl, fixed, useBytes))
}
<environment: namespace:base>



More than likely what is happening is that 'PthwyGenes' is a single-row data frame whose columns are factors:

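# stringsAsFactors = TRUE is spelled out because R >= 4.0.0 defaults to FALSE
x <- data.frame(A = "MT2342", B = "MT0982", C = "MT2874",
                stringsAsFactors = TRUE)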

> x
       A      B      C
1 MT2342 MT0982 MT2874

> str(x)
'data.frame':   1 obs. of  3 variables:
 $ A: Factor w/ 1 level "MT2342": 1
 $ B: Factor w/ 1 level "MT0982": 1
 $ C: Factor w/ 1 level "MT2874": 1


Thus, when gsub() coerces 'x' to character, as.character() is applied to the one-row data frame (a list of factor columns) and, as per documented behavior, you get each factor's integer level code as a string rather than its label:

> as.character(x[1, ])
[1] "1" "1" "1"


and then:


> gsub("[^0-9]", "", x[1, ])
[1] "1" "1" "1"
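
Every column shows "1" above only because each factor has a single level. As a rough sketch of what probably produced the "6" "6" "8" in the original post, assume a data frame with several rows, so that each column's factor has several levels. The name 'genes' and all values other than the first row are made up for illustration:

```r
# Hypothetical data; stringsAsFactors = TRUE is given explicitly because
# R >= 4.0.0 no longer converts strings to factors by default.
genes <- data.frame(A = c("MT2342", "MT0001", "MT1111"),
                    B = c("MT0982", "MT0002", "MT2222"),
                    stringsAsFactors = TRUE)

row1 <- genes[1, ]                     # still a data frame, not a vector
codes <- as.character(row1)            # level codes of each value, as text
stripped <- gsub("[^0-9]", "", row1)   # gsub() ends up with the same codes

print(codes)
print(stripped)
```

Here "MT2342" is the third of the sorted levels in column A and "MT0982" is the second in column B, so both results come back as "3" "2", which has no visible relation to the digits in the strings.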


Thus, instead use:

> sapply(x[1, ], function(x) gsub("[^0-9]", "", x))
     A      B      C
"2342" "0982" "2874"


or, if you just need a plain vector returned, without the column names:


> gsub("[^0-9]", "", unlist(x[1, ]))
[1] "2342" "0982" "2874"
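
Another option, sketched here on the assumption that changing the column types of the whole data frame is acceptable, is to convert the factor columns to character once, up front. The names 'x_chr' and 'is_fac' are introduced only for this illustration:

```r
x <- data.frame(A = "MT2342", B = "MT0982", C = "MT2874",
                stringsAsFactors = TRUE)

# Convert every factor column to character; other columns are left alone.
x_chr <- x
is_fac <- vapply(x_chr, is.factor, logical(1))
x_chr[is_fac] <- lapply(x_chr[is_fac], as.character)

# With character columns, the extracted row yields the strings themselves.
digits <- gsub("[^0-9]", "", unlist(x_chr[1, ]))
print(digits)
```

Reading the data with stringsAsFactors = FALSE in the first place should have the same effect, if the factors are not needed elsewhere.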


The key thing to remember is that a single row extracted from a data frame is not a vector: it is still a data frame, that is, a list with one element per column.
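
A quick way to check this for yourself is to compare row and column extraction. This is only a sketch using the same example data; the variable names are illustrative:

```r
x <- data.frame(A = "MT2342", B = "MT0982", C = "MT2874",
                stringsAsFactors = TRUE)

one_row <- x[1, ]   # row extraction keeps the data frame structure
one_col <- x[, 1]   # single-column extraction drops to the column itself

row_is_df   <- is.data.frame(one_row)
row_is_list <- is.list(one_row)
col_is_fac  <- is.factor(one_col)
col_chr     <- as.character(one_col)   # a factor converts to its labels

print(c(row_is_df, row_is_list, col_is_fac))
print(col_chr)
```

Calling str() or class() on the object before passing it to gsub() shows right away which of the two cases you have.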

HTH,

Marc Schwartz


on 07/02/2008 10:51 AM jim holtman wrote:
Seems to work fine for me:

x <- c("MT2342",    "MT0982",    "MT2874")
gsub("[^0-9]", "", x)
[1] "2342" "0982" "2874"

You might have 'factors' so you should use as.character to convert to
character strings:

gsub('[^0-9]','',as.character(PthwyGenes))

On Wed, Jul 2, 2008 at 10:24 AM,  <[EMAIL PROTECTED]> wrote:
Hi,

I have a data frame with strings that have two letters and four numbers. When I
store a whole row as a new vector and try to remove the preceding letters using
the gsub command, it returns characters of single numbers that have no relation
to the numbers in each string. I also noticed that when I view the new vector
before using gsub, it includes the original headers from the data frame. For
example,

The original row will contain (I'm not showing the headers):

MT2342    MT0982    MT2874

and after I use the command 'gsub('[^0-9]','',PthwyGenes)', I get:

"6"    "6"    "8"

and this result no longer has any headers.

Does anyone know why this happens and how I can fix it?

Thanks,
-Nina

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
