Note that there are also regexp classes that define certain character sets, most notably [:graph:], which can make it easy to create appropriate regexps. See ?regex for more.
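As a rough sketch of how such a class might be used on this kind of problem: the input string below is made up (a control character stands in for whatever FileMakerPro emitted), strip_nongraph is just an illustrative name, and what [:graph:] matches depends on the locale.

```r
# Hypothetical input: a control character (\001) sits between "0" and "3".
x <- paste0("Draszt 0", "\001", "3")

# TRUE if the string contains anything that is neither a visible
# character ([:graph:]) nor a plain space.
has_odd <- grepl("[^[:graph:] ]", x)

# Illustrative helper (not from the thread): drop those characters.
strip_nongraph <- function(s) gsub("[^[:graph:] ]", "", s)

clean <- strip_nongraph(x)
print(has_odd)
print(clean)
```

Comparing nchar(x) with nchar(clean) then shows how many characters were dropped.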

Martin Maechler <maechler <at> stat.math.ethz.ch> writes:

: >>>>> "Spencer" == Spencer Graves <spencer.graves <at> pdf.com>
: >>>>> on Thu, 14 Oct 2004 13:41:24 -0700 writes:
:
: Spencer> It looks like you have several non-printing
: Spencer> characters. "nchar" will give you the total number
: Spencer> of characters in each character string.
:
: Spencer> "strsplit" can break character strings into single
: Spencer> characters, and "%in%" can be used to classify
: Spencer> them.
:
: and you give nice coding examples:
:
: Spencer> Consider the following:
: >> x <- "Draszt 0%/1ÃÂÃÂiso8859-15ÃÂ"
: >> nx <- nchar(x)
: >> x. <- strsplit(x, "")
: >> length(x.[[1]])
: Spencer> [1] 29
: >>
: >> namechars <- c(letters, LETTERS, as.character(0:9), ".")
:
: just to be precise: If 'namechars' is supposed to mean
: ``characters valid in R object names'', then you should have
: added "_" as well:
:
: namechars <- c(letters, LETTERS, as.character(0:9), ".", "_")
:
: >> punctuation <- c(",", "!", "+", "*", "&", "|")
: >> legalchars <- c(namechars, punctuation)
:
: and 'legalchars' would have to contain quite a bit more I
: presume, e.g. "$", "@", ....
: (but that wouldn't have been a reason to write this e-mail..)
:
: >> legalx <- lapply(x., function(y)(y %in% legalchars))
: >> x.[[1]][!legalx[[1]]]
: Spencer> [1] " " "" "%" "/" "Ã" "Â" "Ã" "Â" "-" "" "Ã" "Â"
: >>
: >> sapply(legalx, sum)
: Spencer> [1] 17
:
: Spencer> Will this give you ideas about how to do what you want?
: Spencer> hope this helps.
: Spencer> spencer graves
:
: (and this too)
:
: Martin Maechler, ETH Zurich
:
: Spencer> Gabor Grothendieck wrote:
:
: >> Assuming that the problem is that your input file has
: >> additional embedded characters added by the data base
: >> program you could try extracting just the text using
: >> the UNIX strings program:
: >>
: >> strings myfile.csv > myfile.txt
: >>
: >> and see if myfile.txt works with R and if not check out
: >> what the differences are between it and the .csv file.
: >>
: >> Date: Thu, 14 Oct 2004 11:31:33 -0700
: >> From: Scott Waichler <scott.waichler <at> pnl.gov>
: >> To: <r-help <at> stat.math.ethz.ch>
: >> Subject: [R] Problem with number characters
: >>
: >> I am trying to process text fields scanned in from a csv file that is
: >> output from the Windows database program FileMakerPro. The characters
: >> onscreen look like regular text, but R does not like their underlying binary form.
: >> For example, one of the text fields contains a name and a number, but
: >> R recognizes the number as something other than what it appears
: >> to be in plain text. The character string "Draszt 03" after being
: >> read into R using scan and ="" becomes "Draszt 03" where the 3 is
: >> displayed in my R session as a superscript. Here is the result pasted
: >> into this email I'm composing in emacs: "Draszt 0%/1ÃÂÃÂiso8859-15ÃÂ"
: >> Another clue for the knowledgeable: when I try to display the vector element
: >> causing trouble, I get
: >> <CHARSXP: "Draszt 0%/1ÃÂÃÂiso8859-15ÃÂ">
: >> where again the superscript part is just "3" in my R session. I'm working in
: >> Linux, R version 1.9.1, 2004-06-21. Your help will be much appreciated.
: >>
: >> Scott Waichler
: >> Pacific Northwest National Laboratory
: >> scott.waichler <at> pnl.gov
:
: ______________________________________________
: R-help <at> stat.math.ethz.ch mailing list
: https://stat.ethz.ch/mailman/listinfo/r-help
: PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
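
For what it is worth, the strsplit / %in% approach quoted above, with "_" added as Martin suggests, could be wrapped up roughly as follows. This is only a sketch: the helper name illegal_chars, the made-up input, and the choice to treat a space, "$" and "@" as legal are all assumptions, not something settled in the thread.

```r
# Characters treated as legal: name characters (including "_", per
# Martin's correction) plus a space and some punctuation. The exact
# set is an assumption; adjust it to the data at hand.
namechars   <- c(letters, LETTERS, as.character(0:9), ".", "_")
punctuation <- c(" ", ",", "!", "+", "*", "&", "|", "$", "@")
legalchars  <- c(namechars, punctuation)

# Hypothetical helper: return the characters of one string that are
# not in `legal`.
illegal_chars <- function(s, legal = legalchars) {
  chars <- strsplit(s, "")[[1]]
  chars[!(chars %in% legal)]
}

# Made-up example containing a control character and a percent sign.
x <- paste0("Draszt 0", "\001", "%3")
bad <- illegal_chars(x)
print(bad)
print(length(bad))
```

Applying it over a whole vector with lapply(v, illegal_chars) gives the offending characters per element.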