Dear list,

I maintain a package containing only datasets (152 of them): 
http://dutangc.free.fr/pub/RRepos/web/CASdatasets-index.html 

When running R CMD check on the package, I get the following NOTE:
* checking data for non-ASCII characters ... NOTE
 Note: found 4 marked UTF-8 strings

I wonder how to find which dataset(s) (all recorded as rda files) contain(s) 
non-ASCII characters. 

The iconv function lets us find or replace non-ASCII characters:
iconv(x, "UTF-8", "ASCII", sub="I_WAS_NOT_ASCII")
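As a small illustration (the vector x below is made up for the example and is not taken from the package), the elements that needed a substitution can then be located with grepl. This is only a sketch, and the exact substituted text may depend on the platform's iconv implementation.

```r
# Hypothetical input: one pure-ASCII string and one with an accented letter
x <- c("Le Mans", "caf\u00e9")

# Bytes that cannot be represented in ASCII are replaced by the marker
y <- iconv(x, "UTF-8", "ASCII", sub = "I_WAS_NOT_ASCII")

# TRUE for the elements in which at least one substitution happened
flagged <- grepl("I_WAS_NOT_ASCII", y, fixed = TRUE)
print(x[flagged])
```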

I use the following function to detect non-ASCII characters.

testASCII <- function(idata)
{
 # indices of the factor and character columns
 col <- which(sapply(idata, function(v) is.factor(v) || is.character(v)))
 for(i in col)
 {
   x <- as.character(idata[[i]])
   cat(colnames(idata)[i], "\n")
   # positions where the conversion to ASCII needed a substitution
   res <- grep("I_WAS_NOT_ASCII",
               iconv(x, "latin1", "ASCII", sub="I_WAS_NOT_ASCII"))
   res <- c(res, grep("I_WAS_NOT_ASCII",
               iconv(x, "UTF-8", "ASCII", sub="I_WAS_NOT_ASCII")))
   res <- sort(unique(res))
   if(length(res) > 0)
     cat(res, "\n")
 }
}

Unfortunately, I have not yet found which rda file contains non-ASCII 
characters among the 56 most recent datasets. Is there a faster way to detect 
non-ASCII characters than manually loading each dataset and calling 
testASCII(), for example by working directly on the rda files?
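
To make the question concrete, here is a rough sketch of the kind of batch scan I have in mind. It rests on several assumptions: the datasets are data frames, the offending strings sit in character columns or factor levels, and the NOTE refers to strings whose declared encoding, as reported by Encoding(), is "UTF-8". The helper names hasNonASCII and scanRda and the directory "data" are hypothetical.

```r
# TRUE for each string containing at least one byte above 127
hasNonASCII <- function(s)
  vapply(s, function(z) any(as.integer(charToRaw(z)) > 127L),
         logical(1), USE.NAMES = FALSE)

# Load one .rda file into a private environment and report the columns
# of its data frames that hold non-ASCII strings
scanRda <- function(file)
{
  env <- new.env()
  found <- character(0)
  for(nm in load(file, envir = env))     # load() returns the object names
  {
    obj <- get(nm, envir = env)
    if(!is.data.frame(obj)) next         # assumption: datasets are data frames
    for(cn in names(obj))
    {
      v <- obj[[cn]]
      if(is.factor(v)) v <- levels(v)    # strings of a factor live in its levels
      if(!is.character(v)) next
      bad <- hasNonASCII(v)
      if(any(bad))
        found <- c(found, paste0(nm, "$", cn, ": ", sum(bad),
                   " non-ASCII string(s), declared encoding: ",
                   paste(unique(Encoding(v[bad])), collapse = ", ")))
    }
  }
  found
}

# "data" is a hypothetical path to the package's data directory
for(f in list.files("data", pattern = "\\.rda$", full.names = TRUE))
{
  res <- scanRda(f)
  if(length(res) > 0) cat(f, res, sep = "\n")
}
```

Such a scan would not look at column names, attributes, or objects that are not data frames, so a string hidden there would still be missed.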

Any comment is welcome.

Regards, Christophe


> sessionInfo()
R version 3.2.4 (2016-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
---------------------------------------
Christophe Dutang
LMM, UdM, Le Mans, France
web: http://dutangc.free.fr

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.