[R] puzzle using gsub (and encodings maybe)
Hello,

Below is some output that shows my issue. I have a variable x that I read from a file (more on this below).

    x
    [1] "NEW YORK NEW ENGLAND"
    gsub(" -", "-", x)            # this does not work!
    [1] "NEW YORK NEW ENGLAND"
    Encoding(x)                   # is x in a special encoding? no
    [1] "unknown"
    y = "NEW YORK -NEW ENGLAND"   # I type in variable y
    gsub(" -", "-", y)            # and gsub works as expected
    [1] "NEW YORK-NEW ENGLAND"

I'm sure the problem has to do with the way I read the variable x. But even if I change the encoding for x to ASCII, I still cannot do the sub. I get x by reading a pdf file with pdftotext, so you will not be able to replicate my issue.

Thanks for any suggestions,
Adrian

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] puzzle using gsub (and encodings maybe)
On 10/14/2009 1:30 PM, Adrian Dragulescu wrote:
> x
> [1] "NEW YORK NEW ENGLAND"
> gsub(" -", "-", x)   # this does not work!
> [1] "NEW YORK NEW ENGLAND"

It looks as though it worked, presumably because something got lost in your email. Could you post charToRaw(x) so we can see what's in x?

Duncan Murdoch
Re: [R] puzzle using gsub (and encodings maybe)
    charToRaw(x)
     [1] 4e 45 57 20 59 4f 52 4b 20 ad 4e 45 57 20 45 4e 47 4c 41 4e 44
    charToRaw(y)
     [1] 4e 45 57 20 59 4f 52 4b 20 2d 4e 45 57 20 45 4e 47 4c 41 4e 44

So they are different.

Adrian

I use R 2.8.1 on WinXP.

On Wed, 14 Oct 2009, Duncan Murdoch wrote:
> Could you post charToRaw(x) so we can see what's in x?
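The two byte dumps differ in exactly one position. A minimal sketch of how one might locate it, rebuilding both vectors from the bytes posted in this message; the names `xb`, `yb`, and `diff_pos` are illustrative and do not appear in the thread:

```r
# Bytes of x exactly as posted in the thread
xb <- as.raw(c(0x4e, 0x45, 0x57, 0x20, 0x59, 0x4f, 0x52, 0x4b, 0x20, 0xad,
               0x4e, 0x45, 0x57, 0x20, 0x45, 0x4e, 0x47, 0x4c, 0x41, 0x4e, 0x44))

# y was typed at the console, so it is plain ASCII
yb <- charToRaw("NEW YORK -NEW ENGLAND")

# Positions where the two byte vectors disagree
diff_pos <- which(xb != yb)
diff_pos        # 10
xb[diff_pos]    # ad
yb[diff_pos]    # 2d
```

Comparing raw vectors avoids any dependence on how the console renders the strings, which is what hid the difference in the first message.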
Re: [R] puzzle using gsub (and encodings maybe)
On 10/14/2009 1:41 PM, Adrian Dragulescu wrote:
> charToRaw(x)
>  [1] 4e 45 57 20 59 4f 52 4b 20 ad 4e 45 57 20 45 4e 47 4c 41 4e 44
> charToRaw(y)
>  [1] 4e 45 57 20 59 4f 52 4b 20 2d 4e 45 57 20 45 4e 47 4c 41 4e 44
>
> So they are different.
>
> I use R 2.8.1 on WinXP

But that's ancient. Please try again with the beta of 2.10.0, and let us know if you still see a problem.

Duncan Murdoch
Re: [R] puzzle using gsub (and encodings maybe)
On Wed, 14 Oct 2009, Adrian Dragulescu wrote:
> charToRaw(x)
>  [1] 4e 45 57 20 59 4f 52 4b 20 ad 4e 45 57 20 45 4e 47 4c 41 4e 44
> charToRaw(y)
>  [1] 4e 45 57 20 59 4f 52 4b 20 2d 4e 45 57 20 45 4e 47 4c 41 4e 44
>
> So they are different.

We really do need the 'at a minimum' information we asked you for in the posting guide. But in cp1252 (a guess as to what you might be using) \xad is a 'soft hyphen', and that is not the same thing as a hyphen -- you will get the same issues with 'non-breaking space'.

BDR

> I use R 2.8.1 on WinXP
>
>> x
>> [1] "NEW YORK NEW ENGLAND"
>> gsub(" -", "-", x)   # this does not work!
>> [1] "NEW YORK NEW ENGLAND"

Well, I see no hyphen at all here, but then I am not on Windows.

--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
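To illustrate the point about soft hyphen and non-breaking space, here is a small sketch that builds the two single-byte characters, declares them as Latin-1 (bytes 0xAD and 0xA0 have the same meaning in Latin-1 and cp1252), and reports their Unicode code points. The variable names are illustrative, and declaring the encoding with `Encoding<-` is an assumption about how one might inspect such data, not something done in the thread:

```r
# Single bytes as they would appear in cp1252 / Latin-1 text
shy  <- rawToChar(as.raw(0xad))   # soft hyphen
nbsp <- rawToChar(as.raw(0xa0))   # non-breaking space

# Declare the encoding so R can convert the bytes to UTF-8
Encoding(shy)  <- "latin1"
Encoding(nbsp) <- "latin1"

shy_cp  <- utf8ToInt(enc2utf8(shy))    # 173, i.e. U+00AD
nbsp_cp <- utf8ToInt(enc2utf8(nbsp))   # 160, i.e. U+00A0

# Neither is the ASCII character it resembles
shy_cp  == utf8ToInt("-")   # FALSE (ASCII hyphen is 45)
nbsp_cp == utf8ToInt(" ")   # FALSE (ASCII space is 32)
```

A pattern containing an ASCII hyphen or an ASCII space will therefore never match these characters, however similar they look on screen.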
Re: [R] puzzle using gsub (and encodings maybe)
I get the same results (not working) using R 2.9.2 and R 2.10.0 beta.

Thank you for looking at this.

On Wed, 14 Oct 2009, Duncan Murdoch wrote:
> But that's ancient. Please try again with the beta of 2.10.0, and let us know if you still see a problem.
Re: [R] puzzle using gsub (and encodings maybe)
On 10/14/2009 2:16 PM, Adrian Dragulescu wrote:
> I get the same results (not working) using R 2.9.2 and R 2.10.0 beta.

But it is working: the dash in x is the byte ad, not 2d. You need to ask to substitute for the ad character, e.g. by

    spacelongdash <- rawToChar(as.raw(c(0x20, 0xad)))
    gsub(spacelongdash, "-", x)

Duncan Murdoch
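In a present-day UTF-8 session a lone 0xad byte is not valid text, so a regular-expression match on it may be rejected as invalid input. The following is a hedged sketch of the same substitution done byte-wise. `fixed` and `useBytes` are standard `gsub` arguments, but choosing them here is an assumption and was not part of the suggestion above; `x` is rebuilt from the bytes posted earlier in the thread, and `fixed_x` is an illustrative name:

```r
# Rebuild x from the bytes posted in the thread
x <- rawToChar(as.raw(c(0x4e, 0x45, 0x57, 0x20, 0x59, 0x4f, 0x52, 0x4b, 0x20, 0xad,
                        0x4e, 0x45, 0x57, 0x20, 0x45, 0x4e, 0x47, 0x4c, 0x41, 0x4e, 0x44)))

# Pattern: an ASCII space followed by the 0xad byte
spacelongdash <- rawToChar(as.raw(c(0x20, 0xad)))

# Literal, byte-by-byte replacement; no encoding validation is attempted
fixed_x <- gsub(spacelongdash, "-", x, fixed = TRUE, useBytes = TRUE)

# Inspect the result as bytes: 0x20 0xad should now be a single 0x2d
charToRaw(fixed_x)
```

Working at the byte level sidesteps the question of which encoding the pdftotext output is in, at the cost of only being safe for single-byte encodings such as cp1252.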
Re: [R] puzzle using gsub (and encodings maybe)
Thank you. If I use

    gsub(" \xad", "-", x)
    [1] "NEW YORK-NEW ENGLAND"

I get what I want.

Adrian

    sessionInfo()
    R version 2.9.2 (2009-08-24)
    i386-pc-mingw32

    locale:
    LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

On Wed, 14 Oct 2009, Prof Brian Ripley wrote:
> We really do need the 'at a minimum' information we asked you for in the posting guide. But in cp1252 (a guess as to what you might be using) \xad is a 'soft hyphen', and that is not the same thing as a hyphen -- you will get the same issues with 'non-breaking space'.
Re: [R] puzzle using gsub (and encodings maybe)
On 10/14/2009 2:29 PM, Adrian Dragulescu wrote:
> Thank you. If I use
>
>     gsub(" \xad", "-", x)
>     [1] "NEW YORK-NEW ENGLAND"
>
> I get what I want.

Right, that's simpler than what I suggested.

Duncan Murdoch