Re: [R] R 2.10.0: Error in gsub/calloc

2009-11-05 Thread Richard R. Liu
Genentech Nonclinical Biostatistics -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org ] On Behalf Of Richard R. Liu Sent: Tuesday, November 03, 2009 11:32 AM To: Uwe Ligges Cc: r-help@r-project.org Subject: Re: [R] R 2.10.0: Error in gsub/calloc

[R] R 2.10.0: Error in gsub/calloc

2009-11-03 Thread richard . liu
I'm running R 2.10.0 under Mac OS X 10.5.8; however, I don't think this is a Mac-specific problem. I have a very large (158,908 possible sentences, ca. 58 MB) plain text document d which I am trying to tokenize: t <- strapply(d, "\\w+", perl = T). I am encountering the following error: Error in

Re: [R] R 2.10.0: Error in gsub/calloc

2009-11-03 Thread Uwe Ligges
richard@pueo-owl.ch wrote: I'm running R 2.10.0 under Mac OS X 10.5.8; however, I don't think this is a Mac-specific problem. I have a very large (158,908 possible sentences, ca. 58 MB) plain text document d which I am trying to tokenize: t <- strapply(d, "\\w+", perl = T). I am

Re: [R] R 2.10.0: Error in gsub/calloc

2009-11-03 Thread Kenneth Roy Cabrera Torres
Try the patched version... Maybe it is the same problem I had with a large database when using gsub(). HTH On Tue, 03-11-2009 at 20:31 +0100, Richard R. Liu wrote: I apologize for not being clear. d is a character vector of length 158908. Each element in the vector has been designated by

Re: [R] R 2.10.0: Error in gsub/calloc

2009-11-03 Thread Richard R. Liu
Kenneth, Thanks for the hint. I downloaded and installed the latest patch, but to no avail. I can reproduce the error on a single sentence, the longest in the document. It contains 743,393 characters. It isn't a true sentence, but since it is more than three standard deviations
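
A minimal sketch of how one might isolate the longest element of d so that the failure can be reported on a single input. The vector below is a small hypothetical stand-in, not the poster's data, and the tokenizing step uses base R's gregexpr()/regmatches() (available in later R releases than 2.10.0) instead of strapply, purely as an assumption for illustration.

```r
# Hypothetical stand-in for the poster's character vector `d`
# (the real one is described as having 158,908 elements).
d <- c("A short sentence.",
       paste(rep("token", 2000), collapse = " "),
       "Another short one.")

lens <- nchar(d)        # number of characters in each element
i <- which.max(lens)    # index of the longest element
longest <- d[i]         # one-element input for a reproducible report

# Base-R tokenization of that single element (swapped in for strapply)
tokens <- regmatches(longest, gregexpr("\\w+", longest, perl = TRUE))[[1]]
length(tokens)
```

Running the same call on longest alone, rather than on all of d, keeps a bug report small and shows whether the input length by itself triggers the error.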

Re: [R] R 2.10.0: Error in gsub/calloc

2009-11-03 Thread Richard R. Liu
I apologize for not being clear. d is a character vector of length 158908. Each element in the vector has been designated by sentDetect (package: openNLP) as a sentence. Some of these are really sentences. Others are merely groups of meaningless characters separated by white space.

Re: [R] R 2.10.0: Error in gsub/calloc

2009-11-03 Thread Bert Gunter
Biostatistics -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Richard R. Liu Sent: Tuesday, November 03, 2009 11:32 AM To: Uwe Ligges Cc: r-help@r-project.org Subject: Re: [R] R 2.10.0: Error in gsub/calloc I apologize

Re: [R] R 2.10.0: Error in gsub/calloc

2009-11-03 Thread William Dunlap
: Tuesday, November 03, 2009 3:00 PM To: Kenneth Roy Cabrera Torres Cc: r-help@r-project.org; Uwe Ligges Subject: Re: [R] R 2.10.0: Error in gsub/calloc Kenneth, Thanks for the hint. I downloaded and installed the latest patch, but to no avail. I can reproduce the error on a single

Re: [R] R 2.10.0: Error in gsub/calloc

2009-11-03 Thread Gabor Grothendieck
Note that you don't need perl = T since by default strapply uses Tcl regular expressions and they support \w. What happens if you omit the perl = T? Also please specify the version of gsubfn you are using and if it's not the latest then try it with the latest version. On Tue, Nov 3, 2009 at

Re: [R] R 2.10.0: Error in gsub/calloc

2009-11-03 Thread Prof Brian Ripley
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Richard R. Liu Sent: Tuesday, November 03, 2009 3:00 PM To: Kenneth Roy Cabrera Torres Cc: r-help@r-project.org; Uwe Ligges Subject: Re: [R] R 2.10.0: Error in gsub/calloc Kenneth

Re: [R] R 2.10.0: Error in gsub/calloc

2009-11-03 Thread Richard R. Liu
I am using gsubfn 0.5-0. When I do not specify perl = TRUE I now get the following error on the same document: Error in structure(.External("dotTcl", ..., PACKAGE = "tcltk"), class = "tclObj") : [tcl] bad index "1e+05": must be integer?[+-]integer? or end?[+-]integer?. Regards, Richard On
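
One possible reading of the Tcl message, offered as an assumption rather than a diagnosis confirmed anywhere in this thread: the index 100000 may have been handed to Tcl as the string 1e+05, which is how R converts that double to text by default. The sketch below only demonstrates that base-R behaviour; the variable names are hypothetical and nothing here is taken from the gsubfn source.

```r
idx <- 100000                                 # a double, not an integer

sci   <- as.character(idx)                    # default conversion of the double
fixed <- format(idx, scientific = FALSE)      # force fixed notation
int   <- as.character(as.integer(idx))        # convert to integer first

print(c(sci = sci, fixed = fixed, int = int))
```

If that is what happens, any index at or above 1e+05 would trip the same check, which would fit an error that only shows up on very long inputs.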