Re: [R] Stack overflow in R 2.10.0 with sub() and gsub() SOLVED!
Thanks to Dr. Ripley and Dr. Murdoch for the workaround and the solution to the memory problem with sub() and gsub().

Now, with the perl=TRUE option added, it works perfectly (with the full database)!

    alumnos$AL_NUME_ID <- gsub("(^ +)|( +$)", "", alumnos$AL_NUME_ID, perl=TRUE)

I am going to test it with the patched version, which seems to work without this addition, but according to Dr. Ripley this option is faster even in the patched version.

Thank you very much for your help.

Kenneth

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
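As a small illustration of the perl=TRUE call in the message above, here is a minimal sketch. The helper name strip_blanks and the sample IDs are made up for the example; only the pattern and the perl=TRUE argument come from the thread.

```r
# Hypothetical helper: strip leading and trailing spaces using the PCRE engine,
# with the same pattern as in the message above.
strip_blanks <- function(x) {
  gsub("(^ +)|( +$)", "", x, perl = TRUE)
}

# Made-up sample values with leading and/or trailing blanks.
ids <- c(" 21444777", "1147585  ", "  255202522 ", "A8D0251207")
clean <- strip_blanks(ids)
print(clean)
```

In later R versions (3.2.0 and newer) trimws() covers the same use case, although it also strips tabs and newlines by default, so it is not an exact drop-in for this pattern.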
Re: [R] Stack overflow in R 2.10.0 with sub()
On 10/27/2009 8:15 AM, Kenneth Roy Cabrera Torres wrote:
> Hi R developers:
>
> Congratulations on the new R 2.10.0 version. It is a huge effort!
> Thank you for your work and dedication.
>
> I just want to ask how to make this blank-stripping call work again
> (it works on R 2.9.2).
>
>     alumnos$AL_NUME_ID <- sub("(^ +)|( +$)", "", alumnos$AL_NUME_ID),)
>
> alumnos is a database with 900,000 rows and 72 columns,
> and alumnos$AL_NUME_ID is a character variable read from a MySQL database.
>
> The system shows me this message:
>
>     Error: segfault from C stack overflow
>
> It seems to be a stack overflow problem, but it works on R 2.9.2!
>
> Thank you for your help, and again, thank you for your work!!!

I just tried that (after fixing the typo at the end of the line) and it worked on these vectors:

    x <- c("a", " a", "a ", " a ")
    y <- rep(x, 90)

So there is something about your dataset that is causing the problem. Can you narrow it down? Here are some tests:

1. Check that it is the value that is causing the problem, not the manner of getting it:

    x <- alumnos$AL_NUME_ID
    y <- sub("(^ +)|( +$)", "", x)

2. See if it is in the first half of the data:

    x <- alumnos$AL_NUME_ID
    x <- x[seq_len(length(x)/2)]
    y <- sub("(^ +)|( +$)", "", x)

3. See if it is in the second half:

    x <- alumnos$AL_NUME_ID
    x <- x[-seq_len(length(x)/2)]
    y <- sub("(^ +)|( +$)", "", x)

If you can narrow it down to a particularly short vector that causes the error, that would be very helpful. It's likely to be somewhat tedious, because I imagine those segfaults will terminate R; I'd suggest using save.image() a lot when things are working, so you can restart after a crash.

Duncan Murdoch
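The halving tests above can be automated as a bisection. The sketch below is only an illustration under two assumptions that are not from the thread: the failure depends on a prefix of the vector (if x[1:n] fails, every longer prefix fails too), and fails() is a hypothetical predicate that reports failure without crashing the session. A real segfault terminates R, so in practice fails() would have to run the call in a separate process.

```r
# Hypothetical helper: return the length of the shortest prefix x[1:n]
# for which fails() is TRUE, or NA if the whole vector passes.
# Assumes failure is monotone in the prefix length.
smallest_failing_prefix <- function(x, fails) {
  if (!fails(x)) return(NA_integer_)
  lo <- 0L          # longest prefix known to pass (the empty prefix)
  hi <- length(x)   # shortest prefix known to fail so far
  while (hi - lo > 1L) {
    mid <- (lo + hi) %/% 2L
    if (fails(x[seq_len(mid)])) hi <- mid else lo <- mid
  }
  hi
}

# Toy stand-in for the real test: pretend any input of 7 or more
# elements triggers the problem.
toy_fails <- function(v) length(v) >= 7
n_bad <- smallest_failing_prefix(1:20, toy_fails)
print(n_bad)
```

Each round halves the search interval, so even a vector of 900,000 rows needs only about 20 calls to fails().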
Re: [R] Stack overflow in R 2.10.0 with sub()
Dr. Murdoch:

I am puzzled! As you advised, I did this:

    x <- as.character(alumnos$AL_NUME_ID)
    x <- x[-seq_len(length(x)/2)]
    y <- gsub("(^ +)|( +$)", "", x)

And it fails. But, trying to locate the problem, I did:

    x <- as.character(alumnos$AL_NUME_ID)
    x <- x[-seq_len(length(x)/2)]
    x <- x[seq_len(length(x)/2)]
    y <- gsub("(^ +)|( +$)", "", x)

which works, and:

    x <- as.character(alumnos$AL_NUME_ID)
    x <- x[-seq_len(length(x)/2)]
    x <- x[-seq_len(length(x)/2)]
    y <- gsub("(^ +)|( +$)", "", x)

which also works.

Now both halves work!!! So I am puzzled!!! I cannot locate the problem.

Thank you for your advice.

Kenneth
Re: [R] Stack overflow in R 2.10.0 with sub()
On 10/27/2009 10:46 AM, Kenneth Roy Cabrera Torres wrote:
> Dr. Murdoch:
>
> I am puzzled! As you advised, I did this:
>
>     x <- as.character(alumnos$AL_NUME_ID)
>     x <- x[-seq_len(length(x)/2)]

Please try the following. After running the lines above, do

    save(x, file="x.RData")

and exit from R. Then restart R, and run

    load("x.RData")
    y <- gsub("(^ +)|( +$)", "", x)

If it still fails, that's a sign that there's a problem in that vector of values; if not, it's likely some sort of memory problem that will be harder to track down.

In the former case you could email me the file, and if it also fails here I can probably track it down. If it's a memory problem, I can try, but I'm less optimistic that I'll find it.

Duncan Murdoch
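The save, restart, load, and retry cycle described above can be approximated without leaving the current session by running the call in a separate Rscript process, so that a crash there does not take the interactive session down. This is only a sketch: the function name run_in_fresh_session is made up, it uses saveRDS()/readRDS() and temporary files rather than the x.RData file from the thread, and it assumes Rscript is installed next to the running R.

```r
# Hypothetical helper: apply the gsub() call from the thread to x in a
# fresh R process and return that process's exit status (0 means success;
# a crash such as a segfault should give a non-zero status).
run_in_fresh_session <- function(x) {
  data_file   <- tempfile(fileext = ".rds")
  script_file <- tempfile(fileext = ".R")
  on.exit(unlink(c(data_file, script_file)))

  saveRDS(x, data_file)
  writeLines(c(
    "args <- commandArgs(trailingOnly = TRUE)",
    "x <- readRDS(args[1])",
    "y <- gsub('(^ +)|( +$)', '', x)",
    "cat(length(y), '\\n')"
  ), script_file)

  rscript <- file.path(R.home("bin"), "Rscript")
  status <- system2(rscript,
                    c(shQuote(script_file), shQuote(data_file)),
                    stdout = FALSE, stderr = FALSE)
  status
}

# Harmless made-up input: the child process should exit normally.
status <- run_in_fresh_session(c(" a ", "b ", " c"))
print(status)
```

A fresh process also starts with clean memory, which is the point of the restart in the advice above: a failure that persists there points at the data rather than at the state of the session.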
Re: [R] Stack overflow in R 2.10.0 with sub()
On 10/27/2009 1:05 PM, Kenneth Roy Cabrera Torres wrote:
> Thank you very much for your interest.
>
> I did this:
>
>     x <- as.character(alumnos$AL_NUME_ID)
>     x <- x[-seq_len(length(x)/2)]
>     save(x, file="x.RData")
>
> I exited from R, restarted R, and did this:
>
>     load("x.RData")
>     y <- gsub("(^ +)|( +$)", "", x)
>
> It shows me:
>
>     Error in gsub("(^ +)|( +$)", "", x) :
>       input string 66644 is invalid in this locale

I'm working on Windows, so I don't have the locale problem, but I do get the segfault. I'll see if I can track down what is going wrong.

Duncan Murdoch
Re: [R] Stack overflow in R 2.10.0 with sub()
Thank you very much for your interest.

I did this:

    x <- as.character(alumnos$AL_NUME_ID)
    x <- x[-seq_len(length(x)/2)]
    save(x, file="x.RData")

I exited from R, restarted R, and did this:

    load("x.RData")
    y <- gsub("(^ +)|( +$)", "", x)

It shows me:

    Error in gsub("(^ +)|( +$)", "", x) :
      input string 66644 is invalid in this locale

I deleted that string (it contains an unusual character, Ñ) and retried without that observation:

    y <- gsub("(^ +)|( +$)", "", x[-c(66644)])

I got this:

    Error in gsub("(^ +)|( +$)", "", x[-c(66644)]) :
      input string 160689 is invalid in this locale

I retried again without this invalid string as well (I use position 160690 because removing element 66644 shifts the indices of x by one):

    y <- gsub("(^ +)|( +$)", "", x[-c(66644,160690)])
    Error: segfault from C stack overflow

And it fails.

I also repeated the whole process with this conversion first:

    x <- iconv(as.character(alumnos$AL_NUME_ID), "latin1", "UTF-8")
    x <- x[-seq_len(length(x)/2)]
    save(x, file="x.RData")

I exited, restarted R, and typed:

    load("x.RData")
    y <- gsub("(^ +)|( +$)", "", x)

And it fails again, this time without showing the invalid string errors.

I then did this:

    load("x.RData")
    y <- gsub("(^ +)|( +$)", "", x[1:160690])

and it works. Then I typed

    y <- gsub("(^ +)|( +$)", "", x[1:20])   # (x length is 454035)

and it works...

But when I started a manual binary search, I found something that still puzzles me.

    y <- gsub("(^ +)|( +$)", "", x[1:261570])

works, but sometimes fails (after I restart R). It always fails with an index greater than 262000.

I see nothing unusual around 261570:

    x[261560:261580]
     [1] 21444777   1147585    255202522
     [4] 25852100   24258550   A8D0251207
     [7] 34681811   19121345   16921329
    [10] 20442195   14506482   44332211
    [13] 35049122   34326340   35182366
    [16] 33288742   34958795   1017147202
    [19] 3306985    33048501   33295073

I am sending you the x.RData file to see if you can reproduce my problem.

This information may be useful:

    sessionInfo()
    R version 2.10.0 (2009-10-26)
    x86_64-unknown-linux-gnu

    locale:
     [1] LC_CTYPE=es_CO.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=es_CO.UTF-8        LC_COLLATE=es_CO.UTF-8
     [5] LC_MONETARY=C              LC_MESSAGES=es_CO.UTF-8
     [7] LC_PAPER=es_CO.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=es_CO.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    R.Version()
    $platform
    [1] "x86_64-unknown-linux-gnu"
    $arch
    [1] "x86_64"
    $os
    [1] "linux-gnu"
    $system
    [1] "x86_64, linux-gnu"
    $status
    [1] ""
    $major
    [1] "2"
    $minor
    [1] "10.0"
    $year
    [1] "2009"
    $month
    [1] "10"
    $day
    [1] "26"
    $`svn rev`
    [1] "50208"
    $language
    [1] "R"
    $version.string
    [1] "R version 2.10.0 (2009-10-26)"

gcc --version and g++ --version show me:

    gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3
    Copyright (C) 2008 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

When I compiled R I used only this configuration option (nothing more):

    ./configure --enable-R-shlib
    make
    sudo make install

At the moment I have a 22 GB swap partition (monitoring shows that the system is not using it) and 4 GB of RAM.

Again, thank you very much for your help.

Kenneth
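The "invalid in this locale" errors reported above point at strings whose bytes are not valid in a UTF-8 locale. Here is a minimal sketch of how such entries could be located and converted before calling gsub(). It relies on validUTF8(), which was only added to base R in version 3.3.0, long after this thread, and the sample vector, including the Latin-1 encoded name, is made up for the example.

```r
# Made-up sample: two plain ASCII IDs and one value containing the
# Latin-1 byte 0xD1 (the letter N with tilde), which is not valid UTF-8.
x <- c("21444777", "A8D0251207", "MU\xd1OZ")

# Positions of the strings that are not valid UTF-8.
bad <- which(!validUTF8(x))
print(bad)

# Convert only the offending entries, assuming they are Latin-1 encoded
# (the same assumption as the iconv() call in the message above).
x_fixed <- x
x_fixed[bad] <- iconv(x[bad], from = "latin1", to = "UTF-8")

# With valid input the original pattern runs without locale errors.
y <- gsub("(^ +)|( +$)", "", x_fixed)
```

Converting only the flagged entries avoids double-encoding strings that are already valid, which can happen when iconv() is applied to a whole column of mixed encodings.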
Re: [R] Stack overflow in R 2.10.0 with sub()
On Tue, 27-10-2009 at 10:47 -0700, Phil Spector wrote:
> What happens if you type
>
>     Sys.setlocale('LC_ALL','C')
>
> before using gsub or grep?

When I do that, R hangs and doesn't show any message.

> - Phil Spector
>   Statistical Computing Facility
>   Department of Statistics
>   UC Berkeley
>   spec...@stat.berkeley.edu