Re: [R] Stack overflow in R 2.10.0 with sub() and gsub() SOLVED!

2009-10-28 Thread Kenneth Roy Cabrera Torres
Thanks to Dr. Ripley and Dr. Murdoch for the workaround
and the fix for the sub()/gsub() memory problem.

Now, with the perl=TRUE option added, it works perfectly (on the full
database)!

alumnos$AL_NUME_ID <- gsub("(^ +)|( +$)", "", alumnos$AL_NUME_ID, perl=TRUE)

I am going to test it with the patched version, which seems to work
without this addition; but according to Dr. Ripley, this option
makes it faster even in the patched version.
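To reuse the workaround above, you could wrap it in a small helper. This is a minimal sketch: the name strip_blanks() is hypothetical (it is not part of base R). The helper only passes perl = TRUE so that gsub() uses the PCRE engine instead of the default one, as the thread suggests.

```r
# strip_blanks() is a hypothetical helper name, not a base R function.
# It removes leading and trailing runs of spaces from every element;
# perl = TRUE makes gsub() use the PCRE engine (the workaround from
# this thread) instead of the default regex engine.
strip_blanks <- function(x) {
  gsub("(^ +)|( +$)", "", x, perl = TRUE)
}

# Made-up IDs with stray blanks, in the style of AL_NUME_ID.
ids <- c("  21444777", "1147585 ", " A8D0251207 ")
cleaned <- strip_blanks(ids)
cleaned
```

Recent R versions (3.2.0 and later) also provide trimws(). Unlike this pattern, trimws() also strips tabs and newlines by default.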

Thank you very much for your help.

Kenneth

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Stack overflow in R 2.10.0 with sub()

2009-10-27 Thread Duncan Murdoch

On 10/27/2009 8:15 AM, Kenneth Roy Cabrera Torres wrote:

Hi R developers:

Congratulations on the new R 2.10.0 release.

It is a huge effort! Thank you for your work and dedication.

I just want to ask how to make this strip-blanks function
work again (it works in R 2.9.2).

alumnos$AL_NUME_ID <- sub("(^ +)|( +$)", "", alumnos$AL_NUME_ID),)

alumnos is a database with 900,000 rows and 72 columns,
and alumnos$AL_NUME_ID is a character variable read from
a MySQL database.

R shows me this message:

Error: segfault from C stack overflow

It looks like a stack overflow problem, but it works in R 2.9.2!

Thank you for your help, and again, thank you for your work!!!


I just tried that (after fixing the typo at the end of the line) and it 
worked on these vectors:


x <- c("a", " a", "a ", " a ")
y <- rep(x, 90)
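This small check is an addition, not part of the original thread. Running the call on these same test vectors also shows a difference that matters for a strip-blanks function. sub() replaces only the first match in each element, so a value with blanks at both ends keeps its trailing blanks. gsub(), which Kenneth uses later in the thread, strips both ends.

```r
# Duncan's test vectors, with the string quotes restored.
x <- c("a", " a", "a ", " a ")

# sub() replaces only the first match per element: " a " loses its
# leading blank but keeps the trailing one, giving "a ".
from_sub <- sub("(^ +)|( +$)", "", x)

# gsub() replaces every match, so both ends are stripped.
from_gsub <- gsub("(^ +)|( +$)", "", x)

from_sub
from_gsub
```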

So there is something about your dataset that is causing the problem. 
Can you narrow it down?  Here are some tests:


1.  Check that it is the value that is causing the problem, not the 
manner of getting it:


x <- alumnos$AL_NUME_ID
y <- sub("(^ +)|( +$)", "", x)

2.  See if it is in the first half of the data:

x <- alumnos$AL_NUME_ID
x <- x[seq_len(length(x)/2)]
y <- sub("(^ +)|( +$)", "", x)

3.  See if it is in the second half:

x <- alumnos$AL_NUME_ID
x <- x[-seq_len(length(x)/2)]
y <- sub("(^ +)|( +$)", "", x)

If you can narrow it down to a particularly short vector that causes the 
error, that would be very helpful.  It's likely to be somewhat tedious, 
because I imagine those segfaults will terminate R; I'd suggest using 
save.image() a lot when things are working, so you can restart after a 
crash.
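When the failure is an ordinary R error rather than a crash, the halving described above can be automated. This is a minimal sketch. gsub_fails() and first_failing_prefix() are hypothetical helper names. The search assumes the failure is deterministic and monotone, meaning that once a prefix fails, every longer prefix fails too. Kenneth's later "sometimes fails" observation suggests that assumption may not hold here. A segfault ends the R session and cannot be caught by tryCatch(), so for crashes the manual approach with save.image() still applies.

```r
# Hypothetical helpers; neither is part of base R.

# TRUE if gsub() signals an R-level error on v (for example
# "input string ... is invalid in this locale"). A segfault is NOT
# an R error: it kills the session, so it cannot be caught here.
gsub_fails <- function(v) {
  res <- tryCatch(gsub("(^ +)|( +$)", "", v), error = function(e) e)
  inherits(res, "error")
}

# Smallest n such that fails(x[seq_len(n)]) is TRUE, found by halving.
# Assumes failures are monotone: if a prefix fails, any longer one does.
# Returns NA if the whole vector succeeds.
first_failing_prefix <- function(x, fails) {
  if (!fails(x)) return(NA_integer_)
  lo <- 0L          # longest prefix known to succeed
  hi <- length(x)   # shortest prefix known to fail
  while (hi - lo > 1L) {
    mid <- (lo + hi) %/% 2L
    if (fails(x[seq_len(mid)])) hi <- mid else lo <- mid
  }
  hi
}

# Usage with a stand-in predicate that flags the string "bad".
demo <- c("a", " b", "bad", "c ")
first_failing_prefix(demo, function(v) any(v == "bad"))  # 3
```

With real data you would pass gsub_fails as the predicate. Each probe reruns gsub() on a prefix, so the search costs about log2(length(x)) calls.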


Duncan Murdoch



Re: [R] Stack overflow in R 2.10.0 with sub()

2009-10-27 Thread Kenneth Roy Cabrera Torres
Dr. Murdoch:

I am puzzled!
As you advised, I did this:

x <- as.character(alumnos$AL_NUME_ID)
x <- x[-seq_len(length(x)/2)]
y <- gsub("(^ +)|( +$)", "", x)

And it fails.

But, trying to locate the problem, I did:

x <- as.character(alumnos$AL_NUME_ID)
x <- x[-seq_len(length(x)/2)]
x <- x[seq_len(length(x)/2)]
y <- gsub("(^ +)|( +$)", "", x)

and it works;

x <- as.character(alumnos$AL_NUME_ID)
x <- x[-seq_len(length(x)/2)]
x <- x[-seq_len(length(x)/2)]
y <- gsub("(^ +)|( +$)", "", x)

also works.

So both quarters of the failing half work on their own!

I am puzzled: I cannot locate the problem.
Thank you for your advice.

Kenneth



Re: [R] Stack overflow in R 2.10.0 with sub()

2009-10-27 Thread Duncan Murdoch

On 10/27/2009 10:46 AM, Kenneth Roy Cabrera Torres wrote:

Dr. Murdoch:

I am puzzled!
As you advised, I did this:

x <- as.character(alumnos$AL_NUME_ID)
x <- x[-seq_len(length(x)/2)]



Please try the following.  After running the lines above, do

save(x, file="x.RData")

and exit from R.  Then restart R, and run

load("x.RData")
y <- gsub("(^ +)|( +$)", "", x)

If it still fails, that's a sign that there's a problem in that vector 
of values; if not, it's likely some sort of memory problem that will be 
harder to track down.  In the former case you could email me the file 
and if it also fails here I can probably track it down.  If it's a 
memory problem, I can try, but I'm less optimistic that I'll find it.
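Duncan's save/restart/load test can also be scripted so that each attempt runs in a separate R process. If that process crashes, the interactive session survives. This is a minimal sketch, assuming Rscript sits in R.home("bin"), as it does in standard installations. The temporary file names and the stand-in vector are placeholders, not Kenneth's data.

```r
# Stand-in for the real vector; replace with the data under test.
x <- c("a", " a", "a ", " a ")

data_file   <- tempfile(fileext = ".RData")
script_file <- tempfile(fileext = ".R")
save(x, file = data_file)

# The child script loads the saved vector and runs the same gsub() call.
writeLines(c(
  sprintf('load("%s")', normalizePath(data_file, winslash = "/")),
  'y <- gsub("(^ +)|( +$)", "", x)'
), script_file)

# With output not captured, system2() returns the child's exit status:
# 0 means success; an R error or a crash should give a nonzero value.
rscript <- file.path(R.home("bin"), "Rscript")
status  <- system2(rscript, shQuote(script_file))
status == 0
```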


Duncan Murdoch






Re: [R] Stack overflow in R 2.10.0 with sub()

2009-10-27 Thread Duncan Murdoch

On 10/27/2009 1:05 PM, Kenneth Roy Cabrera Torres wrote:

Thank you very much for your interest.

I did this:
x <- as.character(alumnos$AL_NUME_ID)
x <- x[-seq_len(length(x)/2)]
save(x, file="x.RData")

I exited R, then restarted it and ran this:

load("x.RData")
y <- gsub("(^ +)|( +$)", "", x)

It showed:

Error in gsub("(^ +)|( +$)", "", x) : 
  input string 66644 is invalid in this locale


I'm working on Windows, so I don't have the locale problem, but I do get 
the segfault.  I'll see if I can track down what is going wrong.


Duncan Murdoch




Re: [R] Stack overflow in R 2.10.0 with sub()

2009-10-27 Thread Kenneth Roy Cabrera Torres
Thank you very much for your interest.

I did this:
x <- as.character(alumnos$AL_NUME_ID)
x <- x[-seq_len(length(x)/2)]
save(x, file="x.RData")

I exited R, then restarted it and ran this:

load("x.RData")
y <- gsub("(^ +)|( +$)", "", x)

It showed:

Error in gsub("(^ +)|( +$)", "", x) : 
  input string 66644 is invalid in this locale

I deleted that string (it contains an unusual character, Ñ).

So I reran the command without that element:

y <- gsub("(^ +)|( +$)", "", x[-c(66644)])

I got this:
Error in gsub("(^ +)|( +$)", "", x[-c(66644)]) : 
  input string 160689 is invalid in this locale

I then excluded this invalid string as well. I used position 160690
rather than 160689, because removing element 66644 shifts every later
index down by one:

y <- gsub("(^ +)|( +$)", "", x[-c(66644,160690)])
Error: segfault from C stack overflow

So it still fails.
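Instead of removing invalid strings one error at a time, you can list all of them in one pass. This is a sketch for current R, assuming R 3.3.0 or later, where validUTF8() is available; it did not exist in R 2.10.0. The sample vector is made up. The byte \xd1 stands in for a Latin-1 "Ñ" like the one Kenneth found.

```r
# Made-up sample; "\xd1" is the Latin-1 byte for "Ñ", which is not
# valid UTF-8 on its own, so element 2 mimics the rejected rows.
x <- c("21444777", "ESPA\xd1A", "1147585")

# Indices of every element that is not valid UTF-8 (needs R >= 3.3.0).
bad <- which(!validUTF8(x))
bad

# Re-encode only the flagged elements, assuming they are Latin-1.
x_fixed <- x
x_fixed[bad] <- iconv(x[bad], from = "latin1", to = "UTF-8")
all(validUTF8(x_fixed))
```

The sketch converts only the flagged elements by design. Converting the whole vector from "latin1" would double-encode any non-ASCII strings that are already valid UTF-8.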

I also repeated the whole process with this conversion first:

x <- iconv(as.character(alumnos$AL_NUME_ID), "latin1", "UTF-8")
x <- x[-seq_len(length(x)/2)]
save(x, file="x.RData")

Then I exited, restarted R, and typed:

load("x.RData")
y <- gsub("(^ +)|( +$)", "", x)

It failed again, this time without any invalid-string errors.

Then I did this:

load("x.RData")
y <- gsub("(^ +)|( +$)", "", x[1:160690])

and it worked. Then I typed

y <- gsub("(^ +)|( +$)", "", x[1:20])  # (x has length 454035)

and it worked too...

But when I started a manual binary search, I found something that
still puzzles me:

y <- gsub("(^ +)|( +$)", "", x[1:261570])

works, but sometimes fails (after restarting R), and it always fails
with an index greater than 262000.

I don't see anything unusual around 261570:

x[261560:261580]
 [1] 21444777  1147585   255202522
 [4] 25852100  24258550  A8D0251207
 [7] 34681811  19121345  16921329
[10] 20442195  14506482  44332211
[13] 35049122  34326340  35182366
[16] 33288742  34958795  1017147202
[19] 3306985   33048501  33295073


I am sending you the x.RData file to see if you can
reproduce my problem.

This information may be useful:

sessionInfo()

R version 2.10.0 (2009-10-26)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=es_CO.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=es_CO.UTF-8        LC_COLLATE=es_CO.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=es_CO.UTF-8
 [7] LC_PAPER=es_CO.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_CO.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

R.Version()

$platform
[1] "x86_64-unknown-linux-gnu"
$arch
[1] "x86_64"
$os
[1] "linux-gnu"
$system
[1] "x86_64, linux-gnu"
$status
[1] ""
$major
[1] "2"
$minor
[1] "10.0"
$year
[1] "2009"
$month
[1] "10"
$day
[1] "26"
$`svn rev`
[1] "50208"
$language
[1] "R"
$version.string
[1] "R version 2.10.0 (2009-10-26)"

gcc --version and g++ --version show:

gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

When compiling R, I used only this configure option:

./configure --enable-R-shlib
make 
sudo make install

At the moment I have a 22 GB swap partition (system monitoring shows it
is not being used) and 4 GB of RAM.

Again, thank you very much for your help.

Kenneth







Re: [R] Stack overflow in R 2.10.0 with sub()

2009-10-27 Thread Kenneth Roy Cabrera Torres
On Tue, 27-10-2009 at 10:47 -0700, Phil Spector wrote:
 What happens if you type
 
 Sys.setlocale('LC_ALL','C')
 
 before using gsub or grep?

When I do that, R hangs and doesn't show any message.
 
   - Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spec...@stat.berkeley.edu
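Phil's suggestion can be scoped so that the locale change does not outlive the call. This is a minimal sketch, not tested against Kenneth's data; he reports above that R hung when he tried the locale switch. with_c_locale() is a hypothetical helper name. It changes only LC_CTYPE, the category that governs character handling, rather than LC_ALL. It also pairs the switch with gsub()'s useBytes = TRUE argument, so matching is done byte by byte.

```r
# with_c_locale() is a hypothetical helper, not part of base R.
# It switches LC_CTYPE to "C" only while `expr` is evaluated and
# restores the previous setting afterwards, even if `expr` fails.
with_c_locale <- function(expr) {
  old <- Sys.getlocale("LC_CTYPE")
  Sys.setlocale("LC_CTYPE", "C")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  expr  # lazy evaluation: the argument is computed here, under "C"
}

# Made-up sample; "\xd1" is a Latin-1 byte that is invalid as UTF-8.
x <- c(" 21444777 ", "ESPA\xd1A ")

# useBytes = TRUE treats each string as raw bytes, so invalid UTF-8
# is not rejected as "invalid in this locale".
y <- with_c_locale(gsub("(^ +)|( +$)", "", x, useBytes = TRUE))
```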
 
 