Cheers, Luke!
Indeed, the bug could not be caught both using the address sanitizer
approach pointed out by Henrik nor valgrind in this particular case.
But adding the watchpoint to R_LogicalNAValue worked!
I now have the call stack of the offending functions sample() and
simulate(). For anyone else who finds the need to do this, do *not*
specify the SEXP datatype when setting your watchpoint as suggested here
on StackOverflow (https://stackoverflow.com/a/31202563). My guess is
specifying the datatype perhaps causes swapping between the addresses
spanned by the large SEXP object to the very limited (typically ~8
registers) for trapping the memory write. I mention this "wrong way" in
my steps below. To Martin's point, a bit of work was needed to reduce
to a reliable minimum test case which was turned out to be:
devtools::install()
### In the package "./tests/" directory containing "testthat.R"
### run a new R session e.g. `R -f minimum_script.R` with the lines:
library(tsshmm)
testthat::test_dir("testthat", package = "tsshmm",
filter = "train|tss")
Thank you all for your invaluable help,
Pariksheet
Steps to get the call stack:
1) Compile R from source and invoke the debugger:
R --version | head -1
# R version 4.1.1 (2021-08-10) -- "Kick Things"
cd /tmp
wget https://cloud.r-project.org/src/base/R-4/R-4.1.1.tar.gz
tar -xf R-4.1.1.tar.gz
mkdir build-R
cd build-R/
../R-4.1.1/configure --silent --without-x CFLAGS="-g -O2 -pipe"
make -j # -j with C++ will murder your RAM, but C is frugal.
./bin/R --debugger=gdb
2) Now that you're in gdb, here's how you set the watchpoint and get the
back trace:
(gdb) break memory.c:Rf_InitMemory
# Breakpoint 1 at 0x177710: file ../../../R-4.1.1/src/main/memory.c,
line 2136.
(gdb) run
# ...
# Breakpoint 1, Rf_InitMemory () at
../../../R-4.1.1/src/main/memory.c:2136
# 2136 init_gctorture();
(gdb) clear memory.c:Rf_InitMemory
# Deleted breakpoint 1
(gdb) print R_LogicalNAValue
# $1 = (SEXP) 0x0
(gdb) watch R_LogicalNAValue
# Hardware watchpoint 2: R_LogicalNAValue
(gdb) continue
# Hardware watchpoint 2: R_LogicalNAValue
#
# Old value = (SEXP) 0x0
# New value = (SEXP) 0x555555a18140
# Rf_InitMemory () at ../../../R-4.1.1/src/main/memory.c:2227
# 2227 LOGICAL(R_LogicalNAValue)[0] = NA_LOGICAL;
(gdb) info watchpoints
# Num Type Disp Enb Address What
# 2 hw watchpoint keep y R_LogicalNAValue
# breakpoint already hit 1 time
(gdb) delete 2
(gdb) info watchpoints
# No watchpoints.
(gdb) print R_LogicalNAValue
# $2 = (SEXP) 0x555555a18140
(gdb) watch *(SEXP) 0x555555a18140 # <- *** THIS IS VERY SLOW ***
Watchpoint 3: *(SEXP) 0x555555a18140
(gdb) delete 3
(gdb) watch * 0x555555a18140 # <- *** Better ***
# Watchpoint 4: * 0x555555a18140
(gdb) disable
(gdb) continue
...
> setwd("~/src/tsshmm/tests/")
> library(tsshmm)
...
> ^C
# Program received signal SIGINT, Interrupt.
# 0x00007ffff7038ff7 in __GI___select (nfds=nfds@entry=1,
readfds=readfds@entry=0x5555559f7780 <readMask.14775>,
writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0,
# timeout=timeout@entry=0x0) at
../sysdeps/unix/sysv/linux/select.c:41
# 41 ../sysdeps/unix/sysv/linux/select.c: No such file or directory.
(gdb) enable
(gdb) continue
> testthat::test_dir("testthat", package = "tsshmm", filter =
"train|tss")
# ...
# ✔ | F W S OK | Context
# ⠏ | 0 | train
# Thread 1 "R" hit Hardware watchpoint 4: * 0x555555a18140
# Old value = 822083626
# New value = 805306410
# RunGenCollect (size_needed=<optimized out>) at
../../../R-4.1.1/src/main/memory.c:1687
# 1687 s = next;
(gdb) backtrace
#0 RunGenCollect (size_needed=<optimized out>) at
../../../R-4.1.1/src/main/memory.c:1687
#1 R_gc_internal (size_needed=<optimized out>) at
../../../R-4.1.1/src/main/memory.c:3129
#2 0x00005555556cab98 in Rf_allocVector3 (type=<optimized out>,
type@entry=13, length=length@entry=626, allocator=allocator@entry=0x0)
at ../../../R-4.1.1/src/main/memory.c:2775
#3 0x00005555555c14e0 in Rf_allocVector (length=626, type=13) at
../../../R-4.1.1/src/include/Rinlinedfuns.h:595
#4 PutRNGstate () at ../../../R-4.1.1/src/main/RNG.c:444
#5 0x00007ffff44cac5e in sample (model=model@entry=0x555557da5fa0,
n=3, from=from@entry=0, f=f@entry=0x7ffff44cabf0 <prob_emis>) at
simulate.c:23
#6 0x00007ffff44cad8b in simulate (model=0x555557da5fa0,
obs=<optimized out>, ncol=ncol@entry=100, nrow=nrow@entry=10000) at
simulate.c:101
#7 0x00007ffff44ca651 in C_simulate (obs=0x7fffed59c010,
dim=<optimized out>, trans=<optimized out>, emis=<optimized out>,
emis_tied=<optimized out>, start=<optimized out>)
at R_wrap_tsshmm.c:98
...
On 10/13/21 8:27 AM, luke-tier...@uiowa.edu wrote:
*Message sent from a system outside of UConn.*
The most likely culprit is C code that is modifying a logical vector
without checking whether this is legitimate for R semantics
(i.e. making sure MAYBE_REFERENCED or at least MAYBE_SHARED is FALSE).
If that is the case, then this is legitimate for C code to do in
principle, so UBSAN and valgrind won't help. You need to set a gdb
watchpoint on the location, catch where it is modified, and look up
the call stack from there.
The error signaled in the GC is a sanity check for catching that this
sort of misbehavior has happened in C code. But it is a check after
the fact; it can't tell you more that that the problem happened
sometime before it was detected.
Best,
luke
On Wed, 13 Oct 2021, Martin Morgan wrote:
The problem with using gdb is you'd find yourself in the garbage
collector, but perhaps quite removed from where the corruption
occurred, e.g., gc() might / will likely be triggered after you've
returned to the top-level evaluation loop, and the part of your code
that did the corruption might be off the stack.
The problem with devtools::check() (and R CMD check) is that running
the unit tests occurs in a separate process, so things like setting a
global option (and even system variable from within R) may not be
visible in the process doing the check. Conversely, for the same
reasons, it seems like the problem can be tickled by running the tests
alone. So
R -f <your_source>/tests/testthat.R
would seem to be a good enough starting point.
Actually, I liked Henrik's UBSAN suggestion, which requires the least
amount of work. I think I'd then try
R -d valgrind -f <your_source>/tests/testthat.R
and then further into the weeds... actually from the section of R-exts
you mention
R_C_BOUNDS_CHECK=yes R -f <your_source>/tests/testthat.R
might also be promising.
Martin
On 10/12/21, 10:30 PM, "Bioc-devel on behalf of Pariksheet Nanda"
<bioc-devel-boun...@r-project.org on behalf of
pariksheet.na...@uconn.edu> wrote:
Hi all,
On 10/12/21 6:43 PM, Pariksheet Nanda wrote:
>
> Error in `...`: internal logical NA value has been modified
In the R source code, this error is in src/main/memory.c so I was
thinking one way of investigating might be to run `R --debugger gdb`,
then running R to load the symbols and either:
1) set a breakpoint for when it reaches that particular line in
memory.c:R_gc_internal and then walk up the stack,
2) or set a watch point on memory.c:R_gc_internal:R_LogicalNAValue
(somehow; having trouble getting gdb to reach that context).
3) Then I thought, maybe this is getting far into the weeds and
instead
I could check the most common C related error by enabling bounds
checking of my C arrays per section 4.4 of the R-exts manual:
$ R -q
> options(CBoundsCheck = TRUE)
> Sys.setenv(R_C_BOUNDS_CHECK = "yes") # Try both ways *shrug*
> devtools::test()
... # All tests still pass.
> devtools::check()
... # No change :(
Maybe I'm not sure I'm using that option correctly? Or the option is
ignored in devtools::check(). Or indeed, the error is not from over
running C array boundaries.
It turns out that using the precompiled debug symbols[1] isn't all
that
useful here because I don't get line numbers in gdb without the source
files and many symbols are optimized out, so it looks like I would
need
to compile R from source with -ggdb first instead of using the Debian
packages.
Hopefully this is still the right approach?
Pariksheet
[1] After install r-base-core-dbg on Debian for the debug symbols.
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel