Hi,

kirito has left the following comment at Identify and Fix ANY bug that causes a BRL-CAD tool to crash #4 https://www.google-melange.com/gci/task/view/google/gci2014/4533992846000128:


Details of the crash


In computing, a segmentation fault (often shortened to segfault) or access violation is a fault raised by hardware with memory protection, notifying an operating system (OS) about a memory access violation; on x86 computers this is a form of general protection fault. The OS kernel will in response usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own, but otherwise the OS default signal handler is used, generally causing abnormal termination of the process (a program crash), and sometimes a core dump.
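To make the custom signal handler idea concrete, here is a minimal sketch for a POSIX system. The names on_segv, recover_from_segv and recovery_point are made up for illustration, and raise(SIGSEGV) stands in for a real invalid memory access so that the example stays well defined.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Jump target used to leave the handler; filled in by sigsetjmp() below. */
static sigjmp_buf recovery_point;

/* Hypothetical handler: jumps back to the saved recovery point. */
static void on_segv(int signo)
{
    (void)signo;
    siglongjmp(recovery_point, 1);
}

/*
 * Installs the handler, then simulates a fault with raise(SIGSEGV).
 * Returns 1 if the handler ran and control was recovered, 0 otherwise.
 */
int recover_from_segv(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return 0;

    /* Passing 1 saves the signal mask so SIGSEGV is unblocked after the jump. */
    if (sigsetjmp(recovery_point, 1) != 0)
        return 1;   /* arrived here via siglongjmp() from the handler */

    raise(SIGSEGV); /* stands in for a real invalid memory access */
    return 0;       /* reached only if the handler did not jump away */
}
```

Recovering like this is usually only sensible for logging and a clean shutdown; after a genuine invalid access the program's state may already be corrupted.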

Reproducing the segfault

Capture core dumps:

$ ulimit -c unlimited


Then run the program. When it crashes, it will generate a core file.

Then use gdb:

$ gdb ./program core


gdb will load the core file. Run a backtrace from there to see exactly which operation triggered the segfault.
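If changing the shell limit is inconvenient, the core-file limit can also be raised from inside the program. The following is only a sketch, assuming a POSIX system; enable_core_dumps is a hypothetical helper name. It raises the soft limit to the hard limit, which matches "ulimit -c unlimited" only when the hard limit is itself unlimited.

```c
#include <sys/resource.h>

/*
 * Raises the soft core-file size limit to the hard limit for this process.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;

    /* An unprivileged process may raise its soft limit up to the hard limit. */
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_CORE, &lim);
}
```

Calling this early in main() would be the natural place. Whether and where a core file is actually written still depends on system settings outside the program.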


The default action for a segmentation fault or bus error is abnormal termination of the process that triggered it. A core file may be generated to aid debugging, and other platform-dependent actions may also be performed. For example, Linux systems using the grsecurity patch may log SIGSEGV signals in order to monitor for possible intrusion attempts using buffer overflows.

Writing off the end of the array

Generally, if you're writing past the bounds of an array, the line that caused the segfault should be an array access. (This isn't always the case, notably when the out-of-bounds write smashes the stack, that is, overwrites the saved address that tells the function where to return when it completes.)

Of course, sometimes, you won't actually cause a segfault writing off the end of the array. Instead, you might just notice that some of your variable values are changing periodically and unexpectedly. This is a tough bug to crack; one option is to set up your debugger to watch a variable for changes and run your program until the variable's value changes. Your debugger will break on that instruction, and you can poke around to figure out if that behavior is unexpected.

(gdb) watch [variable name]
Hardware watchpoint 1: [variable name]
(gdb) continue
...
Hardware watchpoint 1: [variable name]

Old value = [value1]
New value = [value2]


This approach can get tricky when you're dealing with a lot of dynamically allocated memory and it's not entirely clear what you should watch. To simplify things, use simple test cases, keep working with the same inputs, and turn off randomized seeds if you're using random numbers!
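One way to catch writes past the end of an array before they corrupt anything is to route them through a small checking helper while debugging. This is a sketch only; checked_store is a hypothetical name, and the caller is assumed to know the array length.

```c
#include <stddef.h>

/*
 * Hypothetical helper: stores value at buf[index] only when index is in range.
 * Returns 0 on success and -1 if the write would land outside the array.
 */
int checked_store(int *buf, size_t len, size_t index, int value)
{
    if (buf == NULL || index >= len)
        return -1;

    buf[index] = value;
    return 0;
}
```

With an array of four ints, checked_store(a, 4, 3, 7) succeeds, while checked_store(a, 4, 4, 9) returns -1 instead of silently overwriting whatever follows the array.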

Stack Overflows

A stack overflow isn't the same type of pointer-related problem as the others. In this case, you don't need to have a single explicit pointer in your program; you just need a recursive function without a base case. Nevertheless, this is a tutorial about segmentation faults, and on some systems, a stack overflow will be reported as a segmentation fault. (This makes sense because once the stack runs out of space, the next stack access touches memory the process is not allowed to use.)

To diagnose a stack overflow in GDB, typically you just need to do a backtrace:

(gdb) backtrace
#0  foo() () at t.cpp:5
#1  0x08048404 in foo() () at t.cpp:5
#2  0x08048404 in foo() () at t.cpp:5
#3  0x08048404 in foo() () at t.cpp:5
[...]
#20 0x08048404 in foo() () at t.cpp:5
#21 0x08048404 in foo() () at t.cpp:5
#22 0x08048404 in foo() () at t.cpp:5
---Type <return> to continue, or q <return> to quit---


If you find a single function call piling up an awfully large number of times, this is a good indication of a stack overflow.

Typically, you need to analyze your recursive function to make sure that all the base cases (the cases in which the function should not call itself) are covered correctly. For instance, consider this factorial function:

int factorial(int n)
{
    // What about n < 0?
    if(n == 0)
    {
        return 1;
    }
    return factorial(n-1) * n;
}


In this case, the base case of n being zero is covered, but what about n < 0? On "valid" inputs, the function will work fine, but not on "invalid" inputs like -1.

You also have to make sure that your base case is reachable. Even if you have the correct base case, if you don't correctly progress toward the base case, your function will never terminate.

int factorial(int n)
{
    if(n <= 0)
    {
        return 1;
    }
    // Oops, we forgot to subtract 1 from n
    return factorial(n) * n;
}
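Putting both points together, a version that covers every base case and also makes progress toward it might look like the sketch below. Treating negative input as 1 is an assumption carried over from the n <= 0 check above; a real program might prefer to report an error instead.

```c
/*
 * Treats every n <= 0 as the base case (an assumption, see above) and
 * moves toward it on each call by passing n - 1.
 */
int factorial(int n)
{
    if (n <= 0)
    {
        return 1;
    }
    return factorial(n - 1) * n;
}
```

This terminates for every int input, although the result overflows a 32-bit int once n exceeds 12.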


Greetings,
The Google Open Source Programs Team


---
You are receiving this message because you are subscribed to Identify and Fix ANY bug that causes a BRL-CAD tool to crash #4. To stop receiving these messages, go to: https://www.google-melange.com/gci/task/view/google/gci2014/4533992846000128.

_______________________________________________
BRL-CAD Tracker mailing list
brlcad-tracker@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/brlcad-tracker
