Re: [casper] Dropped packets during HASHPIPE data acquisition

Mark Ruzindana Thu, 03 Dec 2020 10:16:40 -0800

Thanks for the suggestion David!

I was starting hashpipe in the debugger. I'll use gdb and the core file,
and let you know what I find. If I still can't figure out the problem, I
will send you a minimum non-working example. I definitely think it's some
sort of pointer arithmetic error as well; I just can't see it yet. I really
appreciate the help.


Thanks again,

Mark

On Thu, Dec 3, 2020 at 1:30 AM David MacMahon <[email protected]> wrote:

> Hi, Mark,
>
> Sorry to hear you're still getting a segfault.  It sounds like you made
> some progress with gdb, but the fact that you ended up with a different
> sort of error suggests that you were starting hashpipe in the debugger.  To
> debug your initial segfault problem, you can run hashpipe without the
> debugger, let it segfault and generate a core file, then use gdb and the
> core file (and hashpipe) to examine the state of the program when the
> segfault occurred.  The tricky part is getting the core file to be
> generated on a segfault.  You typically have to increase the core file size
> limit using "ulimit -c unlimited" and (because hashpipe is typically
> installed with the suid bit set) you have to let the kernel know it's OK to
> dump core files for suid programs using "sudo sysctl -w fs.suid_dumpable=1"
> (or maybe 2 if 1 doesn't quite do it).  You can read more about these steps
> with "help ulimit" (ulimit is a bash builtin) and "man 5 proc".
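
As a side note on the core file size limit: a process can also raise its
own soft limit from C with getrlimit()/setrlimit(). The sketch below is
only an illustration under assumptions. raise_core_limit() is a
hypothetical helper, not part of hashpipe, and it only raises the soft
limit up to the existing hard limit, which is weaker than "ulimit -c
unlimited". It also does nothing about fs.suid_dumpable, so the sysctl
step is still needed for a suid binary.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper (not part of hashpipe): raise the soft core-file size
 * limit of the calling process up to its current hard limit.
 * Returns 0 on success, -1 on failure. */
static int raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }

    /* An unprivileged process may set the soft limit as high as the hard
     * limit, but no higher. */
    rl.rlim_cur = rl.rlim_max;

    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    return 0;
}
```

If the hard limit is 0, this succeeds but still produces no core file, so
checking "ulimit -c" in the launching shell remains the first thing to do.
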
>
> Once you have the core file (typically named "core" but it may have a
> numeric extension from the PID of the crashing process) you can debug
> things with "gdb /path/to/hashpipe /path/to/core/file".  Note that the core
> file may be created with permissions that only let root read it, so you
> might have to "sudo chmod a+r core" or similar to get read access to it.
> This starts the debugger in a sort of forensic mode using the core file
> as a snapshot of the process and its memory space at the time of the
> segfault.  You can use "info threads" to see which threads existed, "thread
> N" to switch between threads (N is a thread number as shown by "info
> threads"), "bt" to see the function call backtrace of the current thread,
> and "frame N" to switch to a specific frame in the function call
> backtrace.  Once you zero in on which part of your code was executing when
> the segfault occurred you can examine variables to see what exactly caused
> the segfault to occur.  You might find that the "interesting" or "relevant"
> variables have been optimized away, so you may want/need to recompile with
> a lower optimization level (e.g. -O1 or maybe even -O0?) to prevent that
> from happening.
>
> Because this happens when you reach the end of your data buffer, I have to
> think it's a pointer arithmetic error of some sort.  If you can't figure
> out the problem from the core file, please create a "minimum working
> example" (well, in this case I guess a minimum non-working example),
> including a dummy packet generator script that creates suitable packets,
> and I'll see if I can recreate the problem.
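
To make the "end of the data buffer" case concrete, here is a minimal
sketch of computing a memcpy() destination that wraps at the end of a ring
instead of running past it. Everything in it is an assumption made for
illustration: the ring_layout struct, its field names, and the idea that
one mcnt maps to one fixed-size payload slot are hypothetical and are not
hashpipe's actual data buffer layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring description; not hashpipe's databuf structure. */
typedef struct {
    uint8_t *base;          /* start of the ring's payload memory      */
    size_t   n_blocks;      /* number of blocks in the ring            */
    size_t   pkts_per_block;/* payload slots per block                 */
    size_t   payload_size;  /* bytes copied per packet                 */
} ring_layout;

/* Byte offset of the slot for this mcnt, wrapped to the ring length. */
static size_t ring_offset(const ring_layout *r, uint64_t mcnt)
{
    assert(r->n_blocks > 0 && r->pkts_per_block > 0);
    uint64_t pkts_in_ring = (uint64_t)r->n_blocks * r->pkts_per_block;
    uint64_t slot = mcnt % pkts_in_ring;   /* wrap instead of running off */
    return (size_t)slot * r->payload_size;
}

/* Destination pointer for this mcnt; asserts the copy stays in bounds. */
static uint8_t *ring_dest(const ring_layout *r, uint64_t mcnt)
{
    size_t off = ring_offset(r, mcnt);
    size_t ring_bytes = r->n_blocks * r->pkts_per_block * r->payload_size;
    assert(off + r->payload_size <= ring_bytes);
    return r->base + off;
}
```

With 2 blocks of 4 slots and 8-byte payloads, mcnt 7 lands at offset 56
(the last slot) and mcnt 8 wraps back to offset 0. An assert like the one
in ring_dest() placed just before the real memcpy() would turn a silent
overrun into an immediate, easy-to-locate failure.
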
>
> HTH,
> Dave
>
> On Nov 30, 2020, at 14:45, Mark Ruzindana <[email protected]> wrote:
>
> I'm currently using gdb to debug and it either tells me that I have a
> segmentation fault at the memcpy() in process_packet() or something very
> strange happens where the starting mcnt of a block greatly exceeds the mcnt
> corresponding to the packet being processed and there's no segmentation
> fault because the mcnt distance becomes negative so the memcpy() is
> skipped. Hopefully that wasn't too hard to track. Very strange problem that
> only occurs with gdb and not when I run hashpipe without it. Without gdb, I
> get the same segmentation fault at the end of the circular buffer as
> mentioned above.
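
The "mcnt distance becomes negative" case can be sketched as an explicit
classification step done before any memcpy(). This is only an illustration
under assumptions: classify_packet(), the pkt_where enum, and the
mcnts_per_block parameter are hypothetical names, not taken from hashpipe
or from the code being debugged. The comparison is done before the
subtraction because mcnt values are unsigned here, and an unsigned
subtraction of a larger value would wrap to a huge positive distance
instead of going negative.

```c
#include <stdint.h>

/* Where a packet falls relative to the block currently being filled. */
typedef enum { PKT_LATE = -1, PKT_IN_BLOCK = 0, PKT_AHEAD = 1 } pkt_where;

/* Hypothetical helper: classify a packet by its mcnt relative to the
 * starting mcnt of the current block. */
static pkt_where classify_packet(uint64_t block_start_mcnt,
                                 uint64_t pkt_mcnt,
                                 uint64_t mcnts_per_block)
{
    /* Packet is older than the block start: the "negative distance" case. */
    if (pkt_mcnt < block_start_mcnt) {
        return PKT_LATE;
    }

    /* Safe now: the difference cannot wrap. */
    uint64_t dist = pkt_mcnt - block_start_mcnt;

    /* Packet belongs to a later block, so it must not be copied here. */
    if (dist >= mcnts_per_block) {
        return PKT_AHEAD;
    }
    return PKT_IN_BLOCK;
}
```

Only PKT_IN_BLOCK should reach the copy. Logging the other two cases along
with both mcnt values would show whether the block start is jumping ahead
of the incoming packets, as described above.
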
>
>
> --
> You received this message because you are subscribed to the Google Groups "
> [email protected]" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/AC9534AD-390F-44D8-ABFE-8AE76F059957%40berkeley.edu
> <https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/AC9534AD-390F-44D8-ABFE-8AE76F059957%40berkeley.edu?utm_medium=email&utm_source=footer>
> .
>

