Hi,
As mentioned previously, we've been seeing tcpborphserver3 fail often
enough that we finally decided to track the problem down. We compiled
it from the github source on the ROACH2 itself with debugging enabled
and ran it under gdb. The results are described below. The failure
seems to occur during the starttap command. We'll forward along the
raw katcp command we're using, but the curious thing is why "base",
which comes from:

base = gs->s_raw_mode->r_map + gs->s_register->e_pos_base;

is pointing to invalid memory sometimes.

Any ideas?
Thanks,
Glenn and Ray

---------- Forwarded message ----------
From: Ramon E. Creager <rcrea...@nrao.edu>
Date: Tue, Jan 29, 2013 at 3:09 PM
Subject: [Gbsapp] tcpborphserver3 failure in tg.c


I've gotten the tcpborphserver to fail under the debugger, but I don't
yet understand the memory management used in this program, so I'm not
sure what the problem is. I'm putting this out in case someone who
understands the tcpborphserver can help isolate the problem more
quickly than I can.  The segv occurs in tg.c, line 421.  The gdb output is:

Program received signal SIGSEGV, Segmentation fault.
0x100092d4 in write_mac_fpga (gs=0x107b7928, offset=0, mac=0x107b7970
"\002\002\n\021") at tg.c:421
421       *((uint32_t *)(base + offset)) = value;
(gdb) where
#0  0x100092d4 in write_mac_fpga (gs=0x107b7928, offset=0,
mac=0x107b7970 "\002\002\n\021") at tg.c:421
#1  0x1000a140 in configure_fpga (gs=0x107b7928) at tg.c:877
#2  0x1000ae68 in create_getap (d=0x107878b8, instance=0,
name=0x10795da0 "gbe0", tap=0x10795d9b "tap0",
    ip=0x10795da5 "10.17.0.65", port=60000, mac=0x10795db6
"02:02:0A:11:00:41", period=10) at tg.c:1167
#3  0x1000b258 in insert_getap (d=0x107878b8, name=0x10795da0 "gbe0",
tap=0x10795d9b "tap0",
    ip=0x10795da5 "10.17.0.65", port=60000, mac=0x10795db6
"02:02:0A:11:00:41", period=10) at tg.c:1230
#4  0x1000b514 in tap_start_cmd (d=0x107878b8, argc=6) at tg.c:1290
#5  0x100143bc in call_katcp (d=0x107878b8) at dispatch.c:879
#6  0x100145cc in dispatch_katcp (d=0x107878b8) at dispatch.c:951
#7  0x10018994 in run_shared_katcp (d=0x10782008) at shared.c:659
#8  0x1001cbe8 in run_core_loop_katcp (dl=0x10782008) at server.c:699
#9  0x1001d0c0 in run_config_server_katcp (dl=0x10782008, file=0x0,
count=32, host=0x10047c90 "7147", port=0)
    at server.c:832
#10 0x10002034 in main (argc=3, argv=0xbff188f4) at main.c:196
(gdb) frame 1
#1  0x1000a140 in configure_fpga (gs=0x107b7928) at tg.c:877
877       if(write_mac_fpga(gs, GO_MAC, gs->s_mac_binary) < 0){
(gdb) frame 0
#0  0x100092d4 in write_mac_fpga (gs=0x107b7928, offset=0,
mac=0x107b7970 "\002\002\n\021") at tg.c:421
421       *((uint32_t *)(base + offset)) = value;
(gdb) list
416
417       value = (   0x0         & 0xff000000) |
418               (   0x0         & 0xff0000) |
419               ((mac[0] <<  8) & 0xff00) |
420                (mac[1]        & 0xff);
421       *((uint32_t *)(base + offset)) = value;
422
423
424       value = ((mac[2] << 24) & 0xff000000) |
425               ((mac[3] << 16) & 0xff0000) |
(gdb) print base
$1 = (void *) 0x1033fff
(gdb) print offset
$2 = 0
(gdb) print value
$3 = 514
(gdb)
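
As a sanity check on the value being written, the sketch below repeats
the expression from lines 417-420 of the listing for the MAC bytes seen
in the trace (02:02:0A:11:00:41). The function name mac_high_word is
made up for illustration; it is not in tg.c.

```c
#include <stdint.h>

/* Hypothetical helper: builds the first 32-bit word that
 * write_mac_fpga stores, using the same masks and shifts as
 * tg.c lines 417-420 in the gdb listing. */
uint32_t mac_high_word(const uint8_t *mac)
{
    return (0x0 & 0xff000000) |
           (0x0 & 0xff0000) |
           (((uint32_t)mac[0] << 8) & 0xff00) |
           ((uint32_t)mac[1] & 0xff);
}
```

For mac[0] = 0x02 and mac[1] = 0x02 this gives 0x0202 = 514, which
matches what gdb prints for 'value'. That suggests the value is fine and
the bad store comes from the address.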

'base' is a void * which is set like this:
 base = gs->s_raw_mode->r_map + gs->s_register->e_pos_base;
(back to gdb):

(gdb) print *(gs->s_raw_mode)
$12 = {r_registers = 0x10783d80, r_hwmon = 0x10783d90, r_fpga = 1,
  r_map = 0xffffffff, r_map_size = 33554432, r_image = 0x0,
  r_bof_dir = 0x10783da0 "/boffiles", r_top_register = 17314052,
  r_argc = 3, r_argv = 0xbff188f4, r_chassis = 0x107876e0,
  r_taps = 0x10785820, r_instances = 0}
(gdb) print *(gs->s_register)
$13 = {e_pos_base = 16990208, e_len_base = 16384,
  e_pos_offset = 0 '\000', e_len_offset = 0 '\000', e_mode = 3 '\003'}
(gdb)
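
The two dumps above are enough to reproduce the bad pointer by hand.
The sketch below redoes the addition with 32-bit unsigned arithmetic,
assuming a 32-bit target (all addresses in the trace are 32 bits wide).
The function name base_addr32 is hypothetical and only stands in for
the pointer addition in tg.c.

```c
#include <stdint.h>

/* Hypothetical stand-in for base = r_map + e_pos_base on a 32-bit
 * target: unsigned 32-bit addition wraps modulo 2^32, as pointer
 * arithmetic does in practice on such a machine. */
uint32_t base_addr32(uint32_t r_map, uint32_t e_pos_base)
{
    return r_map + e_pos_base;
}
```

With r_map = 0xffffffff and e_pos_base = 16990208 (0x1034000) the sum
wraps around to 0x1033fff, exactly the 'base' that gdb shows. Note that
0xffffffff is also what (void *)-1 looks like on a 32-bit machine, and
that is the value of MAP_FAILED. If r_map is filled in from mmap (an
assumption, not something confirmed here), the mapping may have failed
or been reset before starttap ran.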


I should add that 'base' is pointing to memory gdb says it cannot access
(hence the segv):

(gdb) print *(uint32_t *)base
Cannot access memory at address 0x1033fff
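
One way to turn this segv into a reportable error would be to validate
the mapping before computing 'base'. The following is only a sketch
under assumptions: checked_base and its parameters are hypothetical
names, not part of tcpborphserver3, and it presumes the mapping is
either a valid pointer, NULL, or MAP_FAILED.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical guard, not from tg.c: returns map + pos only when the
 * mapping looks valid and [pos, pos + len) lies inside it; returns
 * NULL otherwise so the caller can log an error instead of faulting. */
void *checked_base(void *map, size_t map_size, uint32_t pos, uint32_t len)
{
    if (map == NULL || map == MAP_FAILED) {
        return NULL;                 /* no usable mapping */
    }
    if (pos > map_size || len > map_size - pos) {
        return NULL;                 /* register falls outside the map */
    }
    return (char *)map + pos;
}
```

With the values from the dumps (map = 0xffffffff, map_size = 33554432,
pos = 16990208, len = 16384) the first test rejects the mapping. The
register range itself would fit inside a 32 MB map, so the offsets look
sane and only r_map is suspect.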


Ray
