The katcp command was:

?tap-start tap0 gbe0 10.17.0.65 60000 02:02:0A:11:00:41
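
For reference, the MAC in that command can be checked against the "value" gdb prints in the quoted session below. This is only a sketch of the first 32-bit word, copied from the tg.c lines 417-420 shown in the listing; the function name pack_mac_word0 is made up for illustration, and the second word is left out because the listing cuts off before it is complete.

```c
#include <stdint.h>

/* Hypothetical helper: reproduces the expression at tg.c:417-420 as
 * shown in the gdb listing. The upper 16 bits are zero and the first
 * two MAC octets fill the lower 16 bits. */
uint32_t pack_mac_word0(const uint8_t *mac)
{
    return ((uint32_t)0x0                 & 0xff000000u) |
           ((uint32_t)0x0                 & 0x00ff0000u) |
           (((uint32_t)mac[0] << 8)       & 0x0000ff00u) |
           ((uint32_t)mac[1]              & 0x000000ffu);
}
```

With 02:02:0A:11:00:41 this gives 0x0202, i.e. 514, which matches "$3 = 514" in the session, so the value being written looks fine and only the destination address is suspect.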

On Tue, Jan 29, 2013 at 3:42 PM, G Jones <[email protected]> wrote:
> Hi,
> As mentioned previously, we've been noticing failures of
> tcpborphserver3 at a rate that has become annoying enough to finally
> track down. We compiled from the github source on the ROACH2 itself
> with debugging enabled and ran through gdb. The failure results are
> described below. The problem seems to occur during the tap-start
> command. We'll forward along the raw katcp command we're using, but
> the curious thing is why "base" which comes from:
>
> base = gs->s_raw_mode->r_map + gs->s_register->e_pos_base;
>
> is pointing to invalid memory sometimes.
>
> Any ideas?
> Thanks,
> Glenn and Ray
>
> ---------- Forwarded message ----------
> From: Ramon E. Creager <[email protected]>
> Date: Tue, Jan 29, 2013 at 3:09 PM
> Subject: [Gbsapp] tcpborphserver3 failure in tg.c
>
>
> I've gotten the tcpborphserver to fail under the debugger, but because I
> don't yet understand the memory management used in this program I'm not
> yet sure what the problem is, so I'm putting this out in case someone
> who understands the tcpborphserver can help isolate the problem more
> quickly than I can.  The segv occurs in tg.c, line 421.  The gdb output is:
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x100092d4 in write_mac_fpga (gs=0x107b7928, offset=0, mac=0x107b7970
> "\002\002\n\021") at tg.c:421
> 421       *((uint32_t *)(base + offset)) = value;
> (gdb) where
> #0  0x100092d4 in write_mac_fpga (gs=0x107b7928, offset=0,
> mac=0x107b7970 "\002\002\n\021") at tg.c:421
> #1  0x1000a140 in configure_fpga (gs=0x107b7928) at tg.c:877
> #2  0x1000ae68 in create_getap (d=0x107878b8, instance=0,
> name=0x10795da0 "gbe0", tap=0x10795d9b "tap0",
>     ip=0x10795da5 "10.17.0.65", port=60000, mac=0x10795db6
> "02:02:0A:11:00:41", period=10) at tg.c:1167
> #3  0x1000b258 in insert_getap (d=0x107878b8, name=0x10795da0 "gbe0",
> tap=0x10795d9b "tap0",
>     ip=0x10795da5 "10.17.0.65", port=60000, mac=0x10795db6
> "02:02:0A:11:00:41", period=10) at tg.c:1230
> #4  0x1000b514 in tap_start_cmd (d=0x107878b8, argc=6) at tg.c:1290
> #5  0x100143bc in call_katcp (d=0x107878b8) at dispatch.c:879
> #6  0x100145cc in dispatch_katcp (d=0x107878b8) at dispatch.c:951
> #7  0x10018994 in run_shared_katcp (d=0x10782008) at shared.c:659
> #8  0x1001cbe8 in run_core_loop_katcp (dl=0x10782008) at server.c:699
> #9  0x1001d0c0 in run_config_server_katcp (dl=0x10782008, file=0x0,
> count=32, host=0x10047c90 "7147", port=0)
>     at server.c:832
> #10 0x10002034 in main (argc=3, argv=0xbff188f4) at main.c:196
> (gdb) frame 1
> #1  0x1000a140 in configure_fpga (gs=0x107b7928) at tg.c:877
> 877       if(write_mac_fpga(gs, GO_MAC, gs->s_mac_binary) < 0){
> (gdb) frame 0
> #0  0x100092d4 in write_mac_fpga (gs=0x107b7928, offset=0,
> mac=0x107b7970 "\002\002\n\021") at tg.c:421
> 421       *((uint32_t *)(base + offset)) = value;
> (gdb) list
> 416
> 417       value = (   0x0         & 0xff000000) |
> 418               (   0x0         & 0xff0000) |
> 419               ((mac[0] <<  8) & 0xff00) |
> 420                (mac[1]        & 0xff);
> 421       *((uint32_t *)(base + offset)) = value;
> 422
> 423
> 424       value = ((mac[2] << 24) & 0xff000000) |
> 425               ((mac[3] << 16) & 0xff0000) |
> (gdb) print base
> $1 = (void *) 0x1033fff
> (gdb) print offset
> $2 = 0
> (gdb) print value
> $3 = 514
> (gdb)
>
> 'base' is a void * which is set like this:
>  base = gs->s_raw_mode->r_map + gs->s_register->e_pos_base;
> (back to gdb):
>
> (gdb) print *(gs->s_raw_mode)
> $12 = {r_registers = 0x10783d80, r_hwmon = 0x10783d90, r_fpga = 1, r_map
> = 0xffffffff, r_map_size = 33554432,
>   r_image = 0x0, r_bof_dir = 0x10783da0 "/boffiles", r_top_register =
> 17314052, r_argc = 3,
>   r_argv = 0xbff188f4, r_chassis = 0x107876e0, r_taps = 0x10785820,
> r_instances = 0}
> (gdb) print *(gs->s_register)
> $13 = {e_pos_base = 16990208, e_len_base = 16384, e_pos_offset = 0
> '\000', e_len_offset = 0 '\000',
>   e_mode = 3 '\003'}
> (gdb)
>
>
> I should add that 'base' is pointing to memory gdb says it cannot access
> (hence the segv):
>
> (gdb) print *(uint32_t *)base
> Cannot access memory at address 0x1033fff
>
>
> Ray
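
One thing that can be checked from the dump above: with r_map = 0xffffffff and e_pos_base = 16990208 (0x1034000), 32-bit pointer arithmetic wraps around and lands exactly on the 0x1033fff that gdb prints for base. 0xffffffff is also the bit pattern of (void *) -1, which is what mmap() returns on failure (MAP_FAILED) on a 32-bit target. Whether a failed or torn-down mapping is really what happened here is an assumption, not something the trace proves. The sketch below just models the arithmetic with uint32_t; the function and macro names are invented for illustration and are not from tcpborphserver.

```c
#include <stdint.h>

/* Assumption: (void *) -1 as seen on a 32-bit target, i.e. the value
 * mmap() reports on failure. Hypothetical name, not from tg.c. */
#define MAP_FAILED_32 0xffffffffu

/* Models base = r_map + e_pos_base with 32-bit pointers:
 * the sum wraps modulo 2^32. */
uint32_t base_from_map(uint32_t r_map, uint32_t e_pos_base)
{
    return (uint32_t)(r_map + e_pos_base);
}

/* Returns 1 if r_map holds the failure sentinel rather than a
 * usable mapping address. */
int map_looks_failed(uint32_t r_map)
{
    return r_map == MAP_FAILED_32;
}
```

Calling base_from_map(0xffffffff, 16990208) gives 0x1033fff, the address gdb cannot access. If the sentinel theory holds, a check like map_looks_failed() on r_map before computing base would turn the segv into a reportable error.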
