The katcp command was: ?tap-start tap0 gbe0 10.17.0.65 60000 02:02:0A:11:00:41
On Tue, Jan 29, 2013 at 3:42 PM, G Jones <[email protected]> wrote:
> Hi,
> As mentioned previously, we've been noticing failures of
> tcpborphserver3 at a rate that has become annoying enough to finally
> track down. We compiled from the github source on the ROACH2 itself
> with debugging enabled and ran through gdb. The failure results are
> described below. The problem seems to occur during the starttap
> command. We'll forward along the raw katcp command we're using, but
> the curious thing is why "base" which comes from:
>
> base = gs->s_raw_mode->r_map + gs->s_register->e_pos_base;
>
> is pointing to invalid memory sometimes.
>
> Any ideas?
> Thanks,
> Glenn and Ray
>
> ---------- Forwarded message ----------
> From: Ramon E. Creager <[email protected]>
> Date: Tue, Jan 29, 2013 at 3:09 PM
> Subject: [Gbsapp] tcpborphserver3 failure in tg.c
>
>
> I've gotten the tcpborphserver to fail under the debugger, but because I
> don't yet understand the memory management used in this program I'm not
> yet sure what the problem is, so I'm putting this out in case someone
> who understands the tcpborphserver can help isolate the problem more
> quickly than I can. The segv occurs in tg.c, line 421. The gdb output is:
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x100092d4 in write_mac_fpga (gs=0x107b7928, offset=0, mac=0x107b7970
> "\002\002\n\021") at tg.c:421
> 421         *((uint32_t *)(base + offset)) = value;
> (gdb) where
> #0  0x100092d4 in write_mac_fpga (gs=0x107b7928, offset=0,
>     mac=0x107b7970 "\002\002\n\021") at tg.c:421
> #1  0x1000a140 in configure_fpga (gs=0x107b7928) at tg.c:877
> #2  0x1000ae68 in create_getap (d=0x107878b8, instance=0,
>     name=0x10795da0 "gbe0", tap=0x10795d9b "tap0",
>     ip=0x10795da5 "10.17.0.65", port=60000, mac=0x10795db6
>     "02:02:0A:11:00:41", period=10) at tg.c:1167
> #3  0x1000b258 in insert_getap (d=0x107878b8, name=0x10795da0 "gbe0",
>     tap=0x10795d9b "tap0",
>     ip=0x10795da5 "10.17.0.65", port=60000, mac=0x10795db6
>     "02:02:0A:11:00:41", period=10) at tg.c:1230
> #4  0x1000b514 in tap_start_cmd (d=0x107878b8, argc=6) at tg.c:1290
> #5  0x100143bc in call_katcp (d=0x107878b8) at dispatch.c:879
> #6  0x100145cc in dispatch_katcp (d=0x107878b8) at dispatch.c:951
> #7  0x10018994 in run_shared_katcp (d=0x10782008) at shared.c:659
> #8  0x1001cbe8 in run_core_loop_katcp (dl=0x10782008) at server.c:699
> #9  0x1001d0c0 in run_config_server_katcp (dl=0x10782008, file=0x0,
>     count=32, host=0x10047c90 "7147", port=0)
>     at server.c:832
> #10 0x10002034 in main (argc=3, argv=0xbff188f4) at main.c:196
> (gdb) frame 1
> #1  0x1000a140 in configure_fpga (gs=0x107b7928) at tg.c:877
> 877         if(write_mac_fpga(gs, GO_MAC, gs->s_mac_binary) < 0){
> (gdb) frame 0
> #0  0x100092d4 in write_mac_fpga (gs=0x107b7928, offset=0,
>     mac=0x107b7970 "\002\002\n\021") at tg.c:421
> 421         *((uint32_t *)(base + offset)) = value;
> (gdb) list
> 416
> 417         value = (   0x0         & 0xff000000) |
> 418                 (   0x0         & 0xff0000) |
> 419                 ((mac[0] << 8)  & 0xff00) |
> 420                 (mac[1]         & 0xff);
> 421         *((uint32_t *)(base + offset)) = value;
> 422
> 423
> 424         value = ((mac[2] << 24) & 0xff000000) |
> 425                 ((mac[3] << 16) & 0xff0000) |
> (gdb) print base
> $1 = (void *) 0x1033fff
> (gdb) print offset
> $2 = 0
> (gdb) print value
> $3 = 514
> (gdb)
>
> 'base' is a void * which is set like this:
> base = gs->s_raw_mode->r_map + gs->s_register->e_pos_base;
> (back to gdb):
>
> (gdb) print *(gs->s_raw_mode)
> $12 = {r_registers = 0x10783d80, r_hwmon = 0x10783d90, r_fpga = 1, r_map
> = 0xffffffff, r_map_size = 33554432,
>   r_image = 0x0, r_bof_dir = 0x10783da0 "/boffiles", r_top_register =
> 17314052, r_argc = 3,
>   r_argv = 0xbff188f4, r_chassis = 0x107876e0, r_taps = 0x10785820,
> r_instances = 0}
> (gdb) print *(gs->s_register)
> $13 = {e_pos_base = 16990208, e_len_base = 16384, e_pos_offset = 0
> '\000', e_len_offset = 0 '\000',
>   e_mode = 3 '\003'}
> (gdb)
>
>
> I should add that 'base' is pointing to memory gdb says it cannot access
> (hence the segv):
>
> (gdb) print *(uint32_t *)base
> Cannot access memory at address 0x1033fff
>
>
> Ray
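
The numbers in the gdb session are consistent with each other. On a 32-bit target, r_map (0xffffffff) plus e_pos_base (16990208, i.e. 0x1034000) wraps around to 0x1033fff, which is exactly the 'base' that faults. 0xffffffff is also the 32-bit value of (void *)-1, which is what mmap() returns as MAP_FAILED. Whether r_map is filled in by mmap() is an assumption here, not something the session shows. The sketch below only reproduces the arithmetic; compute_base and map_looks_valid are hypothetical helper names, not functions from tg.c.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the gdb session quoted above. */
#define R_MAP      0xffffffffu /* gs->s_raw_mode->r_map */
#define E_POS_BASE 16990208u   /* gs->s_register->e_pos_base (0x1034000) */

/* Model 32-bit pointer arithmetic: unsigned addition wraps modulo 2^32,
 * as it would for a void * on the 32-bit PowerPC in the ROACH2. */
static uint32_t compute_base(uint32_t r_map, uint32_t e_pos_base)
{
    return r_map + e_pos_base;
}

/* Hypothetical guard (not from tg.c): reject a NULL mapping or the
 * all-ones value, which equals (void *)-1 on a 32-bit target. */
static int map_looks_valid(uint32_t r_map)
{
    return r_map != 0u && r_map != 0xffffffffu;
}
```

With the values above, compute_base(R_MAP, E_POS_BASE) gives 0x1033fff and map_looks_valid(R_MAP) gives 0, so a check of this kind before the write at tg.c:421 would turn the segv into a reportable error. It would not explain why the mapping is invalid in the first place.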

