Hi,

As mentioned previously, we've been noticing failures of tcpborphserver3 at a rate that has become annoying enough to finally track down. We compiled from the github source on the ROACH2 itself with debugging enabled and ran through gdb. The failure results are described below.

The problem seems to occur during the starttap command. We'll forward along the raw katcp command we're using, but the curious thing is why "base" which comes from:

base = gs->s_raw_mode->r_map + gs->s_register->e_pos_base;

is pointing to invalid memory sometimes.

Any ideas?

Thanks,
Glenn and Ray

---------- Forwarded message ----------
From: Ramon E. Creager <rcrea...@nrao.edu>
Date: Tue, Jan 29, 2013 at 3:09 PM
Subject: [Gbsapp] tcpborphserver3 failure in tg.c

I've gotten the tcpborphserver to fail under the debugger, but because I don't yet understand the memory management used in this program I'm not yet sure what the problem is, so I'm putting this out in case someone who understands the tcpborphserver can help isolate the problem more quickly than I can.

The segv occurs in tg.c, line 421. The gdb output is:

Program received signal SIGSEGV, Segmentation fault.
0x100092d4 in write_mac_fpga (gs=0x107b7928, offset=0, mac=0x107b7970 "\002\002\n\021") at tg.c:421
421       *((uint32_t *)(base + offset)) = value;
(gdb) where
#0  0x100092d4 in write_mac_fpga (gs=0x107b7928, offset=0, mac=0x107b7970 "\002\002\n\021") at tg.c:421
#1  0x1000a140 in configure_fpga (gs=0x107b7928) at tg.c:877
#2  0x1000ae68 in create_getap (d=0x107878b8, instance=0, name=0x10795da0 "gbe0", tap=0x10795d9b "tap0",
    ip=0x10795da5 "10.17.0.65", port=60000, mac=0x10795db6 "02:02:0A:11:00:41", period=10) at tg.c:1167
#3  0x1000b258 in insert_getap (d=0x107878b8, name=0x10795da0 "gbe0", tap=0x10795d9b "tap0",
    ip=0x10795da5 "10.17.0.65", port=60000, mac=0x10795db6 "02:02:0A:11:00:41", period=10) at tg.c:1230
#4  0x1000b514 in tap_start_cmd (d=0x107878b8, argc=6) at tg.c:1290
#5  0x100143bc in call_katcp (d=0x107878b8) at dispatch.c:879
#6  0x100145cc in dispatch_katcp (d=0x107878b8) at dispatch.c:951
#7  0x10018994 in run_shared_katcp (d=0x10782008) at shared.c:659
#8  0x1001cbe8 in run_core_loop_katcp (dl=0x10782008) at server.c:699
#9  0x1001d0c0 in run_config_server_katcp (dl=0x10782008, file=0x0, count=32, host=0x10047c90 "7147", port=0) at server.c:832
#10 0x10002034 in main (argc=3, argv=0xbff188f4) at main.c:196
(gdb) frame 1
#1  0x1000a140 in configure_fpga (gs=0x107b7928) at tg.c:877
877       if(write_mac_fpga(gs, GO_MAC, gs->s_mac_binary) < 0){
(gdb) frame 0
#0  0x100092d4 in write_mac_fpga (gs=0x107b7928, offset=0, mac=0x107b7970 "\002\002\n\021") at tg.c:421
421       *((uint32_t *)(base + offset)) = value;
(gdb) list
416
417       value = (   0x0         & 0xff000000) |
418               (   0x0         & 0xff0000) |
419               ((mac[0] << 8)  & 0xff00) |
420                (mac[1]        & 0xff);
421       *((uint32_t *)(base + offset)) = value;
422
423
424       value = ((mac[2] << 24) & 0xff000000) |
425               ((mac[3] << 16) & 0xff0000) |
(gdb) print base
$1 = (void *) 0x1033fff
(gdb) print offset
$2 = 0
(gdb) print value
$3 = 514
(gdb)

'base' is a void * which is set like this:

base = gs->s_raw_mode->r_map + gs->s_register->e_pos_base;

(back to gdb):

(gdb) print *(gs->s_raw_mode)
$12 = {r_registers = 0x10783d80, r_hwmon = 0x10783d90, r_fpga = 1, r_map = 0xffffffff, r_map_size = 33554432,
  r_image = 0x0, r_bof_dir = 0x10783da0 "/boffiles", r_top_register = 17314052, r_argc = 3, r_argv = 0xbff188f4,
  r_chassis = 0x107876e0, r_taps = 0x10785820, r_instances = 0}
(gdb) print *(gs->s_register)
$13 = {e_pos_base = 16990208, e_len_base = 16384, e_pos_offset = 0 '\000', e_len_offset = 0 '\000', e_mode = 3 '\003'}
(gdb)

I should add that 'base' is pointing to memory gdb says it cannot access (hence the segv):

(gdb) print *(uint32_t *)base
Cannot access memory at address 0x1033fff

Ray
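
A note on the numbers in the gdb session above, offered as a sketch rather than a diagnosis. r_map is 0xffffffff, which on a 32-bit Linux system is the bit pattern of MAP_FAILED, ((void *) -1), the value mmap() returns on failure. Adding e_pos_base (16990208, i.e. 0x1034000) to that wraps around modulo 2^32 and gives exactly the 0x1033fff that gdb prints for base. So base appears to be computed as written, but from an r_map that does not hold a valid mapping. Whether r_map actually comes from mmap(), and why it would hold this value when the tap is started, are assumptions that would need checking in the tcpborphserver3 source. The helper names below (wrapped_base32, mac_high_word, map_is_usable, check_against_gdb_output) are hypothetical and not part of tcpborphserver3; the constants are copied from the gdb output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>   /* MAP_FAILED */

/* Values copied from the gdb session. */
#define R_MAP_FROM_GDB       0xffffffffu  /* gs->s_raw_mode->r_map        */
#define R_MAP_SIZE_FROM_GDB  33554432u    /* gs->s_raw_mode->r_map_size   */
#define E_POS_BASE_FROM_GDB  16990208u    /* gs->s_register->e_pos_base   */
#define E_LEN_BASE_FROM_GDB  16384u       /* gs->s_register->e_len_base   */

/* Hypothetical helper: emulate pointer addition on a 32-bit target,
 * where the sum wraps modulo 2^32. */
static uint32_t wrapped_base32(uint32_t r_map, uint32_t e_pos_base)
{
    return (uint32_t)(r_map + e_pos_base);
}

/* Same expression as tg.c lines 417-420: the first word written holds
 * the two high bytes of the MAC address. */
static uint32_t mac_high_word(const unsigned char *mac)
{
    return (0x0 & 0xff000000) |
           (0x0 & 0xff0000) |
           (((uint32_t)mac[0] << 8) & 0xff00) |
           ((uint32_t)mac[1] & 0xff);
}

/* Hypothetical guard (an assumption, not existing tcpborphserver3 code):
 * returns 1 only if the mapping pointer is neither NULL nor MAP_FAILED
 * and the register window [pos, pos + len) lies inside the mapping. */
static int map_is_usable(const void *r_map, uint32_t map_size,
                         uint32_t pos, uint32_t len)
{
    if (r_map == NULL || r_map == MAP_FAILED) {
        return 0;
    }
    if (pos > map_size || len > map_size - pos) {
        return 0;
    }
    return 1;
}

/* Re-checks the numbers gdb printed; aborts if any does not match. */
static void check_against_gdb_output(void)
{
    /* mac=0x107b7970 "\002\002\n\021" in the backtrace */
    static const unsigned char mac[] = { 0x02, 0x02, 0x0a, 0x11 };

    /* print base  ->  0x1033fff */
    assert(wrapped_base32(R_MAP_FROM_GDB, E_POS_BASE_FROM_GDB) == 0x1033fffu);

    /* print value ->  514, so the value being written is as expected */
    assert(mac_high_word(mac) == 514u);

    /* The register window itself fits inside r_map_size. */
    assert(E_POS_BASE_FROM_GDB + E_LEN_BASE_FROM_GDB <= R_MAP_SIZE_FROM_GDB);
}
```

A check along the lines of map_is_usable() before the write in write_mac_fpga would turn the segfault into a reportable error, though it would not explain how r_map came to hold this value.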