Re: [casper] Dropped packets during HASHPIPE data acquisition

Mark Ruzindana Mon, 25 May 2020 17:16:01 -0700

Thanks for the additional suggestions. I will try those and let you know
what happens.


Mark

On Mon, May 25, 2020 at 6:07 PM David MacMahon <[email protected]> wrote:

> A few more suggestions:
>
> 1) Enable core dumps.  Usually you have to run "ulimit -c unlimited" and
> for suid executables there's an extra step related to
> /proc/sys/fs/suid_dumpable.  See "man 5 core" and "man 5 proc" for
> details.  Once you have a core file, you can use gdb to examine the state
> of things when the segfault happened.  You might want to recompile your
> plug-in with debugging enabled and fewer optimizations to get the most out
> of this approach: "gdb /path/to/hashpipe /path/to/core".  (Gotta love how
> it's still called "core"!).  gdb can be a bit cryptic, but it's also very
> powerful.
>
> 2) Another idea, just for diagnostic purposes, is to omit the "+
> input_databuf_idx(...)" part of the dest_p assignment.  That will write all
> payloads to the first part of the data block, so no buffer overflow for
> sure (assuming idx is in range :)).  It's just a way to eliminate a
> variable.
>
> 3) Make sure the packet socket blocks are large enough for the packet
> frames.  I agree it looks like you're not reading past the end of the
> packet payload size, but maybe the payload itself goes beyond the end of
> the packet socket blocks?  The kernel might silently truncate the packets
> in that case.
>
> 4) If you're using tagged VLANs the PKT_UDP_xxx macros won't work right.
> It sounds like that's not happening because you're seeing the expected
> size, but it's worth mentioning for mail archive completeness.
>
> 5) You can use hashpipe_dump_databuf to examine the 159 payloads you were
> able to copy before the segfault to see whether every byte is properly
> positioned and has believable values.  You could change memcpy(..) to
> memset(dest_p, 'X', PKT_UDP_SIZE(frame)-16) so you'll know the exact value
> that every byte should have. Instead of 'X' you could use pkt_num+1 (i.e. a
> 1-based packet counter) so you'll know which bytes correspond to which
> packets.  Using memset() would also eliminate reading from the packet
> socket blocks (another variable gone).
>
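
Suggestion 5 above can be sketched in plain C. The helper names (fill_with_packet_tag, first_mismatch) are hypothetical and not part of hashpipe; dest_p, len and pkt_num stand in for the values available in the net thread. The tag is a single byte, so the 1-based counter is unambiguous for the 160 packets discussed here but wraps after 255.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill the slot for one packet with a 1-based packet counter instead of
 * copying the real payload. Only the low 8 bits of the counter are kept.
 * Hypothetical helper, not part of hashpipe. */
static void fill_with_packet_tag(uint8_t *dest_p, size_t len, unsigned pkt_num)
{
    memset(dest_p, (int)((pkt_num + 1) & 0xff), len);
}

/* Return the offset of the first byte in [buf, buf + len) that differs
 * from `expected`, or len if every byte matches. */
static size_t first_mismatch(const uint8_t *buf, size_t len, uint8_t expected)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != expected) {
            return i;
        }
    }
    return len;
}
```

In process_packet() the call would take the place of the memcpy(), for example fill_with_packet_tag(dest_p, PKT_UDP_SIZE(frame) - 16, pkt_num), assuming pkt_num counts packets from 0. first_mismatch() is only a convenience for checking a dumped block offline.
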
> Happy hunting,
> Dave
>
> On May 25, 2020, at 16:33, Mark Ruzindana <[email protected]> wrote:
>
> Thanks for the suggestions. I neglected to mention that I'm printing out
> PKT_UDP_SIZE() and PKT_UDP_DST() right before the memcpy(). Taking the
> 8 byte UDP header into account, the size and port are correct. When
> performing the memcpy(), I am taking into account that PKT_UDP_DATA()
> returns a pointer to the payload and excludes the UDP header. However, I
> also have an 8 byte packet header within that payload (this gives me the
> mcnt, f-engine, and x-engine indices) and I exclude it when performing the
> memcpy(). This is what it looks like:
>
> // This macro index shifts every mcnt and f-engine index
> uint8_t * dest_p = db->block[idx].data + input_databuf_idx(m, f, 0,0,0);
> // Ignore packet header
> const uint8_t * payload = (uint8_t *)(PKT_UDP_DATA(frame)+8);
>
> fprintf(...); // prints PKT_UDP_SIZE() and PKT_UDP_DST()
> // Ignore both UDP (8 bytes) and packet header (8 bytes)
> memcpy(dest_p, payload, PKT_UDP_SIZE(frame) - 16);
>
> I will look into the other possible issues that you suggested, but as far
> as I can tell, it doesn't seem like there should be a segfault given what
> I'm doing before that memcpy(). I will let you know what else I find.
>
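
One way to rule out the destination side of the memcpy() shown above is to wrap the copy in a bounds check. This is only a sketch: copy_into_block() and its block_size parameter are hypothetical, and block_size is assumed to be the size in bytes of one data block (for example, sizeof of the block's data array).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy len bytes to block_data + offset only if the write stays inside
 * the block. Returns 0 on success, -1 if the copy would run past the end
 * of the block (nothing is written in that case).
 * Hypothetical helper, not part of hashpipe. */
static int copy_into_block(uint8_t *block_data, size_t block_size,
                           size_t offset, const uint8_t *payload, size_t len)
{
    /* Written to avoid overflow in offset + len. */
    if (offset > block_size || len > block_size - offset) {
        return -1;
    }
    memcpy(block_data + offset, payload, len);
    return 0;
}
```

Here offset would be the value returned by input_databuf_idx(m, f, 0,0,0) and len would be PKT_UDP_SIZE(frame) - 16. A -1 return on the failing packet would point at the index calculation rather than at the packet socket.
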
> Thanks again, I really appreciate the help.
>
> Mark
>
> On Mon, May 25, 2020 at 4:30 PM David MacMahon <[email protected]>
> wrote:
>
>> Hi, Mark,
>>
>> Sounds like progress!
>>
>> On May 25, 2020, at 13:56, Mark Ruzindana <[email protected]> wrote:
>>
>> I have been able to capture data with the first round of frames of the
>> circular buffer i.e. if I have 160 frames, I am able to capture packets of
>> frames 0 to 159 at which point right at the memcpy() in the
>> process_packet() function of the net thread, I get a segmentation fault.
>>
>>
>> The fact that you get the segfault right at the memcpy of the final
>> frame of the ring buffer suggests that there is a problem with the parameters
>> passed to memcpy.  Most likely src+length-1 exceeds the end of the frame so
>> you get a segfault when memcpy tries to read from beyond the allocated
>> memory.  This would explain why it segfaults on the final frame and not the
>> previous frames because reading beyond a previous frame still reads from
>> "legal" (though incorrect) memory locations.  It's also possible that the
>> segfault happens due to a bad address on the destination side of the
>> memcpy(), but unless the destination buffer is also 160 frames in size that
>> seems less likely.
>>
>> The release_frame function is not likely to be a culprit here unless the
>> pointer you are passing it differs from the pointer that the pktsock_recv
>> function returned.
>>
>> For debugging, I suggest logging dst, src, len before calling memcpy.
>> Normally you wouldn't generate a log message for every packet because that
>> would ruin your throughput, but since you know it's going to crash after
>> the first 160 packets there's not much throughput to ruin. :)
>>
>> One thing to remember is that PKT_UDP_DATA() evaluates to a pointer to
>> the UDP payload of the packet, but PKT_UDP_SIZE() evaluates to the total
>> UDP size (i.e. 8 bytes for the UDP header plus the length of the UDP
>> payload).  Passing PKT_UDP_SIZE() as "len" to memcpy without subtracting 8
>> for the header bytes is not correct and could potentially cause this
>> problem.
>>
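
The length arithmetic and the per-packet logging described above might look like the following. This is a sketch with hypothetical helper names (copy_len, log_copy_args); udp_size stands for the value of PKT_UDP_SIZE(frame), and app_header_bytes is an assumed parameter for any application header at the start of the payload (0 if there is none).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define UDP_HEADER_BYTES 8 /* fixed size of a UDP header */

/* Number of bytes to copy once the UDP header and an optional
 * application header are removed from the total UDP size.
 * Returns 0 if the packet is too short to hold any data. */
static size_t copy_len(size_t udp_size, size_t app_header_bytes)
{
    const size_t overhead = UDP_HEADER_BYTES + app_header_bytes;
    return udp_size > overhead ? udp_size - overhead : 0;
}

/* Log the three memcpy() arguments to stderr just before the copy. */
static void log_copy_args(const void *dst, const void *src, size_t len)
{
    fprintf(stderr, "memcpy dst=%p src=%p len=%zu\n", dst, src, len);
}
```

Calling log_copy_args(dst, src, len) immediately before memcpy(dst, src, len) means the last line logged before the crash shows the arguments of the failing copy.
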
>> HTH,
>> Dave
>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "[email protected]" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/297C1709-AE9C-488D-9110-FD0832BF5951%40berkeley.edu
>> <https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/297C1709-AE9C-488D-9110-FD0832BF5951%40berkeley.edu?utm_medium=email&utm_source=footer>
>> .
>>
>

