Hi, Ross,
> On Dec 16, 2020, at 22:08, Ross Andrew Donnachie <[email protected]>
> wrote:
>
> Been working on a hashpipe with a pipeline of network, transposition and then
> disk-dump threads. We have 24 data-buffers that we rotate through.
>
> An inconsistent (happens after various amounts of time) crash occurs with
> this printout:
> -----------------------------------------------------
> Tue Dec 15 17:37:19 2020 : Error (hashpipe_databuf_set_filled): semctl error
> [Invalid argument]
> Tue Dec 15 17:37:19 2020 : Error (hashpipe_databuf_wait_free_timeout): semop
> error [Invalid argument]
> semop: Invalid argument
> Tue Dec 15 17:37:19 2020 : Error (hpguppi_atasnap_pktsock_thread): error
> waiting for free databuf [Invalid argument]
> Tue Dec 15 17:37:19 2020 : Error (hashpipe_databuf_set_free): semctl error
> [Invalid argument]
> Tue Dec 15 17:37:19 2020 : Error (hashpipe_databuf_wait_filled_timeout):
> semop error [Invalid argument]
> semop: Invalid argument
> Tue Dec 15 17:37:19 2020 : Error (hpguppi_atasnap_pkt_to_FTP_transpose):
> error waiting for input buffer, rv: -2 [Invalid argument]
> -----------------------------------------------------------
This can happen if data are erroneously written to the header of the data
buffer and clobber the semaphore ID that is stored there. One way (but
certainly not the only way) this can happen is bad pointer arithmetic. You
can check for this "corruption" by running
"hashpipe_check_databuf". It should show something like the following example
(though obviously with values specific to your application):
$ hashpipe_check_databuf -K /root
databuf 1 stats:
data_type='unknown'
header_size=4096
block_size=134422528
n_block=24
shmid=32769
semid=0
semaphore mask: 000000
Specifically, the shmid and semid values shown should match the values
displayed by "ipcs -a".
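To illustrate the failure mode, here is a minimal sketch of how a stray block
index can overwrite a semaphore ID kept in a buffer header. The layout,
sizes, and the names block_ptr, read_semid and semid_after_fill are all
hypothetical; this is a toy model, not the real hashpipe databuf structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy layout (NOT the real hashpipe databuf): a header
 * followed by N_BLOCK equally sized data blocks in one buffer. */
#define HDR_SIZE     64  /* bytes reserved for the header */
#define BLOCK_SIZE   64  /* bytes per data block */
#define N_BLOCK      4
#define SEMID_OFFSET 0   /* where this toy header keeps its semaphore ID */

/* Block address as base + header size + block_id * block size.
 * Deliberately has no range check on block_id. */
static unsigned char *block_ptr(unsigned char *base, int block_id)
{
    return base + HDR_SIZE + (ptrdiff_t)block_id * BLOCK_SIZE;
}

/* Read the semaphore ID back out of the toy header. */
static int read_semid(const unsigned char *base)
{
    int semid;
    memcpy(&semid, base + SEMID_OFFSET, sizeof semid);
    return semid;
}

/* Store semid 32768 in the header, fill block `block_id` with 0xFF bytes,
 * and return the semid that the header holds afterwards. */
static int semid_after_fill(int block_id)
{
    unsigned char buf[HDR_SIZE + N_BLOCK * BLOCK_SIZE] = {0};
    int semid = 32768;

    memcpy(buf + SEMID_OFFSET, &semid, sizeof semid);
    memset(block_ptr(buf, block_id), 0xFF, BLOCK_SIZE);
    return read_semid(buf);
}

static void demo(void)
{
    assert(semid_after_fill(0) == 32768);   /* valid block: header intact */
    assert(semid_after_fill(-1) != 32768);  /* stray index: semid clobbered */
}
```

In this toy model a block_id of -1 resolves to the start of the buffer, so
the fill lands on the header. Any later semctl or semop call using the
overwritten ID would then be expected to fail with "Invalid argument".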
> Other times an error is caught but no full printout from hashpipe_error() is
> made:
>
> Code calls:
> ++++++++++++++++++++++++++++
> hpguppi_databuf_data(struct hpguppi_input_databuf *d, int block_id) {
> if(block_id < 0 || d->header.n_block < block_id) {
> hashpipe_error(__FUNCTION__,
> "block_id %s out of range [0, %d)",
> block_id, d->header.n_block);
> return NULL;
> ....
> ++++++++++++++++++++++++++++
>
> Printout:
> ============
> Tue Dec 15 17:37:19 2020 : Error (hpguppi_databuf_data)~/src/hpguppi_daq/src:
> ============
>
> Only once have I seen the above printout complete showing that
> d->header.n_block = -23135124... Which indicates some deep rooted rot
> somewhere.
Indeed, it looks like corruption of the data buffer header (which can also be
verified as shown above).
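Two details in the quoted check are worth a look as well. The test
"d->header.n_block < block_id" lets block_id == n_block through, and the
format string uses %s for an int argument, which is undefined behavior and
may be why the printout is usually cut short. A minimal sketch of a stricter
check follows; block_id_in_range is a hypothetical helper, not part of
hashpipe or hpguppi_daq, and it uses fprintf only to stay self-contained.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if block_id is a valid index into a
 * buffer of n_block blocks, otherwise reports the problem and returns 0. */
static int block_id_in_range(int block_id, int n_block)
{
    if (n_block <= 0) {
        /* A non-positive block count suggests a corrupted header. */
        fprintf(stderr, "implausible n_block %d\n", n_block);
        return 0;
    }
    if (block_id < 0 || block_id >= n_block) {
        /* %d matches the int arguments; the valid range is [0, n_block). */
        fprintf(stderr, "block_id %d out of range [0, %d)\n",
                block_id, n_block);
        return 0;
    }
    return 1;
}

static void self_check(void)
{
    assert(block_id_in_range(23, 24));        /* last valid index */
    assert(!block_id_in_range(24, 24));       /* one past the end */
    assert(!block_id_in_range(0, -23135124)); /* corrupted n_block */
}
```

A check like this only reports the corruption sooner; it does not identify
the code that wrote over the header.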
HTH,
Dave