git bisect shows this problem starts surfacing with this commit:

eea929d61e25106adc2598448c865f40e4a6f13b is the first bad commit
commit eea929d61e25106adc2598448c865f40e4a6f13b
Author: Petri Savolainen <[email protected]>
Date:   Thu Nov 10 13:07:39 2016 +0200

    linux-gen: pool: reimplement pool with ring

    Used the ring data structure to implement pool. Also
    buffer structure was simplified to enable future driver
    interface. Every buffer includes a packet header, so each
    buffer can be used as a packet head or segment. Segmentation
    was disabled and segment size was fixed to a large number
    (64kB) to limit the number of modification in the commit.

    Signed-off-by: Petri Savolainen <[email protected]>

I don't think the issue is necessarily with this patch but rather that the
efficiency improvements are probably exposing a latent race condition
elsewhere in ordered queue handling. This needs further investigation.
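
For anyone following along who has not read the patch, here is a rough,
single-threaded sketch of the idea the commit message describes: the pool
keeps buffer handles in a ring, frees enqueue at one end and allocs dequeue
at the other. Every name below (ring_t, ring_enq, ring_deq, RING_SIZE,
RING_EMPTY) is made up for illustration and is not taken from
odp_ring_internal.h. The real implementation has to use atomic operations
to be safe across threads, which this sketch deliberately leaves out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this is NOT the ODP ring code. */
#define RING_SIZE  8u                /* must be a power of two */
#define RING_MASK  (RING_SIZE - 1u)
#define RING_EMPTY UINT32_MAX        /* returned when no handle is queued */

typedef struct {
	uint32_t head;               /* next slot to dequeue (alloc side) */
	uint32_t tail;               /* next slot to enqueue (free side) */
	uint32_t data[RING_SIZE];    /* stored buffer handles */
} ring_t;

static void ring_init(ring_t *r)
{
	r->head = 0;
	r->tail = 0;
}

/* "Free": store a handle at the tail. Returns 0 on success, -1 if full. */
static int ring_enq(ring_t *r, uint32_t handle)
{
	/* Unsigned subtraction stays correct when the counters wrap. */
	if (r->tail - r->head == RING_SIZE)
		return -1;

	r->data[r->tail & RING_MASK] = handle;
	r->tail++;
	return 0;
}

/* "Alloc": take the oldest handle from the head, or RING_EMPTY. */
static uint32_t ring_deq(ring_t *r)
{
	uint32_t handle;

	if (r->head == r->tail)
		return RING_EMPTY;

	handle = r->data[r->head & RING_MASK];
	r->head++;
	return handle;
}
```

Because enqueue only advances tail and dequeue only advances head, the two
operations touch opposite ends of the storage. Once several threads do this
concurrently, correctness depends on the ordering of those index updates,
which is the kind of place where a timing change can expose an existing race.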

On Wed, Nov 16, 2016 at 3:01 PM, Bill Fischofer <[email protected]>
wrote:

> Trying to reproduce this I'm seeing sporadic failures in the scheduler
> validation test that don't seem to appear in the base api-next branch.
> Issue seems to be failures in the ordered queue tests:
>
>   Test: scheduler_test_multi_mq_mt_prio_n 
> ...linux.c:273:odpthread_run_start_routine():helper:
> ODP worker thread started as linux pthread. (pid=6274)
> linux.c:273:odpthread_run_start_routine():helper: ODP worker thread
> started as linux pthread. (pid=6274)
> linux.c:273:odpthread_run_start_routine():helper: ODP worker thread
> started as linux pthread. (pid=6274)
> linux.c:273:odpthread_run_start_routine():helper: ODP worker thread
> started as linux pthread. (pid=6274)
> passed
>   Test: scheduler_test_multi_mq_mt_prio_a 
> ...linux.c:273:odpthread_run_start_routine():helper:
> ODP worker thread started as linux pthread. (pid=6274)
> linux.c:273:odpthread_run_start_routine():helper: ODP worker thread
> started as linux pthread. (pid=6274)
> linux.c:273:odpthread_run_start_routine():helper: ODP worker thread
> started as linux pthread. (pid=6274)
> linux.c:273:odpthread_run_start_routine():helper: ODP worker thread
> started as linux pthread. (pid=6274)
> passed
>   Test: scheduler_test_multi_mq_mt_prio_o 
> ...linux.c:273:odpthread_run_start_routine():helper:
> ODP worker thread started as linux pthread. (pid=6274)
> linux.c:273:odpthread_run_start_routine():helper: ODP worker thread
> started as linux pthread. (pid=6274)
> linux.c:273:odpthread_run_start_routine():helper: ODP worker thread
> started as linux pthread. (pid=6274)
> linux.c:273:odpthread_run_start_routine():helper: ODP worker thread
> started as linux pthread. (pid=6274)
> FAILED
>     1. scheduler.c:871  - bctx->sequence == seq
>     2. scheduler.c:871  - bctx->sequence == seq
>   Test: scheduler_test_multi_1q_mt_a_excl 
> ...linux.c:273:odpthread_run_start_routine():helper:
> ODP worker thread started as linux pthread. (pid=6274)
>
> We had seen these earlier but they were never consistently reproducible.
> Petri: are you able to recreate this on your local systems?
>
>
> On Wed, Nov 16, 2016 at 2:03 PM, Maxim Uvarov <[email protected]>
> wrote:
>
>> I cannot test this series patch by patch because it fails in different
>> ways (once in TM, once the kernel died, and another time the OOM killer
>> killed the tests and then the kernel hung).
>>
>> And for all patches test/common_plat/validation/api/pktio/pktio_main
>> hangs forever:
>>
>>
>> Program received signal SIGINT, Interrupt.
>> 0x00002afbe69ffb80 in __nanosleep_nocancel () at
>> ../sysdeps/unix/syscall-template.S:81
>> 81    in ../sysdeps/unix/syscall-template.S
>> (gdb) bt
>> #0  0x00002afbe69ffb80 in __nanosleep_nocancel () at
>> ../sysdeps/unix/syscall-template.S:81
>> #1  0x0000000000415ced in odp_pktin_recv_tmo (queue=...,
>> packets=packets@entry=0x7ffed64d8bd0, num=num@entry=1,
>>     wait=wait@entry=18446744073709551615) at
>> ../../../platform/linux-generic/odp_packet_io.c:1584
>> #2  0x00000000004047fa in recv_packets_tmo (pktio=pktio@entry=0x2,
>> pkt_tbl=pkt_tbl@entry=0x7ffed64d9500,
>>     seq_tbl=seq_tbl@entry=0x7ffed64d94b0, num=num@entry=1,
>> mode=mode@entry=RECV_TMO, tmo=tmo@entry=18446744073709551615,
>> ns=ns@entry=0)
>>     at ../../../../../../test/common_plat/validation/api/pktio/pktio.c:515
>> #3  0x00000000004075f8 in test_recv_tmo (mode=RECV_TMO) at
>> ../../../../../../test/common_plat/validation/api/pktio/pktio.c:940
>> #4  0x00002afbe61cc482 in run_single_test () from
>> /usr/local/lib/libcunit.so.1
>> #5  0x00002afbe61cc0b2 in run_single_suite () from
>> /usr/local/lib/libcunit.so.1
>> #6  0x00002afbe61c9d55 in CU_run_all_tests () from
>> /usr/local/lib/libcunit.so.1
>> #7  0x00002afbe61ce245 in basic_run_all_tests () from
>> /usr/local/lib/libcunit.so.1
>> #8  0x00002afbe61cdfe7 in CU_basic_run_tests () from
>> /usr/local/lib/libcunit.so.1
>> #9  0x0000000000409361 in odp_cunit_run () at
>> ../../../../test/common_plat/common/odp_cunit_common.c:298
>> #10 0x00002afbe6c2ff45 in __libc_start_main (main=0x403850 <main>,
>> argc=1, argv=0x7ffed64d9878, init=<optimized out>,
>>     fini=<optimized out>, rtld_fini=<optimized out>,
>> stack_end=0x7ffed64d9868) at libc-start.c:287
>> #11 0x000000000040387e in _start ()
>> (gdb) up
>> #1  0x0000000000415ced in odp_pktin_recv_tmo (queue=...,
>> packets=packets@entry=0x7ffed64d8bd0, num=num@entry=1,
>>     wait=wait@entry=18446744073709551615) at
>> ../../../platform/linux-generic/odp_packet_io.c:1584
>> 1584            nanosleep(&ts, NULL);
>> (gdb) p ts
>> $1 = {tv_sec = 0, tv_nsec = 1000}
>> (gdb) l
>> 1579                }
>> 1580
>> 1581                wait--;
>> 1582            }
>> 1583
>> 1584            nanosleep(&ts, NULL);
>> 1585        }
>> 1586    }
>> 1587
>> 1588    int odp_pktin_recv_mq_tmo(const odp_pktin_queue_t queues[], unsigned num_q,
>> (gdb) up
>> #2  0x00000000004047fa in recv_packets_tmo (pktio=pktio@entry=0x2,
>> pkt_tbl=pkt_tbl@entry=0x7ffed64d9500,
>>     seq_tbl=seq_tbl@entry=0x7ffed64d94b0, num=num@entry=1,
>> mode=mode@entry=RECV_TMO, tmo=tmo@entry=18446744073709551615,
>> ns=ns@entry=0)
>>     at ../../../../../../test/common_plat/validation/api/pktio/pktio.c:515
>> 515                n = odp_pktin_recv_tmo(pktin[0], pkt_tmp, num - num_rx,
>> (gdb) p num - num_rx
>> $2 = 1
>> (gdb) l
>> 510        /** Multiple odp_pktin_recv_tmo()/odp_pktin_recv_mq_tmo() calls may be
>> 511         *  required to discard possible non-test packets. */
>> 512        do {
>> 513            ts1 = odp_time_global();
>> 514            if (mode == RECV_TMO)
>> 515                n = odp_pktin_recv_tmo(pktin[0], pkt_tmp, num - num_rx,
>> 516                               tmo);
>> 517            else
>> 518                n = odp_pktin_recv_mq_tmo(pktin, (unsigned)num_q,
>> 519                              from, pkt_tmp,
>> (gdb) p tmo
>> $3 = 18446744073709551615
>>
>>
>> I applied the patches and ran the following script as root:
>> CLEANUP=0 GIT_URL=/opt/Linaro/odp3.git GIT_BRANCH=api-next ./build.sh
>>
>> This issue needs more investigation. The series is not applied yet.
>>
>> Maxim.
>>
>> On 11/16/16 02:58, Bill Fischofer wrote:
>>
>>> Trying again as the repost doesn't seem to show up on the list either.
>>>
>>> For this series:
>>>
>>> Reviewed-and-tested-by: Bill Fischofer <[email protected]>
>>>
>>> On Tue, Nov 15, 2016 at 5:55 PM, Bill Fischofer <[email protected]>
>>> wrote:
>>>
>>>     Reposting this since it doesn't seem to have made it to the
>>>     mailing list.
>>>
>>>     For this series:
>>>
>>>     Reviewed-and-tested-by: Bill Fischofer <[email protected]>
>>>
>>>     On Tue, Nov 15, 2016 at 8:41 AM, Bill Fischofer
>>>     <[email protected]> wrote:
>>>
>>>         For this series:
>>>
>>>         Reviewed-and-tested-by: Bill Fischofer
>>>         <[email protected]>
>>>
>>>         On Thu, Nov 10, 2016 at 5:07 AM, Petri Savolainen
>>>         <[email protected]> wrote:
>>>
>>>             Pool performance is optimized by using a ring as the
>>>             global buffer storage. IPC build is disabled, since it
>>>             needs large modifications due to its dependency on pool
>>>             internals. The old pool implementation was based on locks
>>>             and a linked list of buffer headers. The new
>>>             implementation maintains a ring of buffer handles, which
>>>             enables fast, burst-based allocs and frees. Also, a ring
>>>             scales better with the number of cpus than a list (enq
>>>             and deq operations update opposite ends of the pool).
>>>
>>>             L2fwd link rate (%), 2 x 40GE, 64 byte packets
>>>
>>>                   direct-                parallel-              atomic-
>>>             cpus  orig    direct  diff   orig    parall  diff   orig    atomic  diff
>>>             1     7 %     8 %     1 %    6 %     6 %     2 %    5.4 %   5.6 %   4 %
>>>             2     14 %    15 %    7 %    9 %     9 %     5 %    8 %     9 %     8 %
>>>             4     28 %    30 %    6 %    13 %    14 %    13 %   12 %    15 %    19 %
>>>             6     42 %    44 %    6 %    16 %    19 %    19 %   8 %     20 %    150 %
>>>             8     46 %    59 %    28 %   19 %    23 %    26 %   18 %    24 %    34 %
>>>             10    55 %    57 %    3 %    20 %    27 %    37 %   8 %     28 %    264 %
>>>             12    56 %    56 %    -1 %   22 %    31 %    43 %   7 %     32 %    357 %
>>>
>>>             The max packet rate of the NICs is reached with 10-12
>>>             cpus in direct mode. Otherwise, all cases were improved.
>>>             Scheduler-driven cases in particular suffered from bad
>>>             pool scalability.
>>>
>>>             changed in v3:
>>>             * rebased
>>>             * ipc disabled with #ifdef
>>>             * added support for multi-segment packets
>>>             * API: added explicit limits for packet length in alloc calls
>>>             * Corrected validation test and example application bugs
>>>               found during segmentation implementation
>>>
>>>             changed in v2:
>>>             * rebased to api-next branch
>>>             * added a comment that ring size must be larger than
>>>               number of items in it
>>>             * fixed clang build issue
>>>             * added parens in align macro
>>>
>>>             v1 reviews:
>>>             Reviewed-by: Brian Brooks <[email protected]>
>>>
>>>
>>>
>>>
>>>             Petri Savolainen (19):
>>>               linux-gen: ipc: disable build of ipc pktio
>>>               linux-gen: pktio: do not free zero packets
>>>               linux-gen: ring: created common ring implementation
>>>               linux-gen: align: added round up power of two
>>>               linux-gen: pool: reimplement pool with ring
>>>               linux-gen: ring: added multi enq and deq
>>>               linux-gen: pool: use ring multi enq and deq operations
>>>               linux-gen: pool: optimize buffer alloc
>>>               linux-gen: pool: clean up pool inlines functions
>>>               linux-gen: pool: ptr instead of hdl in buffer_alloc_multi
>>>               test: validation: buf: test alignment
>>>               test: performance: crypto: use capability to select max packet
>>>               test: correctly initialize pool parameters
>>>               test: validation: packet: fix bugs in tailroom and concat tests
>>>               linux-gen: packet: added support for segmented packets
>>>               test: validation: packet: improved multi-segment alloc test
>>>               api: packet: added limits for packet len on alloc
>>>               linux-gen: packet: remove zero len support from alloc
>>>               linux-gen: packet: enable multi-segment packets
>>>
>>>              example/generator/odp_generator.c                 |    2 +-
>>>              include/odp/api/spec/packet.h                     |    9 +-
>>>              include/odp/api/spec/pool.h                       |    6 +
>>>              platform/linux-generic/Makefile.am                |    1 +
>>>              .../include/odp/api/plat/packet_types.h           |    6 +-
>>>              .../include/odp/api/plat/pool_types.h             |    6 -
>>>              .../linux-generic/include/odp_align_internal.h    |   34 +-
>>>              .../linux-generic/include/odp_buffer_inlines.h    |  167 +--
>>>              .../linux-generic/include/odp_buffer_internal.h   |  120 +-
>>>              .../include/odp_classification_datamodel.h        |    2 +-
>>>              .../linux-generic/include/odp_config_internal.h   |   55 +-
>>>              .../linux-generic/include/odp_packet_internal.h   |   87 +-
>>>              platform/linux-generic/include/odp_pool_internal.h |  289 +---
>>>              platform/linux-generic/include/odp_ring_internal.h |  176 +++
>>>              .../linux-generic/include/odp_timer_internal.h    |    4 -
>>>              platform/linux-generic/odp_buffer.c               |   22 +-
>>>              platform/linux-generic/odp_classification.c       |   25 +-
>>>              platform/linux-generic/odp_crypto.c               |   12 +-
>>>              platform/linux-generic/odp_packet.c               |  717 ++++++++--
>>>              platform/linux-generic/odp_packet_io.c            |    2 +-
>>>              platform/linux-generic/odp_pool.c                 | 1440 ++++++++------------
>>>              platform/linux-generic/odp_queue.c                |    4 +-
>>>              platform/linux-generic/odp_schedule.c             |  102 +-
>>>              platform/linux-generic/odp_schedule_ordered.c     |    4 +-
>>>              platform/linux-generic/odp_timer.c                |    3 +-
>>>              platform/linux-generic/pktio/dpdk.c               |   10 +-
>>>              platform/linux-generic/pktio/ipc.c                |    3 +-
>>>              platform/linux-generic/pktio/loop.c               |    2 +-
>>>              platform/linux-generic/pktio/netmap.c             |   14 +-
>>>              platform/linux-generic/pktio/socket.c             |   17 +-
>>>              platform/linux-generic/pktio/socket_mmap.c        |   10 +-
>>>              test/common_plat/performance/odp_crypto.c         |   47 +-
>>>              test/common_plat/performance/odp_pktio_perf.c     |    2 +-
>>>              test/common_plat/performance/odp_scheduling.c     |    8 +-
>>>              test/common_plat/validation/api/buffer/buffer.c   |  113 +-
>>>              test/common_plat/validation/api/crypto/crypto.c   |    2 +-
>>>              test/common_plat/validation/api/packet/packet.c   |   96 +-
>>>              test/common_plat/validation/api/pktio/pktio.c     |   21 +-
>>>              38 files changed, 1745 insertions(+), 1895 deletions(-)
>>>              create mode 100644 platform/linux-generic/include/odp_ring_internal.h
>>>
>>>             --
>>>             2.8.1
>>>
>>>
>>>
>>>
>>>
>>
>
