While trying to reproduce this, I'm seeing sporadic failures in the scheduler
validation test that don't appear on the base api-next branch.
The failures are in the ordered queue tests:
Test: scheduler_test_multi_mq_mt_prio_n
...linux.c:273:odpthread_run_start_routine():helper: ODP worker thread
started as linux pthread. (pid=6274)
linux.c:273:odpthread_run_start_routine():helper: ODP worker thread started
as linux pthread. (pid=6274)
linux.c:273:odpthread_run_start_routine():helper: ODP worker thread started
as linux pthread. (pid=6274)
linux.c:273:odpthread_run_start_routine():helper: ODP worker thread started
as linux pthread. (pid=6274)
passed
Test: scheduler_test_multi_mq_mt_prio_a
...linux.c:273:odpthread_run_start_routine():helper: ODP worker thread
started as linux pthread. (pid=6274)
linux.c:273:odpthread_run_start_routine():helper: ODP worker thread started
as linux pthread. (pid=6274)
linux.c:273:odpthread_run_start_routine():helper: ODP worker thread started
as linux pthread. (pid=6274)
linux.c:273:odpthread_run_start_routine():helper: ODP worker thread started
as linux pthread. (pid=6274)
passed
Test: scheduler_test_multi_mq_mt_prio_o
...linux.c:273:odpthread_run_start_routine():helper: ODP worker thread
started as linux pthread. (pid=6274)
linux.c:273:odpthread_run_start_routine():helper: ODP worker thread started
as linux pthread. (pid=6274)
linux.c:273:odpthread_run_start_routine():helper: ODP worker thread started
as linux pthread. (pid=6274)
linux.c:273:odpthread_run_start_routine():helper: ODP worker thread started
as linux pthread. (pid=6274)
FAILED
1. scheduler.c:871 - bctx->sequence == seq
2. scheduler.c:871 - bctx->sequence == seq
Test: scheduler_test_multi_1q_mt_a_excl
...linux.c:273:odpthread_run_start_routine():helper: ODP worker thread
started as linux pthread. (pid=6274)
We had seen these earlier but they were never consistently reproducible.
Petri: are you able to recreate this on your local systems?
On Wed, Nov 16, 2016 at 2:03 PM, Maxim Uvarov <[email protected]>
wrote:
> I cannot test this series patch by patch because it fails (one time it
> was TM, one time the kernel died, another time the OOM killer killed the
> tests and then the kernel hung).
>
> And for all patches test/common_plat/validation/api/pktio/pktio_main
> hangs forever:
>
>
> Program received signal SIGINT, Interrupt.
> 0x00002afbe69ffb80 in __nanosleep_nocancel () at
> ../sysdeps/unix/syscall-template.S:81
> 81 in ../sysdeps/unix/syscall-template.S
> (gdb) bt
> #0 0x00002afbe69ffb80 in __nanosleep_nocancel () at
> ../sysdeps/unix/syscall-template.S:81
> #1 0x0000000000415ced in odp_pktin_recv_tmo (queue=...,
> packets=packets@entry=0x7ffed64d8bd0, num=num@entry=1,
> wait=wait@entry=18446744073709551615) at
> ../../../platform/linux-generic/odp_packet_io.c:1584
> #2 0x00000000004047fa in recv_packets_tmo (pktio=pktio@entry=0x2,
> pkt_tbl=pkt_tbl@entry=0x7ffed64d9500,
> seq_tbl=seq_tbl@entry=0x7ffed64d94b0, num=num@entry=1,
> mode=mode@entry=RECV_TMO,
> tmo=tmo@entry=18446744073709551615, ns=ns@entry=0)
> at ../../../../../../test/common_plat/validation/api/pktio/pktio.c:515
> #3 0x00000000004075f8 in test_recv_tmo (mode=RECV_TMO) at
> ../../../../../../test/common_plat/validation/api/pktio/pktio.c:940
> #4 0x00002afbe61cc482 in run_single_test () from
> /usr/local/lib/libcunit.so.1
> #5 0x00002afbe61cc0b2 in run_single_suite () from
> /usr/local/lib/libcunit.so.1
> #6 0x00002afbe61c9d55 in CU_run_all_tests () from
> /usr/local/lib/libcunit.so.1
> #7 0x00002afbe61ce245 in basic_run_all_tests () from
> /usr/local/lib/libcunit.so.1
> #8 0x00002afbe61cdfe7 in CU_basic_run_tests () from
> /usr/local/lib/libcunit.so.1
> #9 0x0000000000409361 in odp_cunit_run () at
> ../../../../test/common_plat/common/odp_cunit_common.c:298
> #10 0x00002afbe6c2ff45 in __libc_start_main (main=0x403850 <main>, argc=1,
> argv=0x7ffed64d9878, init=<optimized out>,
> fini=<optimized out>, rtld_fini=<optimized out>,
> stack_end=0x7ffed64d9868) at libc-start.c:287
> #11 0x000000000040387e in _start ()
> (gdb) up
> #1 0x0000000000415ced in odp_pktin_recv_tmo (queue=...,
> packets=packets@entry=0x7ffed64d8bd0, num=num@entry=1,
> wait=wait@entry=18446744073709551615) at
> ../../../platform/linux-generic/odp_packet_io.c:1584
> 1584 nanosleep(&ts, NULL);
> (gdb) p ts
> $1 = {tv_sec = 0, tv_nsec = 1000}
> (gdb) l
> 1579 }
> 1580
> 1581 wait--;
> 1582 }
> 1583
> 1584 nanosleep(&ts, NULL);
> 1585 }
> 1586 }
> 1587
> 1588 int odp_pktin_recv_mq_tmo(const odp_pktin_queue_t queues[],
> unsigned num_q,
> (gdb) up
> #2 0x00000000004047fa in recv_packets_tmo (pktio=pktio@entry=0x2,
> pkt_tbl=pkt_tbl@entry=0x7ffed64d9500,
> seq_tbl=seq_tbl@entry=0x7ffed64d94b0, num=num@entry=1,
> mode=mode@entry=RECV_TMO,
> tmo=tmo@entry=18446744073709551615, ns=ns@entry=0)
> at ../../../../../../test/common_plat/validation/api/pktio/pktio.c:515
> 515 n = odp_pktin_recv_tmo(pktin[0], pkt_tmp, num - num_rx,
> (gdb) p num - num_rx
> $2 = 1
> (gdb) l
> 510 /** Multiple odp_pktin_recv_tmo()/odp_pktin_recv_mq_tmo()
> calls may be
> 511 * required to discard possible non-test packets. */
> 512 do {
> 513 ts1 = odp_time_global();
> 514 if (mode == RECV_TMO)
> 515 n = odp_pktin_recv_tmo(pktin[0], pkt_tmp, num - num_rx,
> 516 tmo);
> 517 else
> 518 n = odp_pktin_recv_mq_tmo(pktin, (unsigned)num_q,
> 519 from, pkt_tmp,
> (gdb) p tmo
> $3 = 18446744073709551615
>
>
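The tmo of 18446744073709551615 in the trace is UINT64_MAX, which looks like the wait-forever sentinel, so the call has no timeout to fall back on and only returns once a packet (or an error) arrives. Below is a minimal C sketch of that kind of poll loop. It is not the actual odp_pktin_recv_tmo() code: recv_tmo_sketch, recv_after_n and SKETCH_WAIT_FOREVER are hypothetical names, and counting the timeout in poll rounds is an assumption made for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() under strict -std= modes */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Assumption: UINT64_MAX is the "wait forever" sentinel, as the tmo value
 * printed in the gdb session suggests. */
#define SKETCH_WAIT_FOREVER UINT64_MAX

/* Hypothetical receive callback: number of packets received, 0 if none,
 * negative on error. */
typedef int (*recv_fn_t)(void *ctx, int num);

/* Poll for packets, sleeping 1000 ns between empty polls (the same ts value
 * as in the trace). A finite wait is counted down once per empty poll. */
static int recv_tmo_sketch(recv_fn_t recv, void *ctx, int num, uint64_t wait)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000 };

	assert(num > 0);

	while (1) {
		int ret = recv(ctx, num);

		if (ret != 0)
			return ret;     /* packets received, or an error */

		if (wait == 0)
			return 0;       /* finite timeout has expired */

		if (wait != SKETCH_WAIT_FOREVER)
			wait--;         /* only finite timeouts count down */

		/* With the sentinel, nothing but a packet or an error ends
		 * the loop, so this is where a starved caller sits. */
		nanosleep(&ts, NULL);
	}
}

/* Hypothetical stub: reports one packet after *ctx polls came back empty. */
static int recv_after_n(void *ctx, int num)
{
	int *remaining = ctx;

	(void)num;
	if (*remaining > 0) {
		(*remaining)--;
		return 0;
	}
	return 1;
}
```

With the stub, a finite wait returns 0 once the countdown ends, while SKETCH_WAIT_FOREVER returns only when the stub finally reports a packet. If the test packet is never delivered, the second case spins in nanosleep() exactly as the backtrace shows.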
> I applied the patches and ran the following script as root:
> CLEANUP=0 GIT_URL=/opt/Linaro/odp3.git GIT_BRANCH=api-next ./build.sh
>
> Need more investigation into this issue... Not applied yet.
>
> Maxim.
>
> On 11/16/16 02:58, Bill Fischofer wrote:
>
>> Trying again as the repost doesn't seem to show up on the list either.
>>
>> For this series:
>>
>> Reviewed-and-tested-by: Bill Fischofer <[email protected]>
>>
>> On Tue, Nov 15, 2016 at 5:55 PM, Bill Fischofer <[email protected]> wrote:
>>
>> Reposting this since it doesn't seem to have made it to the
>> mailing list.
>>
>> For this series:
>>
>> Reviewed-and-tested-by: Bill Fischofer <[email protected]>
>>
>> On Tue, Nov 15, 2016 at 8:41 AM, Bill Fischofer <[email protected]> wrote:
>>
>> For this series:
>>
>> Reviewed-and-tested-by: Bill Fischofer <[email protected]>
>>
>> On Thu, Nov 10, 2016 at 5:07 AM, Petri Savolainen <[email protected]> wrote:
>>
>> Pool performance is optimized by using a ring as the global buffer
>> storage. The IPC build is disabled, since it needs large modifications
>> due to its dependency on pool internals. The old pool implementation was
>> based on locks and a linked list of buffer headers. The new implementation
>> maintains a ring of buffer handles, which enables fast, burst-based allocs
>> and frees. A ring also scales better with the number of CPUs than a list
>> (enq and deq operations update opposite ends of the pool).
>>
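To make the ring idea above concrete, here is a minimal single-threaded C sketch of a ring of buffer handles with burst enqueue and dequeue. It is not the odp_ring_internal.h code from the series: all names (ring_sketch_t, ring_sketch_enq_multi, ring_sketch_deq_multi, RING_SIZE) are hypothetical, and the atomics needed for concurrent use are left out on purpose.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8u                /* hypothetical; must be a power of two */
#define RING_MASK (RING_SIZE - 1u)

typedef struct {
	uint32_t head;              /* read end: dequeue (alloc) advances it */
	uint32_t tail;              /* write end: enqueue (free) advances it */
	uint32_t data[RING_SIZE];   /* stored buffer handles */
} ring_sketch_t;

static void ring_sketch_init(ring_sketch_t *ring)
{
	/* Index masking below only works for power-of-two sizes. */
	assert((RING_SIZE & RING_MASK) == 0);
	ring->head = 0;
	ring->tail = 0;
}

/* Store up to num handles in one burst; returns how many were stored. */
static uint32_t ring_sketch_enq_multi(ring_sketch_t *ring,
				      const uint32_t hdl[], uint32_t num)
{
	/* head and tail are free-running counters, so their difference is
	 * the fill level even after the counters wrap around. */
	uint32_t space = RING_SIZE - (ring->tail - ring->head);
	uint32_t i;

	if (num > space)
		num = space;

	for (i = 0; i < num; i++)
		ring->data[(ring->tail + i) & RING_MASK] = hdl[i];

	ring->tail += num;          /* only the write end moves */
	return num;
}

/* Fetch up to num handles in one burst; returns how many were fetched. */
static uint32_t ring_sketch_deq_multi(ring_sketch_t *ring,
				      uint32_t hdl[], uint32_t num)
{
	uint32_t used = ring->tail - ring->head;
	uint32_t i;

	if (num > used)
		num = used;

	for (i = 0; i < num; i++)
		hdl[i] = ring->data[(ring->head + i) & RING_MASK];

	ring->head += num;          /* only the read end moves */
	return num;
}
```

The point of the sketch is that a burst alloc touches only head and a burst free touches only tail, whereas a linked list funnels both operations through the same list head. In a multi-threaded version the two counters would be updated atomically, which this sketch does not attempt.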
>> L2fwd link rate (%), 2 x 40GE, 64 byte packets
>>
>>       direct-                 parallel-               atomic-
>> cpus  orig    direct  diff    orig    parall  diff    orig    atomic  diff
>> 1     7 %     8 %     1 %     6 %     6 %     2 %     5.4 %   5.6 %   4 %
>> 2     14 %    15 %    7 %     9 %     9 %     5 %     8 %     9 %     8 %
>> 4     28 %    30 %    6 %     13 %    14 %    13 %    12 %    15 %    19 %
>> 6     42 %    44 %    6 %     16 %    19 %    19 %    8 %     20 %    150 %
>> 8     46 %    59 %    28 %    19 %    23 %    26 %    18 %    24 %    34 %
>> 10    55 %    57 %    3 %     20 %    27 %    37 %    8 %     28 %    264 %
>> 12    56 %    56 %    -1 %    22 %    31 %    43 %    7 %     32 %    357 %
>>
>> The maximum packet rate of the NICs is reached with 10-12 CPUs in
>> direct mode. Otherwise, all cases improved. In particular, the
>> scheduler-driven cases had suffered from poor pool scalability.
>>
>> changed in v3:
>> * rebased
>> * ipc disabled with #ifdef
>> * added support for multi-segment packets
>> * API: added explicit limits for packet length in alloc calls
>> * Corrected validation test and example application bugs
>> found during
>> segmentation implementation
>>
>> changed in v2:
>> * rebased to api-next branch
>> * added a comment that ring size must be larger than
>> number of items in it
>> * fixed clang build issue
>> * added parens in align macro
>>
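The v2 notes mention a round-up-to-power-of-two helper and the rule that the ring size must be larger than the number of items in it. A minimal C sketch of how the two could fit together is below. Both function names are hypothetical, and whether the series actually sizes its rings this way is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest power of two that is >= v.
 * Valid for 1 <= v <= 2^31; larger inputs would overflow uint32_t. */
static uint32_t roundup_pow2_u32(uint32_t v)
{
	assert(v >= 1 && v <= (UINT32_C(1) << 31));

	v--;             /* so that exact powers of two map to themselves */
	v |= v >> 1;     /* smear the highest set bit into all lower bits */
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	return v + 1;
}

/* Assumption: pick a power-of-two ring size strictly larger than the
 * number of items the ring has to hold. */
static uint32_t ring_size_for(uint32_t num_items)
{
	return roundup_pow2_u32(num_items + 1);
}
```

Under this assumption a pool of 1024 buffers would get a 2048-entry ring, since 1024 itself is not strictly larger than the item count.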
>> v1 reviews:
>> Reviewed-by: Brian Brooks <[email protected]>
>>
>>
>>
>>
>> Petri Savolainen (19):
>> linux-gen: ipc: disable build of ipc pktio
>> linux-gen: pktio: do not free zero packets
>> linux-gen: ring: created common ring implementation
>> linux-gen: align: added round up power of two
>> linux-gen: pool: reimplement pool with ring
>> linux-gen: ring: added multi enq and deq
>> linux-gen: pool: use ring multi enq and deq operations
>> linux-gen: pool: optimize buffer alloc
>> linux-gen: pool: clean up pool inlines functions
>> linux-gen: pool: ptr instead of hdl in buffer_alloc_multi
>> test: validation: buf: test alignment
>> test: performance: crypto: use capability to select max packet
>> test: correctly initialize pool parameters
>> test: validation: packet: fix bugs in tailroom and concat tests
>> linux-gen: packet: added support for segmented packets
>> test: validation: packet: improved multi-segment alloc test
>> api: packet: added limits for packet len on alloc
>> linux-gen: packet: remove zero len support from alloc
>> linux-gen: packet: enable multi-segment packets
>>
>> example/generator/odp_generator.c | 2 +-
>> include/odp/api/spec/packet.h | 9 +-
>> include/odp/api/spec/pool.h | 6 +
>> platform/linux-generic/Makefile.am | 1 +
>> .../include/odp/api/plat/packet_types.h | 6 +-
>> .../include/odp/api/plat/pool_types.h | 6 -
>> .../linux-generic/include/odp_align_internal.h | 34 +-
>> .../linux-generic/include/odp_buffer_inlines.h | 167 +--
>> .../linux-generic/include/odp_buffer_internal.h | 120 +-
>> .../include/odp_classification_datamodel.h | 2 +-
>> .../linux-generic/include/odp_config_internal.h | 55 +-
>> .../linux-generic/include/odp_packet_internal.h | 87 +-
>> platform/linux-generic/include/odp_pool_internal.h | 289 +---
>> platform/linux-generic/include/odp_ring_internal.h | 176 +++
>> .../linux-generic/include/odp_timer_internal.h | 4 -
>> platform/linux-generic/odp_buffer.c | 22 +-
>> platform/linux-generic/odp_classification.c | 25 +-
>> platform/linux-generic/odp_crypto.c | 12 +-
>> platform/linux-generic/odp_packet.c | 717 ++++++++--
>> platform/linux-generic/odp_packet_io.c | 2 +-
>> platform/linux-generic/odp_pool.c | 1440 ++++++++------------
>> platform/linux-generic/odp_queue.c | 4 +-
>> platform/linux-generic/odp_schedule.c | 102 +-
>> platform/linux-generic/odp_schedule_ordered.c | 4 +-
>> platform/linux-generic/odp_timer.c | 3 +-
>> platform/linux-generic/pktio/dpdk.c | 10 +-
>> platform/linux-generic/pktio/ipc.c | 3 +-
>> platform/linux-generic/pktio/loop.c | 2 +-
>> platform/linux-generic/pktio/netmap.c | 14 +-
>> platform/linux-generic/pktio/socket.c | 17 +-
>> platform/linux-generic/pktio/socket_mmap.c | 10 +-
>> test/common_plat/performance/odp_crypto.c | 47 +-
>> test/common_plat/performance/odp_pktio_perf.c | 2 +-
>> test/common_plat/performance/odp_scheduling.c | 8 +-
>> test/common_plat/validation/api/buffer/buffer.c | 113 +-
>> test/common_plat/validation/api/crypto/crypto.c | 2 +-
>> test/common_plat/validation/api/packet/packet.c | 96 +-
>> test/common_plat/validation/api/pktio/pktio.c | 21 +-
>> 38 files changed, 1745 insertions(+), 1895 deletions(-)
>> create mode 100644 platform/linux-generic/include/odp_ring_internal.h
>>
>> --
>> 2.8.1
>>
>>
>>
>>
>>
>