From: EXT Bill Fischofer [mailto:[email protected]]
Sent: Monday, October 19, 2015 6:23 PM
To: Ola Liljedahl
Cc: Savolainen, Petri (Nokia - FI/Espoo); LNG ODP Mailman List
Subject: Re: [lng-odp] Bug 1851 - odp_pool_destroy() failure


On Mon, Oct 19, 2015 at 10:07 AM, Ola Liljedahl 
<[email protected]<mailto:[email protected]>> wrote:
On 19 October 2015 at 16:43, Bill Fischofer 
<[email protected]<mailto:[email protected]>> wrote:
We could do that in linux-generic, which has a fairly small number of threads 
supported.  I'd be concerned about how that would scale to systems that can 
support many more threads, especially when NUMA considerations come into play.  
Is it simply unacceptable to have some sort of "finished" API call?  That would 
seem to solve the problem in a clean and scalable manner.

A memory barrier is different from an execution barrier. Alloc/free/destroy 
should scale well even when the thread-local stash is stored in shared memory, 
no matter how many threads there are or whether a multi-chip interconnect is 
involved. Alloc and free work in the per-thread slice of the stash (== no 
synchronization needed); only destroy needs to go and read the entire stash 
after application threads have stopped using it (== no synchronization needed).
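
A minimal sketch of that layout, for illustration only. The names (toy_pool_t, 
toy_alloc(), toy_free(), toy_destroy()) and the sizes are hypothetical and are 
not the linux-generic implementation. Each thread touches only its own slice in 
alloc/free, and destroy scans every slice once the application has stopped 
using the pool. The global free list is reduced to a bump counter to keep the 
example short.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_THREADS 4   /* hypothetical thread limit for this sketch */
#define POOL_BUFS   8   /* hypothetical number of buffers in the pool */

/* One stash slice per thread; all slices live in shared memory. */
typedef struct {
    atomic_int count;        /* buffers currently parked in this slice */
    int bufs[POOL_BUFS];     /* indices of the parked buffers */
} stash_t;

typedef struct {
    atomic_int next_unused;  /* first buffer index never handed out yet */
    stash_t stash[MAX_THREADS];
} toy_pool_t;

void toy_pool_init(toy_pool_t *p)
{
    atomic_init(&p->next_unused, 0);
    for (int t = 0; t < MAX_THREADS; t++)
        atomic_init(&p->stash[t].count, 0);
}

/* Alloc: use the caller's own slice first, else take a never-used buffer. */
int toy_alloc(toy_pool_t *p, int tid)
{
    stash_t *s = &p->stash[tid];
    int n = atomic_load_explicit(&s->count, memory_order_relaxed);

    if (n > 0) {
        int buf = s->bufs[n - 1];
        atomic_store_explicit(&s->count, n - 1, memory_order_release);
        return buf;
    }
    int idx = atomic_fetch_add_explicit(&p->next_unused, 1,
                                        memory_order_relaxed);
    if (idx >= POOL_BUFS) {  /* pool exhausted: undo the bump */
        atomic_fetch_sub_explicit(&p->next_unused, 1, memory_order_relaxed);
        return -1;
    }
    return idx;
}

/* Free: park the buffer in the caller's own slice only. */
void toy_free(toy_pool_t *p, int tid, int buf)
{
    stash_t *s = &p->stash[tid];
    int n = atomic_load_explicit(&s->count, memory_order_relaxed);

    assert(n < POOL_BUFS);   /* cannot overflow unless a buffer is freed twice */
    s->bufs[n] = buf;
    atomic_store_explicit(&s->count, n + 1, memory_order_release);
}

/* Destroy: read every slice; succeeds only if no buffer is still in use. */
bool toy_destroy(toy_pool_t *p)
{
    int idle = POOL_BUFS -
               atomic_load_explicit(&p->next_unused, memory_order_acquire);

    for (int t = 0; t < MAX_THREADS; t++)
        idle += atomic_load_explicit(&p->stash[t].count,
                                     memory_order_acquire);
    return idle == POOL_BUFS;
}
```

While any buffer is outstanding, toy_destroy() returns false. Once all buffers 
have been freed, by whichever thread, it returns true without any thread 
having flushed its slice.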

A finish call would complicate the API. The user would need to
1) ensure that the application does not use the pool any more (which can be 
done in many ways)
2) schedule the finish() call to be executed on all threads
3) wait and synchronize to notice when all threads have called finish() (what 
if one of the threads exited before calling it?)
4) call destroy on one thread

Currently we have only steps 1 and 4.

Isn't this conceptually similar to the stop scheduling call so that I can drain 
the prescheduling queue and then stop participating in event processing? In 
order to allow for "non-ideal" implementations (because instant sharing of all 
resources isn't always very performant), we create APIs that tell ODP that 
this thread wishes to withdraw from processing using shared resources.

It is not. odp_schedule_pause() tells the implementation that this thread is 
now stepping out of the scheduling loop, so that any prescheduled events can be 
processed and those flows will not deadlock if the thread does not return. Only 
the application can process the stash of pre-scheduled events. The 
implementation can move pre-allocated buffers back to the global pool at thread 
termination, without the application's help.


I think that's a useful analogy.  We've recently added stop/start APIs to pktio 
for similar reasons, and of course we have odp_schedule_pause() that serves the 
same advisory function.  We don't need a "start" API for pools (though if you 
wanted one for symmetry I don't see any harm there) but you really do want a 
"stop" API.

Pktio start/stop are also different. Those control event input from an 
external source (the network), not potential per-thread stashing. There’s no 
requirement for each thread to call pktio stop when the application wants to 
stop incoming packets from the network.

-Petri




On Mon, Oct 19, 2015 at 9:15 AM, Savolainen, Petri (Nokia - FI/Espoo) 
<[email protected]<mailto:[email protected]>> wrote:
A SW implementation can place the per-thread stash in shared memory, where the 
thread calling destroy() can see the stashes of all other threads. Since the 
application must synchronize the destroy call (so that it happens only after 
all free() calls have returned), the implementation just has to ensure that the 
destroy call reads fresh stash status data (== it has the correct memory 
read/write barriers in place). Performance should still be good – it’s a matter 
of moving the per-thread stash from TLS to shared memory (no additional 
synchronization per alloc/free).
-Petri


From: EXT Bill Fischofer 
[mailto:[email protected]<mailto:[email protected]>]
Sent: Monday, October 19, 2015 2:26 PM
To: Savolainen, Petri (Nokia - FI/Espoo)
Cc: LNG ODP Mailman List
Subject: Re: [lng-odp] Bug 1851 - odp_pool_destroy() failure

This is an important discussion, especially as we look to high-performance SW 
implementations of ODP. Obviously we can stipulate any functional behavior we 
want. The question is how much overhead is acceptable to achieve such 
stipulated functionality? One of the reasons DPDK does not support mempool 
destroys is this issue of distributed cache management. If we don't want the 
application to take any responsibility in this area, then the implementation 
needs to impose additional bookkeeping overhead that will likely impact the 
performance of normal operation.

What's needed is some sort of indication that a thread is not just freeing a 
buffer, but is done with operations on a pool. One way of doing this is to add 
an odp_pool_finished() API that tells the implementation that this thread is 
done with the pool (e.g., asserts that no further alloc() calls will be made by 
this thread on it).  My suggestion in the response to the bug was that 
odp_pool_destroy() can serve this purpose; however, I'd have no problem with 
adding another API that serves the same notification purpose.
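
To make the proposal concrete, here is a rough sketch of what a per-thread 
"finished" call might look like. Everything in it is an assumption: 
fin_pool_t, fin_attach(), fin_finished() and the other names are hypothetical, 
not existing ODP APIs, and buffers are modeled as plain counters. The point is 
only that destroy can check shared counters instead of inspecting 
thread-private caches.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define STASH_MAX 4              /* hypothetical per-thread cache depth */

typedef struct {
    atomic_int free_bufs;        /* buffers in the global free list */
    atomic_int users;            /* attached threads that have not finished */
    int total;                   /* buffers the pool was created with */
} fin_pool_t;

typedef struct {                 /* thread-private; invisible to destroy */
    fin_pool_t *pool;
    int cached;                  /* buffers held in this thread's cache */
} fin_local_t;

void fin_pool_init(fin_pool_t *p, int total)
{
    assert(total > 0);
    atomic_init(&p->free_bufs, total);
    atomic_init(&p->users, 0);
    p->total = total;
}

/* Called once by each thread before it first uses the pool. */
void fin_attach(fin_pool_t *p, fin_local_t *l)
{
    l->pool = p;
    l->cached = 0;
    atomic_fetch_add(&p->users, 1);
}

/* Alloc: serve from the private cache, else take one from the global list. */
bool fin_alloc(fin_local_t *l)
{
    if (l->cached > 0) {
        l->cached--;
        return true;
    }
    int n = atomic_load(&l->pool->free_bufs);
    while (n > 0)
        if (atomic_compare_exchange_weak(&l->pool->free_bufs, &n, n - 1))
            return true;
    return false;                /* pool exhausted */
}

/* Free: keep the buffer in the private cache unless the cache is full. */
void fin_free(fin_local_t *l)
{
    if (l->cached < STASH_MAX)
        l->cached++;
    else
        atomic_fetch_add(&l->pool->free_bufs, 1);
}

/* Hypothetical "finished" call: flush the cache and detach this thread. */
void fin_finished(fin_local_t *l)
{
    atomic_fetch_add(&l->pool->free_bufs, l->cached);
    l->cached = 0;
    atomic_fetch_sub(&l->pool->users, 1);
}

/* Destroy succeeds only after every thread finished and all buffers are back. */
bool fin_destroy(fin_pool_t *p)
{
    return atomic_load(&p->users) == 0 &&
           atomic_load(&p->free_bufs) == p->total;
}
```

In this sketch destroy keeps failing until every attached thread has made the 
finished call, even if all buffers were already freed.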

Without such an API, it's not clear how we can achieve the desired 
functionality without a lot of additional overhead or removing any sort of 
safety checks.  If the latter is acceptable, we could say that 
odp_pool_destroy() always succeeds and if the application had any outstanding 
buffers or tries to use the pool handle following a destroy() call then the 
result is undefined.



On Mon, Oct 19, 2015 at 5:48 AM, Savolainen, Petri (Nokia - FI/Espoo) 
<[email protected]<mailto:[email protected]>> wrote:
Hi,

The linux-generic pool implementation has a bug 
(https://bugs.linaro.org/show_bug.cgi?id=1851) that prevents dynamic pool 
destroy. From the API point of view, any resource (e.g. a pool) is created once 
(the xxx_create call returns a handle) and destroyed once (the handle is passed 
to xxx_destroy). Any thread can create a resource and any thread can destroy it. 
Application threads must synchronize resource usage and the destroy call, but 
not implementation specifics such as the potential use of per-thread stashes or 
the flushing of those.

For example, this is valid usage of the pool API:

Thread 1            Thread 2              Thread 3
--------------------------------------------------

init_global()
init_local()        init_local()          init_local()

                    pool = pool_create()

barrier()           barrier()             barrier()
buf = alloc(pool)   buf = alloc(pool)     buf = alloc(pool)
free(buf)           free(buf)             free(buf)
barrier()           barrier()             barrier()

pool_destroy(pool)

barrier()           barrier()             barrier()
do_something()      do_something()        do_something()
term_local()        term_local()          term_local()
                                          term_global()


So, e.g. pool_destroy must succeed when all buffers have been freed before the 
call - no matter:
* which thread calls it
* whether the calling thread itself has called alloc or free
* whether other threads have already called term_local
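
The timeline above can be exercised with a small stand-alone program. This is 
only a sketch using POSIX threads and a toy counter in place of a real pool; 
run_timeline(), worker() and the counter are hypothetical stand-ins for the ODP 
calls, chosen so that the example builds without ODP.

```c
#define _POSIX_C_SOURCE 200809L  /* for pthread_barrier_t in strict C modes */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_THREADS 3
#define NUM_BUFS    4            /* hypothetical pool size */

static atomic_int free_bufs;     /* toy pool: a count of free buffers */
static bool destroy_ok;          /* result of the toy pool_destroy() */
static pthread_barrier_t barrier;

static void *worker(void *arg)
{
    int id = (int)(intptr_t)arg; /* 0, 1, 2 stand for Thread 1, 2, 3 */

    if (id == 1)                 /* Thread 2: pool = pool_create() */
        atomic_store(&free_bufs, NUM_BUFS);

    pthread_barrier_wait(&barrier);
    int prev = atomic_fetch_sub(&free_bufs, 1);  /* buf = alloc(pool) */
    assert(prev > 0);            /* 3 threads, 4 buffers: never exhausted */
    atomic_fetch_add(&free_bufs, 1);             /* free(buf) */
    pthread_barrier_wait(&barrier);

    if (id == 0)                 /* Thread 1: pool_destroy(pool) */
        destroy_ok = (atomic_load(&free_bufs) == NUM_BUFS);

    pthread_barrier_wait(&barrier);
    return NULL;                 /* do_something() and term_local() omitted */
}

/* Runs the three-thread timeline once; returns whether destroy succeeded. */
bool run_timeline(void)
{
    pthread_t th[NUM_THREADS];

    destroy_ok = false;
    pthread_barrier_init(&barrier, NULL, NUM_THREADS);
    for (intptr_t i = 0; i < NUM_THREADS; i++)
        pthread_create(&th[i], NULL, worker, (void *)i);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(th[i], NULL);
    pthread_barrier_destroy(&barrier);
    return destroy_ok;
}
```

The thread that destroys the toy pool is neither the one that created it nor 
the only one that allocated from it; the barriers are the only synchronization 
the application provides.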


-Petri


_______________________________________________
lng-odp mailing list
[email protected]<mailto:[email protected]>
https://lists.linaro.org/mailman/listinfo/lng-odp


