Re: [lng-odp] NUMA aware memory allocation?

Stuart Haslam Mon, 11 May 2015 06:05:26 -0700

On Mon, May 11, 2015 at 12:46:15PM +0000, Savolainen, Petri (Nokia - FI/Espoo) 
wrote:
> Hi,
> 
> In general, odp_xxx_param_t should be designed so that memset(&param, 0, 
> sizeof(odp_xxx_param_t)) gives the default behavior. Also if param is a 
> pointer, param == NULL can be defined as the default.


This is currently not the case for odp_queue_param_t. The odp_schedule_*_t
types within that structure are defined by the platform, and linux-generic
currently uses non-zero defaults.

param == NULL is obviously only useful if you want default behaviour for
all of the elements in the structure.
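
To make the point concrete, here is a minimal sketch of why a zeroed
structure is not automatically "the default" when the platform picks non-zero
default values, and how an explicit initializer (plus a NULL check) covers
both cases. All names below (example_queue_param_t, example_sched_prio_t,
example_queue_param_init, example_queue_param_resolve) are hypothetical and
chosen only for illustration; they are not the ODP or linux-generic
definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical platform-defined priority type (NOT the real ODP type).
 * The platform default is deliberately non-zero, so memset() alone
 * would select the highest priority instead of the default. */
typedef enum {
    EXAMPLE_SCHED_PRIO_HIGHEST = 0,
    EXAMPLE_SCHED_PRIO_NORMAL  = 4
} example_sched_prio_t;

/* Hypothetical parameter structure (NOT the real odp_queue_param_t). */
typedef struct {
    example_sched_prio_t prio;
    int numa_node; /* hypothetical field: 0 means "no preference" */
} example_queue_param_t;

/* Explicit initializer: zero everything, then set the fields whose
 * platform default is not zero. */
void example_queue_param_init(example_queue_param_t *param)
{
    assert(param != NULL);
    memset(param, 0, sizeof(*param));
    param->prio = EXAMPLE_SCHED_PRIO_NORMAL;
}

/* Treat param == NULL as "use the defaults for every field";
 * otherwise use the caller's values as given. */
example_queue_param_t
example_queue_param_resolve(const example_queue_param_t *param)
{
    example_queue_param_t resolved;

    if (param == NULL)
        example_queue_param_init(&resolved);
    else
        resolved = *param;

    return resolved;
}
```

With an initializer like this, an application that does not care about a
field never has to know its default value, and the zero-is-default rule
becomes something the implementation can satisfy internally rather than a
constraint on every platform-defined type.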

--
Stuart.

> 
> Anyway, special calls for local vs remote configuration should be avoided. I 
> think that a typical ODP application (e.g. all our examples) would consist of 
> a control thread, which would first set up all resources for the worker 
> threads and then create/launch/pin/monitor those threads. So, workers would 
> not necessarily create the resources they use. Also, the control thread 
> itself may not be pinned and may run on any available core (OS kernel 
> decides).
> 
> Direct usage of physical IDs should be minimized in the API. When 
> virtualization is added into the picture, physical node/core/port/etc IDs are 
> not relevant any more. The user decides which physical nodes/cores run a VM, 
> which threads are pinned to which guest OS cpu IDs, which threads share 
> resources, ...
> 
> ODP application or implementation cannot directly select physical resources, 
> but needs some information from the user to do the "right" thing, e.g.:
> - user has configured
>   - guest OS CPUs 3 and 6 to the same NUMA node 1
>   - shared memory area "shm_0" to locate on a DDR connected to node 0
>   - shared memory area "shm_1" to locate on a DDR connected to node 1
>   - "eth1" and "eth2" to be 10 GE NIC interfaces connected to node 1
> - user launches an app and passes above information to it
> - app main thread
>   - creates two worker threads and pins those to cpu IDs 3 and 6
>   - reserves shared memory from "shm_0" for logs and other control
>     communication (not local to workers)
>   - reserves shared memory from "shm_1" for the workers' shared data
>     (local to workers)
>   - opens pktio interfaces "eth1" and "eth2" (local to workers)
>   - kicks workers to start
> 
> So, some more information may need to flow from user to implementation, but 
> no direct physical IDs from the application. Either we extend the named and 
> preconfigured resources concept from pktio to other (physically located) 
> resources, or add parameters which describe what is needed. Named resources 
> are exact: "send packets out from eth0" vs "send packets out from an 
> interface nearest to the thread". Similarly e.g. memory may need exact 
> location/properties vs. implementation always selecting the fastest.
> 
> 
> -Petri  
> 
> 
> 
> > -----Original Message-----
> > From: ext Jacob, Jerin [mailto:[email protected]]
> > Sent: Monday, May 11, 2015 12:54 PM
> > To: Bill Fischofer
> > Cc: Gábor Sándor Enyedi; Savolainen, Petri (Nokia - FI/Espoo); Zoltan
> > Kiss; [email protected]
> > Subject: Re: [lng-odp] NUMA aware memory allocation?
> > 
> > 
> > Either way is fine with me. My only concern with adding extra info to the
> > appropriate odp_xxx_params_t is that non-NUMA applications (the most likely
> > case) would need to fill the structure with some default value every time.
> > 
> > 
> > From: Bill Fischofer <[email protected]>
> > Sent: Friday, May 8, 2015 11:56 PM
> > To: Jacob, Jerin
> > Cc: Gábor Sándor Enyedi; Savolainen, Petri (Nokia - FI/Espoo); Zoltan
> > Kiss; [email protected]
> > Subject: Re: [lng-odp] NUMA aware memory allocation?
> > 
> > 
> > Good points, however rather than having odp_..._onnode() variants, I think
> > encoding the extra info in an appropriate odp_xxx_params_t structure would
> > be more consistent with how we've been shaping the APIs.  That way it
> > doesn't require separate  API calls to handle the variants.
> > 
> > 
> > On Fri, May 8, 2015 at 10:11 AM, Jacob, Jerin
> > <[email protected]> wrote:
> > 
> > From a multi-node ODP implementation / application usage perspective,
> > we need to consider how to expose the HW resources in each node.
> > Resources could be CPUs, memory and any HW accelerated blocks for packet
> > processing.
> > 
> > 
> > In the case of CPU resources, we could follow the current API model: APIs
> > for querying how many CPUs are available in each node, and starting
> > specific work on selected CPUs using odp_cpu_mask_t.
> > Let the implementation take care of pinning/exposing the number of cores
> > for ODP on each node.
> > 
> > In the case of memory resources, IMO odp_shm_reserve can be extended to
> > allocate from a specific node.
> > 
> > In the case of HW accelerated block resources, IMO we should add a node
> > parameter when creating the handles.
> > 
> > 
> > IMO, Gábor Sándor Enyedi's example may be visualized like this on multi
> > node ODP
> > 
> > 
> > -local_pool = odp_pool_create() // create a local pool
> > -odp_pktio_open(..,local_pool)  // open local node pktio and attach to
> > local pool
> > 
> > -remote_pool = odp_pool_create_onnode(node...) // create a remote pool, as
> > packets need to go to the remote node's DDR
> > -odp_pktio_open_onnode(node,...,remote_pool) // open remote node pktio
> > with remote pool
> > 
> > -odp_cpu_count()
> > -create cpu mask and launch work on local node
> > 
> > -odp_cpu_count(node) // to get the number of workers available on remote node
> > -create cpu mask and launch work on remote node
> > 
> > 
> > From: Bill Fischofer <[email protected]>
> > Sent: Friday, May 8, 2015 7:43 PM
> > To: Gábor Sándor Enyedi
> > Cc: Savolainen, Petri (Nokia - FI/Espoo); Jacob, Jerin; Zoltan Kiss;  lng-
> > [email protected]
> > 
> > 
> > Subject: Re: [lng-odp] NUMA aware memory allocation?
> > 
> > 
> > Thanks, that's good info. So in this case is it sufficient to say that the
> > memory used for odp_pool_create() is the one associated with the thread
> > that executes the create call? Presumably then when a packet arrives and
> > is assigned to a CoS that points to that pool then events from that pool
> > are sent to queues that are only scheduled to the corresponding cores that
> > have fast access to that pool. Right now queues have an
> > odp_schedule_group_t but that's still fairly rudimentary. It sounds like
> > we might want to point the queue at the pool for scheduling purposes so
> > that it would inherit the NUMA considerations you mention.
> > 
> > 
> > On Fri, May 8, 2015 at 9:00 AM, Gábor Sándor Enyedi
> > <[email protected]> wrote:
> > 
> > For me, for now, the use case is very simple: we have an x86 machine with
> > two Xeon CPUs (dual socket). Each CPU has its own memory and its own
> > PCI Express bus, as usual. First, I only want to write some test code,
> > but later we may want to port our high-speed OF soft switch to ODP (now
> > it's on DPDK). We want to assign the correct core to each interface, and
> > each slot must use its own copy of the forwarding data in its own memory.
> > In our experience, accidentally assigning a bad core to an interface can
> > cost as much as about 50% in performance, so NUMA is essential.
> > Based on the above, something similar to DPDK's rte_malloc (and its
> > variants) plus a NUMA aware buffer pool create would be enough for us
> > for now. Later we want to investigate other architectures... but I
> > don't know the use cases yet.
> > 
> > Gabor
> > 
> > 
> > 
> > 
> > 
> > 
> > On 05/08/2015 03:35 PM, Bill Fischofer wrote:
> > 
> > Insofar as possible, the mechanics of NUMA should be the responsibility of
> > the ODP implementation, rather than the application, since that way the
> > application retains maximum portability.
> > 
> > 
> > However, from an ODP API perspective, I think we need to be mindful of
> > NUMA considerations to give implementations the necessary "hooks" to
> > properly support the NUMA aspects of their platform.  This is why ODP APIs
> > need to be careful about what addressability assumptions they make.
> > 
> > 
> > If Gábor or Jerin can list a couple of specific relevant cases I think
> > that will help in focusing the discussion and get us off to a good start.
> > 
> > 
> > On Fri, May 8, 2015 at 8:26 AM, Savolainen, Petri (Nokia - FI/Espoo)
> > <[email protected]> wrote:
> >  Hi,
> > 
> > ODP is OS agnostic and thus thread management (e.g. thread creation and
> > pinning to physical cores) and NUMA awareness should happen mostly outside
> > of ODP APIs.
> > 
> > For example, NUMA could be visible in ODP APIs this way:
> > * Add odp_cpumask_xxx() calls that indicate NUMA dependency between CPUs
> > (just for information)
> > * Add a way to identify groups of threads which frequently share resources
> > (memory and handles) within the group
> > * Give the thread group as a hint (parameter) to various ODP calls that
> > create shared resources. The implementation can use the information to
> > allocate resources "near" to the threads in the group. However, the user
> > is responsible for grouping the threads and mapping/pinning them to
> > physical CPUs in a way that enables NUMA aware optimizations.
> > 
> > 
> > -Petri
> > 
> > 
> > 
> > 
> > 
> > > -----Original Message-----
> > > From: lng-odp [mailto:[email protected]] On Behalf Of ext
> > > Gábor Sándor Enyedi
> > > Sent: Friday, May 08, 2015 10:48 AM
> > > To: Jerin Jacob; Zoltan Kiss
> > > Cc: [email protected]
> > > Subject: Re: [lng-odp] NUMA aware memory allocation?
> > >
> > > Hi,
> > >
> > > Thanks. So, is the workaround for now to start the threads, and do all
> > > the memory reservation on the thread? And to call odp_shm_reserve()
> > > instead of simple malloc() calls? Can I use multiple buffer pools, one
> > > for each thread or interface?
> > > BR,
> > >
> > > Gabor
> > >
> > > P.s.: Do you know when this issue in the API will be fixed (e.g. in the
> > > next release or whatever)?
> > >
> > > On 05/08/2015 09:06 AM, Jerin Jacob wrote:
> > > > On Thu, May 07, 2015 at 05:00:54PM +0100, Zoltan Kiss wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> I'm not aware of any such interface, but others with more knowledge can
> > > >> comment about it. The ODP-DPDK implementation creates buffer pools on the
> > > >> NUMA node where the pool create function was actually called.
> > > > The current ODP spec is not NUMA aware. We need an API to support node
> > > > enumeration and an explicit node parameter to alloc/free resources from a
> > > > specific node, like odp_shm_reserve_onnode(node, ...), while keeping the
> > > > existing API odp_shm_reserve() allocating on the node where the current
> > > > code runs.
> > > >
> > > >
> > > >> Regards,
> > > >>
> > > >> Zoli
> > > >>
> > > >> On 07/05/15 16:32, Gábor Sándor Enyedi wrote:
> > > >>> Hi!
> > > >>>
> > > >>> I just started to test ODP, trying to write my first application, but
> > > >>> found a problem: if I want to write NUMA aware code, how should I
> > > >>> allocate memory close to a given thread? I mean, I know there is
> > > >>> libnuma, but should I use it? I guess not, but I cannot find memory
> > > >>> allocation functions in ODP. Is there a function similar to
> > > >>> numa_alloc_onnode()?
> > > >>> Thanks,
> > > >>>
> > > >>> Gabor
> > > >>> _______________________________________________
> > > >>> lng-odp mailing list
> > > >>> [email protected]
> > > >>>   https://lists.linaro.org/mailman/listinfo/lng-odp
