All the more reason to formalize this requirement via odp_xxx_param_init() APIs.
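To make the proposal concrete, here is a minimal sketch of the pattern being argued for. Note that odp_queue_param_init() is the call being proposed, not yet part of the API at the time of this thread; the odp_queue_create() signature and the sched.prio field follow the linux-generic headers of that era, so treat the details as illustrative:

#include <odp.h>

/* Sketch of the proposed init pattern. The platform fills in its own
 * defaults (which may be non-zero, e.g. linux-generic's odp_schedule_*_t
 * values), so the application never assumes that all-zeros is valid. */
odp_queue_t create_default_sched_queue(const char *name)
{
    odp_queue_param_t param;

    odp_queue_param_init(&param);  /* proposed call: platform defaults */
    param.sched.prio = ODP_SCHED_PRIO_DEFAULT;  /* override one field */

    return odp_queue_create(name, ODP_QUEUE_TYPE_SCHED, &param);
}

This replaces the fragile memset(&param, 0, sizeof(param)) idiom discussed below, which silently assumes that every default is zero.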
On Mon, May 11, 2015 at 8:03 AM, Stuart Haslam <[email protected]> wrote:
> On Mon, May 11, 2015 at 12:46:15PM +0000, Savolainen, Petri (Nokia -
> FI/Espoo) wrote:
> > Hi,
> >
> > In general, odp_xxx_param_t should be designed so that
> > memset(&param, 0, sizeof(odp_xxx_param_t)) gives the default
> > behavior. Also, if param is a pointer, param == NULL can be defined
> > as the default.
>
> This is currently not the case for odp_queue_param_t. The
> odp_schedule_*_t types within that structure are defined by the
> platform, and linux-generic currently uses non-zero defaults.
>
> param == NULL is obviously only useful if you want default behaviour
> for all of the elements in the structure.
>
> --
> Stuart.
>
> > Anyway, special calls for local vs. remote configuration should be
> > avoided. I think that a typical ODP application (e.g. all our
> > examples) would consist of a control thread, which would first set
> > up all resources for the worker threads and then
> > create/launch/pin/monitor those threads. So, workers would not
> > necessarily create the resources they use. Also, the control thread
> > itself may not be pinned and may run on any available core (the OS
> > kernel decides).
> >
> > Direct usage of physical IDs should be minimized in the API. When
> > virtualization is added to the picture, physical node/core/port/etc.
> > IDs are not relevant any more. The user decides which physical
> > nodes/cores run a VM, which threads are pinned to which guest OS CPU
> > IDs, which threads share resources, ...
> >
> > An ODP application or implementation cannot directly select physical
> > resources, but needs some information from the user to do the
> > "right" thing, e.g.:
> > - the user has configured
> >   - guest OS CPUs 3 and 6 on the same NUMA node 1
> >   - shared memory area "shm_0" to be located on a DDR connected to node 0
> >   - shared memory area "shm_1" to be located on a DDR connected to node 1
> >   - "eth1" and "eth2" to be 10 GE NIC interfaces connected to node 1
> > - the user launches the app and passes the above information to it
> > - the app main thread
> >   - creates two worker threads and pins them to CPU IDs 3 and 6
> >   - reserves shared memory from "shm_0" for logs and other control
> >     communication (not local to workers)
> >   - reserves shared memory from "shm_1" for the workers' shared data
> >     (local to workers)
> >   - opens pktio interfaces "eth1" and "eth2" (local to workers)
> >   - kicks the workers to start
> >
> > So, some more information may need to flow from user to
> > implementation, but no direct physical IDs from the application.
> > Either we extend the named and preconfigured resources concept from
> > pktio to other (physically located) resources, or we add parameters
> > which describe what is needed. Named resources are exact: "send
> > packets out from eth0" vs. "send packets out from an interface
> > nearest to the thread". Similarly, memory may need an exact
> > location/properties vs. the implementation always selecting the
> > fastest.
> >
> > -Petri
> >
> > > -----Original Message-----
> > > From: ext Jacob, Jerin [mailto:[email protected]]
> > > Sent: Monday, May 11, 2015 12:54 PM
> > > To: Bill Fischofer
> > > Cc: Gábor Sándor Enyedi; Savolainen, Petri (Nokia - FI/Espoo);
> > > Zoltan Kiss; [email protected]
> > > Subject: Re: [lng-odp] NUMA aware memory allocation?
> > >
> > > Either way is fine with me. The only concern I have with adding
> > > extra info in an appropriate odp_xxx_params_t is that non-NUMA
> > > applications (the most likely case) need to fill the structure
> > > with some default value all the time.
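One way to reconcile Jerin's concern with the params approach: an init call plus an optional placement field means non-NUMA applications never touch it. The sketch below is hypothetical only; the "node" member and MY_NODE_ANY do not exist in odp_pool_param_t, and odp_pool_param_init() is the init call proposed above. The real fields (type, pkt.num, pkt.len) follow linux-generic:

#include <odp.h>

#define MY_NODE_ANY (-1)  /* hypothetical "no placement preference" */

odp_pool_t create_pool(int node /* MY_NODE_ANY for non-NUMA apps */)
{
    odp_pool_param_t param;

    odp_pool_param_init(&param);  /* proposed: fills all defaults */
    param.type = ODP_POOL_PACKET;
    param.pkt.num = 1024;
    param.pkt.len = 1518;

    /* Only NUMA-aware code touches the (hypothetical) placement hint;
     * everyone else gets the init-call defaults for free. */
    if (node != MY_NODE_ANY)
        param.node = node;  /* hypothetical member */

    return odp_pool_create("pkt_pool", &param);
}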
> > > From: Bill Fischofer <[email protected]>
> > > Sent: Friday, May 8, 2015 11:56 PM
> > > To: Jacob, Jerin
> > > Cc: Gábor Sándor Enyedi; Savolainen, Petri (Nokia - FI/Espoo);
> > > Zoltan Kiss; [email protected]
> > > Subject: Re: [lng-odp] NUMA aware memory allocation?
> > >
> > > Good points; however, rather than having odp_..._onnode() variants,
> > > I think encoding the extra info in an appropriate odp_xxx_params_t
> > > structure would be more consistent with how we've been shaping the
> > > APIs. That way it doesn't require separate API calls to handle the
> > > variants.
> > >
> > > On Fri, May 8, 2015 at 10:11 AM, Jacob, Jerin
> > > <[email protected]> wrote:
> > >
> > > From a multi-node ODP implementation/application usage perspective,
> > > we need to consider how we can expose the HW resources on each
> > > node. Resources could be CPUs, memory and any HW-accelerated
> > > blocks for packet processing.
> > >
> > > In the case of the CPU resource, we could take the current API
> > > model: APIs for querying how many CPU resources are available on
> > > each node and starting specific work on selected CPUs using
> > > odp_cpu_mask_t. Let the implementation take care of
> > > pinning/exposing the number of cores for ODP on each node.
> > >
> > > In the case of the memory resource, IMO odp_shm_reserve() can be
> > > extended to allocate from a specific node.
> > >
> > > In the case of HW-accelerated block resources, IMO we should add a
> > > node parameter while creating the handles.
> > >
> > > IMO, Gábor Sándor Enyedi's example may be visualized like this on
> > > a multi-node ODP (see the C sketch after this exchange):
> > >
> > > - local_pool = odp_pool_create()  // create a local pool
> > > - odp_pktio_open(.., local_pool)  // open local node pktio and
> > >   attach it to the local pool
> > > - remote_pool = odp_pool_create_onnode(node, ...)  // create a
> > >   remote pool, as packets need to go to the remote node's DDR
> > > - odp_pktio_open_onnode(node, ..., remote_pool)  // open remote
> > >   node pktio with the remote pool
> > > - odp_cpu_count()
> > > - create a CPU mask and launch work on the local node
> > > - odp_cpu_count(node)  // get the number of workers available on
> > >   the remote node
> > > - create a CPU mask and launch work on the remote node
> > >
> > > From: Bill Fischofer <[email protected]>
> > > Sent: Friday, May 8, 2015 7:43 PM
> > > To: Gábor Sándor Enyedi
> > > Cc: Savolainen, Petri (Nokia - FI/Espoo); Jacob, Jerin; Zoltan
> > > Kiss; [email protected]
> > > Subject: Re: [lng-odp] NUMA aware memory allocation?
> > >
> > > Thanks, that's good info. So in this case is it sufficient to say
> > > that the memory used for odp_pool_create() is the one associated
> > > with the thread that executes the create call? Presumably then,
> > > when a packet arrives and is assigned to a CoS that points to that
> > > pool, events from that pool are sent to queues that are only
> > > scheduled to the corresponding cores that have fast access to that
> > > pool. Right now queues have an odp_schedule_group_t, but that's
> > > still fairly rudimentary. It sounds like we might want to point
> > > the queue at the pool for scheduling purposes so that it would
> > > inherit the NUMA considerations you mention.
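Jerin's flow, rewritten as C to make the shape of the proposal visible. This is a hypothetical sketch: odp_pool_create_onnode() and odp_pktio_open_onnode() do not exist and are declared here only as stand-ins, odp_pool_param_init() is the init call proposed above, and the two-argument odp_pktio_open() matches the API of that era; error handling is omitted:

#include <odp.h>

/* Hypothetical _onnode() variants: "create this resource on a
 * specific NUMA node". Declarations for illustration only. */
odp_pool_t odp_pool_create_onnode(int node, const char *name,
                                  odp_pool_param_t *param);
odp_pktio_t odp_pktio_open_onnode(int node, const char *dev,
                                  odp_pool_t pool);

void setup_local_and_remote(int remote_node)
{
    odp_pool_param_t p;

    odp_pool_param_init(&p);  /* proposed init call, see above */
    p.type = ODP_POOL_PACKET;
    p.pkt.num = 2048;
    p.pkt.len = 1518;

    /* Local node: the pool lands wherever the calling thread runs. */
    odp_pool_t local_pool = odp_pool_create("local", &p);
    odp_pktio_t local_if = odp_pktio_open("eth0", local_pool);

    /* Remote node: explicit placement via the hypothetical variants. */
    odp_pool_t remote_pool = odp_pool_create_onnode(remote_node, "remote", &p);
    odp_pktio_t remote_if = odp_pktio_open_onnode(remote_node, "eth1",
                                                  remote_pool);
    (void)local_if;
    (void)remote_if;
}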
> > > On Fri, May 8, 2015 at 9:00 AM, Gábor Sándor Enyedi
> > > <[email protected]> wrote:
> > >
> > > For me, and for now, the use-case is very simple: we have an x86
> > > with two Xeon CPUs (dual socket) in it. Each of the CPUs has its
> > > own memory and its own PCI Express bus, as usual. First, I want to
> > > write only some test code, but later we may want to port our
> > > high-speed OpenFlow soft switch to ODP (now it's on DPDK). We want
> > > to assign a correct core to each interface, and each socket must
> > > use its own copy of the forwarding data in its own memory. We have
> > > the experience that if we accidentally assigned a bad core to an
> > > interface, we could see as much as a 50% performance drop, so NUMA
> > > is essential.
> > > Based on the above, something similar to DPDK's rte_malloc (and
> > > its variants) plus a NUMA-aware buffer pool create would be enough
> > > for us for now. Later we want to investigate other
> > > architectures... but I don't know those use-cases yet.
> > >
> > > Gabor
> > >
> > > On 05/08/2015 03:35 PM, Bill Fischofer wrote:
> > >
> > > Insofar as possible, the mechanics of NUMA should be the
> > > responsibility of the ODP implementation, rather than the
> > > application, since that way the application retains maximum
> > > portability.
> > >
> > > However, from an ODP API perspective, I think we need to be
> > > mindful of NUMA considerations to give implementations the
> > > necessary "hooks" to properly support the NUMA aspects of their
> > > platform. This is why ODP APIs need to be careful about what
> > > addressability assumptions they make.
> > >
> > > If Gábor or Jerin can list a couple of specific relevant cases, I
> > > think that will help in focusing the discussion and get us off to
> > > a good start.
> > >
> > > On Fri, May 8, 2015 at 8:26 AM, Savolainen, Petri (Nokia -
> > > FI/Espoo) <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > ODP is OS agnostic, and thus thread management (e.g. thread
> > > creation and pinning to physical cores) and NUMA awareness should
> > > happen mostly outside of the ODP APIs.
> > >
> > > For example, NUMA could be visible in the ODP APIs this way:
> > > * Add odp_cpumask_xxx() calls that indicate NUMA dependency
> > >   between CPUs (just for information)
> > > * Add a way to identify groups of threads which frequently share
> > >   resources (memory and handles) within the group
> > > * Give the thread group as a hint (parameter) to the various ODP
> > >   calls that create shared resources. The implementation can use
> > >   the information to allocate resources "near" the threads in the
> > >   group. However, the user is responsible for grouping the threads
> > >   and mapping/pinning them to physical CPUs in a way that enables
> > >   NUMA-aware optimizations.
> > >
> > > -Petri
> > >
> > > > -----Original Message-----
> > > > From: lng-odp [mailto:[email protected]] On
> > > > Behalf Of ext Gábor Sándor Enyedi
> > > > Sent: Friday, May 08, 2015 10:48 AM
> > > > To: Jerin Jacob; Zoltan Kiss
> > > > Cc: [email protected]
> > > > Subject: Re: [lng-odp] NUMA aware memory allocation?
> > > >
> > > > Hi,
> > > >
> > > > Thanks. So, is the workaround for now to start the threads and
> > > > do all the memory reservation on the thread, and to call
> > > > odp_shm_reserve() instead of simple malloc() calls? Can I use
> > > > multiple buffer pools, one for each thread or interface?
> > > > BR,
> > > >
> > > > Gabor
> > > >
> > > > P.S.: Do you know when this issue in the API will be fixed
> > > > (e.g. in the next release)?
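The workaround Gabor asks about, sketched under the assumption (confirmed for ODP-DPDK by Zoltan below) that the implementation allocates where the calling thread runs. Pinning is done with plain pthreads, outside the ODP API, as Petri suggests; per-worker shm names avoid clashes, and the thread is assumed to have already called odp_init_local():

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <odp.h>

/* Pin first, reserve second: on an implementation that allocates on
 * the caller's NUMA node, the reservation then lands on the right
 * node without any new API. */
static void *worker(void *arg)
{
    int cpu = *(int *)arg;
    cpu_set_t set;
    char name[32];

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    /* One reservation per worker, named per CPU to avoid clashes. */
    snprintf(name, sizeof(name), "fwd_table_%d", cpu);
    odp_shm_t shm = odp_shm_reserve(name, 1 << 20,
                                    ODP_CACHE_LINE_SIZE, 0);
    void *fwd_table = odp_shm_addr(shm);

    /* ... forwarding work using the node-local fwd_table ... */
    (void)fwd_table;
    return NULL;
}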
> > > > On 05/08/2015 09:06 AM, Jerin Jacob wrote:
> > > > > On Thu, May 07, 2015 at 05:00:54PM +0100, Zoltan Kiss wrote:
> > > > >
> > > > >> Hi,
> > > > >>
> > > > >> I'm not aware of any such interface, but others with more
> > > > >> knowledge can comment on it. The ODP-DPDK implementation
> > > > >> creates buffer pools on the NUMA node where the pool create
> > > > >> function was actually called.
> > > > >
> > > > > The current ODP spec is not NUMA aware. We need an API for
> > > > > node enumeration and an explicit node parameter to alloc/free
> > > > > a resource from a specific node, like
> > > > > odp_shm_reserve_onnode(node, ...), while keeping the existing
> > > > > API odp_shm_reserve() allocating on the node where the current
> > > > > code runs.
> > > > >
> > > > >> Regards,
> > > > >>
> > > > >> Zoli
> > > > >>
> > > > >> On 07/05/15 16:32, Gábor Sándor Enyedi wrote:
> > > > >>> Hi!
> > > > >>>
> > > > >>> I just started to test ODP, trying to write my first
> > > > >>> application, but found a problem: if I want to write
> > > > >>> NUMA-aware code, how should I allocate memory close to a
> > > > >>> given thread? I mean, I know there is libnuma, but should I
> > > > >>> use it? I guess not, but I cannot find memory allocation
> > > > >>> functions in ODP. Is there a function similar to
> > > > >>> numa_alloc_onnode()?
> > > > >>> Thanks,
> > > > >>>
> > > > >>> Gabor
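The shape of the extension Jerin proposes above, as a hypothetical sketch; odp_node_count() and odp_shm_reserve_onnode() are invented names for illustration and exist nowhere in the API, while odp_shm_reserve() would keep its current meaning of allocating on the caller's node:

#include <stdio.h>
#include <stdint.h>
#include <odp.h>

/* Hypothetical additions: node enumeration plus explicit placement. */
int odp_node_count(void);
odp_shm_t odp_shm_reserve_onnode(int node, const char *name,
                                 uint64_t size, uint64_t align,
                                 uint32_t flags);

/* Example use: one shm region per NUMA node. */
void reserve_per_node(uint64_t size)
{
    char name[32];

    for (int node = 0; node < odp_node_count(); node++) {
        snprintf(name, sizeof(name), "shm_node_%d", node);
        odp_shm_reserve_onnode(node, name, size,
                               ODP_CACHE_LINE_SIZE, 0);
    }
}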
_______________________________________________
lng-odp mailing list
[email protected]
https://lists.linaro.org/mailman/listinfo/lng-odp
