Github user PSUdaemon commented on a diff in the pull request:

    https://github.com/apache/trafficserver/pull/956#discussion_r77461624
  
    --- Diff: iocore/eventsystem/UnixEventProcessor.cc ---
    @@ -152,6 +155,59 @@ EventProcessor::start(int n_event_threads, size_t stacksize)
     #else
           Debug("iocore_thread", "EThread: %d %s: %d", i, obj_name, obj->logical_index);
     #endif // HWLOC_API_VERSION
    +    }
    +#endif // TS_USE_HWLOC
    +
    +    snprintf(thr_name, MAX_THREAD_NAME_LENGTH, "[ET_NET %d]", i);
    +#if TS_USE_HWLOC
    +    if (obj_count > 0) {
    +      hwloc_membind_policy_t mem_policy = HWLOC_MEMBIND_DEFAULT;
    +      hwloc_nodeset_t nodeset           = hwloc_bitmap_alloc();
    +      int num_nodes                     = 0;
    +
    +      hwloc_cpuset_to_nodeset(ink_get_topology(), obj->cpuset, nodeset);
    +      num_nodes = hwloc_get_nbobjs_inside_cpuset_by_type(ink_get_topology(), obj->cpuset, HWLOC_OBJ_NODE);
    +
    +      if (num_nodes == 1) {
    +        mem_policy = HWLOC_MEMBIND_BIND;
    +      } else if (num_nodes > 1) {
    +        mem_policy = HWLOC_MEMBIND_INTERLEAVE;
    +      }
    +
    +      if (mem_policy != HWLOC_MEMBIND_DEFAULT) {
    +        hwloc_set_membind_nodeset(ink_get_topology(), nodeset, mem_policy, HWLOC_MEMBIND_THREAD);
    +      }
    +      }
    --- End diff --
    
    So each thread has a CPU set that we convert into a node set. A node in this case is a NUMA memory node, so the likely case is that the conversion yields one node. But if you choose to bind your threads to the `machine` or `system` in `records.config`, the CPU set might cover multiple nodes, in which case we want to interleave.
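    For illustration, here is the same decision pulled out into a minimal standalone sketch (a hypothetical example program, not the patch itself; assumes hwloc 1.x and uses the machine's root object as the stand-in cpuset, i.e. the multi-node case):

    ```cpp
    #include <hwloc.h>
    #include <stdio.h>

    int main()
    {
      hwloc_topology_t topology;
      hwloc_topology_init(&topology);
      hwloc_topology_load(topology);

      // Stand-in for the per-thread object; the root object covers every
      // PU, so on a multi-socket machine it spans several NUMA nodes.
      hwloc_obj_t obj = hwloc_get_root_obj(topology);

      hwloc_nodeset_t nodeset = hwloc_bitmap_alloc();
      hwloc_cpuset_to_nodeset(topology, obj->cpuset, nodeset);
      int num_nodes = hwloc_get_nbobjs_inside_cpuset_by_type(topology, obj->cpuset, HWLOC_OBJ_NODE);

      hwloc_membind_policy_t mem_policy = HWLOC_MEMBIND_DEFAULT;
      if (num_nodes == 1) {
        mem_policy = HWLOC_MEMBIND_BIND;       // one node: allocate from it
      } else if (num_nodes > 1) {
        mem_policy = HWLOC_MEMBIND_INTERLEAVE; // several nodes: spread pages
      }
      printf("nodes in cpuset: %d, policy: %d\n", num_nodes, (int)mem_policy);

      hwloc_bitmap_free(nodeset);
      hwloc_topology_destroy(topology);
      return 0;
    }
    ```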
    
    I don't think this depends at all on how `malloc(3)` chooses to allocate. It's all about what memory the OS (kernel/libc) returns for a given thread: we tell the OS, please give this thread memory from a specific NUMA node.
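    As a rough sketch of what that means in code (a hypothetical helper, assuming hwloc 1.x and an already-loaded `topology`; node index 0 is just an example): the policy is installed for the calling thread, and the kernel then satisfies that thread's page allocations, typically at first touch, from the bound node.

    ```cpp
    #include <hwloc.h>
    #include <stdlib.h>
    #include <string.h>

    // Hypothetical helper: steer the calling thread's future memory to
    // NUMA node 0, then touch a buffer so pages actually get placed.
    static void bind_thread_memory_to_node0(hwloc_topology_t topology)
    {
      hwloc_nodeset_t nodeset = hwloc_bitmap_alloc();
      hwloc_bitmap_set(nodeset, 0); // node 0 by OS index

      // Applies to this thread's subsequent allocations, not existing pages.
      hwloc_set_membind_nodeset(topology, nodeset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD);

      char *buf = (char *)malloc(1 << 20);
      memset(buf, 0, 1 << 20); // first touch: kernel places these pages on node 0
      free(buf);
      hwloc_bitmap_free(nodeset);
    }
    ```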
    
    Each thread will likely be on a different node, so we need to look at the CPU set for each thread. The PUs are also usually interleaved, so PU0 might be on socket 0, PU1 on socket 1, and PU2 back on socket 0. A reasonable CPU set might therefore be PU{0,2,4,6} on a dual-socket quad-core system, and that might then reasonably translate to NUMA node 0.
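    An easy way to see that numbering on a given box (an illustrative snippet, hwloc 1.x; the 0x55/0xaa masks are just what an interleaved dual-socket quad-core would show):

    ```cpp
    #include <hwloc.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
      hwloc_topology_t topology;
      hwloc_topology_init(&topology);
      hwloc_topology_load(topology);

      int n = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NODE);
      for (int i = 0; i < n; ++i) {
        hwloc_obj_t node = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, i);
        char *s;
        hwloc_bitmap_asprintf(&s, node->cpuset);
        // With interleaved PU numbering this prints something like
        //   node 0: PUs 0x00000055   (PU 0,2,4,6)
        //   node 1: PUs 0x000000aa   (PU 1,3,5,7)
        printf("node %u: PUs %s\n", node->os_index, s);
        free(s);
      }
      hwloc_topology_destroy(topology);
      return 0;
    }
    ```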
    
    `BIND` means use the specified node. So when a CPU set falls entirely within one NUMA node (the likely and preferred case), we tell the OS to use just that one node.
    
    Also, I will be happy to comment this up a bit more.

