On 29-Sep-09, at 19:18, Samuel Thibault wrote:

Samuel Thibault, on Wed 23 Sep 2009 23:51:30 +0200, wrote:
Also, dynamic-size cpuset_t is actually more efficient for small boxes
for most operations, as the bitmask will be smaller.

As raised in another thread, dynamic-size cpuset_t could also permit a
sparse implementation for really big boxes (100 000 cores).

That was me, sorry; I was not aware of this thread.

Just to be clear: I'm not concerned by the ABI we choose right now, as I
believe recompiling to get better support is not a problem for people.
I'm concerned by the needed API changes, i.e. providing functions to
allocate/copy/destroy cpusets so that later ABI changes don't require a
change in the API.

Well, for large machines one probably doesn't need the full granularity: there will be some basic hierarchy that can be used so that the cpuset remains small.
Something like

numa_nodes_bitmap  proc_in_first_selected_numa_node_bitmap

would still mostly allow quick bitmap comparisons (one can also think of other encoding schemes).

Such an approach would probably make the bit setting/testing functions more complex, but not overly so.

One could even always have a compressing/uncompressing function if the full granularity is really needed on some occasions.
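
As a rough sketch of the two-level idea above (not anything hwloc provides), assume a machine with at most 8 NUMA nodes of 8 processors each, numbered node-major. The names compact_cpuset, compact_isset, compact_expand and compact_equal are hypothetical. One possible reading of the encoding is: every selected node is fully included, except the lowest selected node, whose processors are picked by the second bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed machine shape; a real topology need not be uniform. */
#define NODES          8
#define PROCS_PER_NODE 8

/* Hypothetical two-level set: a node bitmap plus a processor bitmap
 * that applies only to the lowest selected node. */
typedef struct {
    uint64_t nodes;       /* bit n set: NUMA node n is in the set */
    uint64_t first_procs; /* processors chosen in the lowest selected node */
} compact_cpuset;

/* Index of the lowest selected node, or -1 if the set is empty. */
static int lowest_node(uint64_t nodes)
{
    for (int n = 0; n < NODES; n++)
        if (nodes & (UINT64_C(1) << n))
            return n;
    return -1;
}

/* Bit test: slightly more work than a flat bitmap, as the text expects. */
static bool compact_isset(const compact_cpuset *s, unsigned cpu)
{
    unsigned node = cpu / PROCS_PER_NODE;
    unsigned proc = cpu % PROCS_PER_NODE;

    if (node >= NODES || !(s->nodes & (UINT64_C(1) << node)))
        return false;
    if ((int)node == lowest_node(s->nodes))
        return (s->first_procs >> proc) & 1;
    return true; /* every other selected node is taken whole */
}

/* "Uncompress" into a flat bitmap with one bit per processor. */
static uint64_t compact_expand(const compact_cpuset *s)
{
    uint64_t flat = 0;

    assert(NODES * PROCS_PER_NODE <= 64); /* flat form must fit in 64 bits */
    for (unsigned cpu = 0; cpu < NODES * PROCS_PER_NODE; cpu++)
        if (compact_isset(s, cpu))
            flat |= UINT64_C(1) << cpu;
    return flat;
}

/* Comparison stays two word compares, provided both sets are stored in a
 * canonical form (e.g. first_procs is 0 whenever nodes is 0). */
static bool compact_equal(const compact_cpuset *a, const compact_cpuset *b)
{
    return a->nodes == b->nodes && a->first_procs == b->first_procs;
}
```

With nodes = 0x6 and first_procs = 0x05, the set holds processors 8 and 10 from node 1 plus all of node 2 (16 to 23). The obvious cost of this encoding is that it cannot express partial selections in two different nodes at once, which is where the uncompressed form would be needed.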

So you are probably right in saying that one can avoid allocation/deallocation functions in most cases.

Still, programmatically building the cpuset for the low-lying nodes by looping over the processors is going to be fast, so maybe providing a function
        hwloc_cpuset_t hwloc_get_cpuset(hwloc_obj_t)
or
        int hwloc_get_cpuset(hwloc_obj_t, hwloc_cpuset_t *)
is a good idea (so that in the future some hwloc_obj_t might avoid storing the set).
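
To illustrate the second form, here is a minimal sketch of an accessor that computes the set on demand by walking down to the processors instead of reading a stored field. The types sketch_obj and sketch_cpuset_t and the function sketch_get_cpuset are hypothetical stand-ins, not the hwloc structures, and the set is assumed to fit in 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: at most 64 processors, one bit each. */
typedef uint64_t sketch_cpuset_t;

/* Hypothetical topology object that does NOT store its own cpuset. */
typedef struct sketch_obj {
    int os_index;                 /* processor number; used for leaves only */
    size_t arity;                 /* number of children; 0 for a processor */
    struct sketch_obj **children;
} sketch_obj;

/* OR the processors found below obj into *set. */
static int accumulate(const sketch_obj *obj, sketch_cpuset_t *set)
{
    if (obj == NULL)
        return -1;
    if (obj->arity == 0) {
        if (obj->os_index < 0 || obj->os_index >= 64)
            return -1;
        *set |= UINT64_C(1) << obj->os_index;
        return 0;
    }
    assert(obj->children != NULL);
    for (size_t i = 0; i < obj->arity; i++)
        if (accumulate(obj->children[i], set) != 0)
            return -1;
    return 0;
}

/* Returns 0 on success and -1 on error; on error *set is unspecified. */
static int sketch_get_cpuset(const sketch_obj *obj, sketch_cpuset_t *set)
{
    if (set == NULL)
        return -1;
    *set = 0;
    return accumulate(obj, set);
}
```

Because callers only ever go through the accessor, an implementation would be free to cache the set in some objects and recompute it in others without changing the interface.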

The other way to really use little memory is indeed, as you say, to allocate memory as needed and use a sparse set. That would allow maximum granularity and still small memory usage at the proc level. But then the API needs alloc/free, which, as you point out, makes the interface more complex and uglier.

It comes down to what you want to have: if you think you might want to go the sparse, full-granularity way, then indeed alloc/copy/free should be added.
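
For concreteness, a minimal sketch of what an alloc/copy/free interface could look like. All dyn_cpuset_* names are hypothetical, and the representation shown is a growable word array rather than a truly sparse one. The point is only that callers never see the size, so the representation could later change without touching the API.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* In a real header this struct would only be forward-declared (opaque). */
typedef struct dyn_cpuset {
    size_t nwords;        /* number of words currently allocated */
    unsigned long *words; /* bit i of word w is processor w*BITS_PER_WORD+i */
} dyn_cpuset;

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Allocate an empty set; returns NULL on allocation failure. */
dyn_cpuset *dyn_cpuset_alloc(void)
{
    dyn_cpuset *s = malloc(sizeof *s);
    if (s != NULL) {
        s->nwords = 0;
        s->words = NULL;
    }
    return s;
}

void dyn_cpuset_free(dyn_cpuset *s)
{
    if (s != NULL) {
        free(s->words);
        free(s);
    }
}

/* Add a processor, growing the storage as needed; 0 on success, -1 on error. */
int dyn_cpuset_set(dyn_cpuset *s, unsigned cpu)
{
    size_t word = cpu / BITS_PER_WORD;

    assert(s != NULL);
    if (word >= s->nwords) {
        size_t n = word + 1;
        unsigned long *w = realloc(s->words, n * sizeof *w);
        if (w == NULL)
            return -1;
        memset(w + s->nwords, 0, (n - s->nwords) * sizeof *w);
        s->words = w;
        s->nwords = n;
    }
    s->words[word] |= 1UL << (cpu % BITS_PER_WORD);
    return 0;
}

/* Processors beyond the allocated words are simply not in the set. */
int dyn_cpuset_isset(const dyn_cpuset *s, unsigned cpu)
{
    size_t word = cpu / BITS_PER_WORD;

    assert(s != NULL);
    return word < s->nwords && ((s->words[word] >> (cpu % BITS_PER_WORD)) & 1UL);
}

/* Deep copy; returns NULL on allocation failure. */
dyn_cpuset *dyn_cpuset_dup(const dyn_cpuset *src)
{
    dyn_cpuset *d;

    assert(src != NULL);
    d = dyn_cpuset_alloc();
    if (d == NULL)
        return NULL;
    if (src->nwords > 0) {
        d->words = malloc(src->nwords * sizeof *d->words);
        if (d->words == NULL) {
            free(d);
            return NULL;
        }
        memcpy(d->words, src->words, src->nwords * sizeof *d->words);
        d->nwords = src->nwords;
    }
    return d;
}
```

A small box that only ever sets low processor numbers pays for a single word here, while a very large one grows on demand. That matches the trade-off discussed above, at the price of every caller having to pair each alloc or dup with a free.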

Fawzi
