Hi Samuel,
On 29-Sep-09, at 18:14, Samuel Thibault wrote:
Fawzi Mohamed, on Tue 29 Sep 2009 17:39:17 +0200, wrote:
so that in the future one could avoid storing it at least in the
deepest levels where it is easy and relatively cheap to generate (and
where one would have the largest savings).
Even the deepest levels would have an L1 cache level on top of maybe
at most 4 threads. Here we only save the "children" pointers, which are
not so many compared to the siblings & cousins pointers. I'm not sure
it is really worth the pain of defining a long series of functions.
ok those were two separate things, I was thinking:
cpuset -> cpuset_ptr (or just a flag that says whether the structure has
it, and thus two structures, a long one with it and a short one
without, differing only in the tail if you really want to be hacky).
Then the cpuset is generated on the fly for the deepest levels (e.g. fewer
than 4-8 procs -> lots of memory savings on large machines).
(cost: 1 function, and copying or building the cpuset)
sibling/cousin -> only cousins (you can make them loop first over
siblings, then over the others if it really is a partition)
children -> only one representation (arity/children or first/last)
(cost: many functions)
the main point is that these changes/optimizations can be done even
later without breaking anything if you use functions.
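To make the cpuset point concrete, here is a minimal sketch in C of an accessor that hides whether the set is stored or rebuilt on demand. Everything in it is hypothetical and only for illustration: the type topo_obj, its fields, and the function topo_obj_cpuset are not the hwloc structures or API, and a 64-bit mask stands in for a real cpuset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object type for illustration, NOT the real hwloc_obj layout. */
typedef struct topo_obj {
    struct topo_obj **children; /* child objects, NULL for a leaf */
    unsigned arity;             /* number of children */
    unsigned os_index;          /* processor number, meaningful for leaves */
    int has_cpuset;             /* nonzero if 'cpuset' below is stored */
    uint64_t cpuset;            /* stored mask, valid only if has_cpuset */
} topo_obj;

/* Return the processors below obj as a 64-bit mask.
 * Callers cannot tell whether the mask was stored or built on the fly. */
uint64_t topo_obj_cpuset(const topo_obj *obj)
{
    assert(obj != NULL);
    if (obj->has_cpuset)
        return obj->cpuset;              /* upper levels: stored copy */
    if (obj->arity == 0) {
        assert(obj->os_index < 64);      /* limit of this toy mask */
        return UINT64_C(1) << obj->os_index;
    }
    uint64_t set = 0;                    /* deep levels: rebuild from children */
    for (unsigned i = 0; i < obj->arity; i++)
        set |= topo_obj_cpuset(obj->children[i]);
    return set;
}
```

With this shape, deciding later which levels store their set only changes has_cpuset on the objects, not the callers.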
I would say that for most operations (cpuset, next_sibling,...) using
functions that take a hwloc_obj_t (and if needed also a topology) and
return what was requested is the way to go.
That means a long series of functions, I'm not sure it's really clearer
for the user. obj->father looks to me easier to read than
hwloc_obj_father(obj), particularly in complex expressions.
ok I can see that, so I guess you will have to evaluate if the
abstraction cost is worth the potential savings, maybe for cpuset it
is; for sibling,... you might be right that it isn't, for father it
sure isn't.
I suppose that most of these operations are not performance critical.
I wouldn't suppose this actually. Detection time is probably not
performance critical, but it could be useful to make browsing the
topology very efficient.
ok, I was thinking that maybe you did/would like to provide in the
future something akin to what opensolaris does with locality groups
http://opensolaris.org/os/community/performance/mpo_overview.pdf
Yes, we intend to provide something similar.
In fact what I "need" (or at least I think I need ;) is just the next
neighbors, basically I go up the hierarchy, and look which new
neighbors I have, so some hierarchy like the lgroups is close to what
I need, and simpler to handle than the full graph.
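As a rough sketch of that walk in C: go up the father pointers and, at each ancestor, keep only the processors not already seen at a lower level. The type lg_node and the function new_neighbors are hypothetical names for illustration, not part of hwloc or of the lgroup interface, and a 64-bit mask again stands in for a cpuset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hierarchy node for illustration only. */
typedef struct lg_node {
    const struct lg_node *father; /* NULL at the root */
    uint64_t cpuset;              /* processors covered by this node */
} lg_node;

/* Walk up from 'start'. out[k] receives the processors that first become
 * neighbors at the k-th ancestor. Returns the number of levels written. */
size_t new_neighbors(const lg_node *start, uint64_t *out, size_t max_levels)
{
    assert(start != NULL && out != NULL);
    size_t n = 0;
    uint64_t seen = start->cpuset;       /* what we already have locally */
    for (const lg_node *up = start->father;
         up != NULL && n < max_levels;
         up = up->father) {
        out[n++] = up->cpuset & ~seen;   /* only the newly reached ones */
        seen |= up->cpuset;
    }
    return n;
}
```

Each out[k] is then the ring of neighbors at distance k+1 in the hierarchy, which is all this use case needs.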
That's what future heuristics would build for you, yes.
That's great, I am really looking forward to it.
And sorry if I seem to be criticizing a lot, as I am mainly a user,
not a developer of hwloc, but I hope it is constructive, and maybe
helps make hwloc better...
ciao
Fawzi