Here are some rough thoughts:
* numCores() has been criticized as saying too much about a specific
processor architecture -- i.e., what does it mean on a GPU, on an
XMT, etc.? I agree with this, but also like the idea of being able
to make nonportable queries like this for locale types that logically
support them because I think it's a nice clear way to think about the
HW on which you're running. I think the main downside is maintenance
cost, but maybe I'm forgetting some other argument against doing so...
* To address the architecture-specificity of it, I think we should have
a more general query that tells the "natural" degree of parallelism
supported by the hardware. A verbose name for it might be something
like degreeOfHardwareParallelism() but I'm not actually suggesting we
use that. The idea is that this would be portable across all locale
types and report something reasonable.
* Assuming numCores() persists as proposed in bullet #1, I think it should
talk about the number of physical cores, not including hyperthreading.
Then, I'd add an additional call to query the degree of virtual
oversubscription. Then the user can combine these values as they wish.
* In such a world, I'd imagine that the default value of
dataParTasksPerLocale (and other such values) would either inherit the
value from bullet 2 or be determined heuristically based on other
settings as well. For instance, for Qthreads programs, we've
seen better performance when dataParTasksPerLocale equals the number of
physical cores rather than the number of virtual processors that
hyperthreading exposes. So perhaps the qthreads tasking layer
would set dataParTasksPerLocale to numCores() while fifo sets it to
2*numCores() (assuming it sees benefit from that).
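A rough sketch of how the four bullets above might fit together, written in Python purely for illustration. Every name here (LocaleInfo, threads_per_core, hardware_parallelism, default_data_par_tasks) is hypothetical and not an existing or proposed Chapel API, and treating the "natural" parallelism of a CPU locale as cores times threads per core is an assumption:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocaleInfo:
    """Hypothetical stand-in for the per-locale queries discussed above."""
    physical_cores: int    # what numCores() would report under bullet 3
    threads_per_core: int  # the "degree of virtual oversubscription" query

    def hardware_parallelism(self) -> int:
        # Bullet 2's portable query. For a CPU locale we assume it is the
        # product of the two values; other locale types could answer differently.
        return self.physical_cores * self.threads_per_core


def default_data_par_tasks(info: LocaleInfo, tasking_layer: str) -> int:
    """Pick a default for dataParTasksPerLocale, per bullet 4 (illustrative)."""
    if tasking_layer == "qthreads":
        # Qthreads reportedly does better with one task per physical core.
        return info.physical_cores
    # fifo (and, by assumption, any other layer) inherits the general value,
    # which is 2*numCores() on a machine with two hardware threads per core.
    return info.hardware_parallelism()


if __name__ == "__main__":
    info = LocaleInfo(physical_cores=4, threads_per_core=2)
    for layer in ("qthreads", "fifo"):
        print(layer, default_data_par_tasks(info, layer))
```

Keeping the two raw values separate and deriving the combined one is what would let each tasking layer choose its own default without redefining numCores().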
OK, tear this apart -- I'm sure I've got plenty of exposed mistakes.
-Brad
On Fri, 9 May 2014, Ben Harshbarger wrote:
Hi all,
I ran into an issue with hyper-threading on OS X this morning, and after a
discussion with Brad some questions were posed that seem better suited for
a group discussion.
Currently numCores (at least on Linux) includes the count for
hyper-threading. This introduces some ambiguity in the meaning of
‘here.numCores’ (physical vs. logical), and raises some questions:
- What should numCores represent?
- Should we separate the notion of physical cores and the amount of
parallelism in hardware (new variable/name)?
- What does this mean for dataParTasksPerLocale?
Hopefully this gets the ball rolling.
-Ben
------------------------------------------------------------------------------
Is your legacy SCM system holding you back? Join Perforce May 7 to find out:
• 3 signs your SCM is hindering your productivity
• Requirements for releasing software faster
• Expert tips and advice for migrating your SCM now
http://p.sf.net/sfu/perforce
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers