From the few articles I've read on hyperthreading (in particular this one:
http://msdn.microsoft.com/en-us/magazine/cc300701.aspx), each physical core
can run two concurrent instruction streams (one per logical core); if one
stream is waiting for a resource (e.g. an execution unit in use by the other
thread, or an access to main memory), the other stream continues executing
while the first one is stalled.

But because I use an independent board state for each thread, and the total
memory used is less than my L3 cache size, none of my threads waits long for
main memory or contends with the others, so they rarely give up control to
the second instruction stream. Also, because I'm using .NET, the CLR arranges
the memory in such a way that concurrent threads won't (or are at least less
likely to) end up sharing a cache line, which means less resource contention
in the L1/L2/L3 caches.
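
Roughly what I have in mind, as a stripped-down C# sketch rather than my
actual engine code (Board, LightPlayout and all the sizes here are just
placeholders):

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class PlayoutSketch
    {
        // Placeholder board: small enough that one copy per thread still
        // fits comfortably inside a shared L3 cache.
        sealed class Board
        {
            public readonly byte[] Points = new byte[361];   // 19x19, illustrative only
            public void Reset() => Array.Clear(Points, 0, Points.Length);
        }

        // Stand-in for a light playout: it touches only this thread's own
        // board, so there is no shared mutable state between workers.
        static void LightPlayout(Board board, Random rng)
        {
            board.Reset();
            for (int move = 0; move < 200; move++)
                board.Points[rng.Next(board.Points.Length)] = 1;
        }

        static void Main()
        {
            int workers = Environment.ProcessorCount;  // logical cores, hyperthreads included
            long totalPlayouts = 0;

            Parallel.For(0, workers, w =>
            {
                // Each worker owns its board and RNG; the only shared write
                // is the single counter update at the very end.
                var board = new Board();
                var rng = new Random(unchecked(Environment.TickCount * 31 + w));

                const int playoutsPerWorker = 100_000;
                for (int i = 0; i < playoutsPerWorker; i++)
                    LightPlayout(board, rng);

                Interlocked.Add(ref totalPlayouts, playoutsPerWorker);
            });

            Console.WriteLine($"{totalPlayouts} playouts across {workers} logical cores");
        }
    }

With per-thread allocations like this, the only thing the workers can end up
fighting over is whatever the allocator happens to place on the same cache
line.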

Unless a C/C++ compiler optimizes for a CPU with a specific number of
logical and physical cores, I doubt it would be as efficient as the compiled
.NET code with regard to how the memory is mapped onto the L1/L2/L3 caches.

This is all new to me, so I'm likely wrong, but I feel like sharing my
thoughts :)




On Mon, May 12, 2014 at 9:10 PM, Matthew Woodcraft
<matt...@woodcraft.me.uk> wrote:

> Mikko Aarnos wrote:
> > There is a big difference here: Ellis's program can only do light
> > playouts. He doesn't have MCTS or patterns. That is parallelized
> > extremely simply by just giving each thread an internal board state,
> > doing a playout from that, resetting the board state to the
> > original, doing a playout etc. There are no bottlenecks there, and
> > that shouldn't get any increase in performance from HT as far as I
> > know (also see the first sentence of Schmicker's comment).
>
> I don't think that's right.
>
> I tried an experiment once with hyperthreading and 'light playouts' and
> I got a 40% improvement from using two threads per core.
>
>
> There are plenty of bottlenecks even in such simple code.
>
> For example, any time you do something equivalent to following a linked
> list (eg, finding the stones in a group that you're joining to another
> group) the thread will have to wait three or four cycles per 'link' even
> if all the data is in level-1 cache.
>
>
> One way to tell whether code is likely to benefit from hyperthreading is
> to use a tool that reports the processor's performance counters and look
> at the 'instructions per clock' measure. If it's somewhere around 1 then
> there are excellent chances of getting good results from hyperthreading.
>
> -M-
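
For reference, the 'instructions per clock' check Matthew describes is easy
to try on Linux, since perf's default counters already include instructions
and cycles. Something like

    perf stat ./engine

(where ./engine is just a stand-in for whatever binary runs the playouts)
prints both counts plus the derived instructions-per-cycle figure; a value
around 1 on a core that can retire several instructions per cycle means the
pipeline is stalling a lot, which is exactly when hyperthreading tends to
pay off.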