Re: [gentoo-user] Mobo/proc combination

Boyd Stephen Smith Jr. Mon, 13 Mar 2006 10:02:26 -0800

On Monday 13 March 2006 04:44, Glenn Enright <[EMAIL PROTECTED]> wrote 
about 'Re: [gentoo-user] Mobo/proc combination':
> On Monday 13 March 2006 21:47, Boyd Stephen Smith Jr. wrote:
> > HyperTransport is a way for CPUs to exchange data directly rather
> > than going through a memory controller, thus allowing limited
> > resources (L1/2/3 cache) to be used more effectively.  In particular,
> > process migration causes fewer cache misses.
> >
> > Hyper-Threading is a way for a CPU to pretend to be two, thus causing
> > the system to request/require more resources than are available.
> >
> > HyperTransport attempts to alleviate a bottleneck, while
> > Hyper-Threading increases the load on an existing one.
>
> I appreciate that AMD certainly seem to have the memory
> bandwidth/throughput thing nailed, and their processors stand tall as a
> result.  But I doubt that a P4 would perform nearly as well, compared to a
> purely serial system, without the large part of the engineered parallelism
> that comes as part of Hyper-Threading.


My characterization is mostly correct.  However, there are mitigating 
circumstances on the P4.  What happened is that Intel made the instruction 
pipeline so long that they were losing a /large/ number of cycles whenever they
had to flush the pipeline.  The pipeline basically has to be flushed 
anytime the branch predictor guesses wrong.
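
To put rough numbers on that, here is a toy Python model.  All figures are hypothetical, not measured P4 data: each mispredicted branch is assumed to cost about one pipeline's worth of cycles, so the average cost per instruction grows linearly with pipeline depth.  The function name `effective_cpi` and the workload numbers are made up for illustration.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Average cycles per instruction once pipeline flushes are counted.

    Assumption: every mispredicted branch throws away roughly
    `flush_penalty` cycles of work, and that penalty grows with
    pipeline depth.
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


# Hypothetical workload: 20% of instructions are branches,
# and 5% of those branches are mispredicted.
for depth in (10, 20, 30):
    print(depth, round(effective_cpi(1.0, 0.20, 0.05, depth), 2))
```

With those made-up numbers the flush overhead in this model goes from 0.1 extra cycles per instruction at depth 10 to 0.3 at depth 30: same predictor, three times the waste.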

With HT there's a separate pipeline that can be filled and flushed
independently while sharing the same compute units.  This allows the chip
to keep processing unless both branch predictors guess wrong.  For certain
pairs of processes this increases performance, because the (on-chip)
scheduling overhead is outweighed by the reduction in cycles wasted on
single-pipeline flushes.
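
A crude simulation of that point: instruction streams share one issue slot per cycle, and the core only idles when every stream is stalled on a flush.  This is a sketch under heavy assumptions (one instruction issued per cycle, a fixed per-instruction mispredict probability, round-robin selection); `issue_utilization` and its parameters are hypothetical, and it does not model the real P4, which shares most of its pipeline between the two threads.

```python
import random


def issue_utilization(n_threads, p_mispredict, flush_penalty,
                      cycles=200_000, seed=1):
    """Fraction of cycles in which the shared core issues an instruction.

    Toy model: each cycle the core issues one instruction from the first
    ready thread (round-robin).  With probability `p_mispredict` that
    instruction is a mispredicted branch, and its thread then stalls for
    `flush_penalty` cycles while its pipeline refills.
    """
    rng = random.Random(seed)        # fixed seed for repeatable runs
    ready_at = [0] * n_threads       # first cycle each thread may issue again
    next_thread = 0                  # round-robin starting point
    busy = 0
    for t in range(cycles):
        for k in range(n_threads):
            i = (next_thread + k) % n_threads
            if ready_at[i] <= t:
                busy += 1
                next_thread = (i + 1) % n_threads
                if rng.random() < p_mispredict:
                    ready_at[i] = t + 1 + flush_penalty
                break                # only one issue slot per cycle
    return busy / cycles


one = issue_utilization(1, 0.02, 20)
two = issue_utilization(2, 0.02, 20)
print(f"1 thread: {one:.2f}, 2 threads: {two:.2f}")
```

With one stream, utilization lands near 1/(1 + 0.02 * 20), about 0.71; with two it is noticeably higher.  Each stream on its own runs slower than it would alone, since they split the issue slots; the gain is in total throughput.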

The additional pipeline is a good idea, but I think it would have been 
better used as a parallel pipeline so that the branch predictor /is/ the 
scheduler.  When a branch is encountered, the pipeline is duplicated and 
the path the predictor chose is given priority for compute resources.  A
branch can only go two ways, so one of the pipelines is always correct,
and a bad guess by the branch predictor flushes only one of them.
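
The idea above is close to what is usually called eager (or dual-path) execution.  Here is a minimal Python sketch of just the bookkeeping, assuming a single unresolved branch and two pipeline copies; `PathCopy`, `fork`, and `resolve` are hypothetical names, and issue priority is only recorded, not simulated.

```python
from dataclasses import dataclass


@dataclass
class PathCopy:
    """One copy of the pipeline, following one direction of a branch."""
    assumes_taken: bool
    high_priority: bool  # the predictor's choice gets first claim on compute units


def fork(prediction_taken: bool) -> list[PathCopy]:
    """Duplicate the pipeline at a branch: one copy per possible outcome."""
    return [PathCopy(prediction_taken, True),
            PathCopy(not prediction_taken, False)]


def resolve(copies: list[PathCopy], actually_taken: bool) -> tuple[PathCopy, int]:
    """Keep the copy that matches the real outcome; flush the others."""
    survivors = [c for c in copies if c.assumes_taken == actually_taken]
    flushed = len(copies) - len(survivors)
    return survivors[0], flushed


for predicted in (True, False):
    for actual in (True, False):
        kept, flushed = resolve(fork(predicted), actual)
        outcome = "hit" if kept.high_priority else "miss"
        print(f"predicted={predicted!s:5} actual={actual!s:5} -> {outcome}, flushed {flushed}")
```

Whatever the predictor says, the surviving copy always matches the real outcome and exactly one copy is flushed; a wrong guess just means the low-priority copy is the one that survives.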

The problem with my approach [1] is handling the case where the pipeline
contains multiple unresolved branch instructions.  Then you don't have
enough pipelines to follow every branch simultaneously, and you can run
into the same problem of having to flush all the pipelines.
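
The reason it breaks down is exponential growth: with k unresolved branches in flight, covering every outcome needs 2^k pipeline copies, so two pipelines only cover one branch at a time.  A small Python sketch (the function names are hypothetical):

```python
from itertools import product


def paths_in_flight(unresolved_branches: int) -> list[tuple[bool, ...]]:
    """Every combination of outcomes for the branches still in the pipeline."""
    return list(product((False, True), repeat=unresolved_branches))


def branches_coverable(pipelines: int) -> int:
    """Most unresolved branches whose every outcome fits in `pipelines` copies."""
    return pipelines.bit_length() - 1  # floor(log2(pipelines)) for pipelines >= 1


for k in range(5):
    print(k, "branches ->", len(paths_in_flight(k)), "pipeline copies")
print("2 pipelines cover", branches_coverable(2), "branch")
```

Once a second branch enters the pipeline before the first resolves, some outcomes are uncovered, and a wrong guess there can again force a flush of all the pipelines.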

I'm sure the designers at Intel weighed my approach against the HT approach
they took and found theirs superior given the length of the pipeline and
the statistics available on branch behavior.  I just hope that
HT was the winner because of technical superiority and not because they 
couldn't find a cool name for my approach.  [My choice: 
Quantum-Prediction.]

In short, choosing HT over non-HT will probably buy you performance,
but it could be the wrong solution to a problem that shouldn't have 
existed in the first place.  (The first thing you learn about pipelines is 
the flush overhead associated with long ones.)

-- 
"If there's one thing we've established over the years,
it's that the vast majority of our users don't have the slightest
clue what's best for them in terms of package stability."
-- Gentoo Developer Ciaran McCreesh

[1] I call it "my approach" for two reasons: 1) I wrote it in the email and
2) I can't remember its actual name.  It's been used before and may be in
use currently; it's not original or anything.
-- 
[email protected] mailing list
