> Saku Ytti
> Sent: Tuesday, December 20, 2016 7:22 PM
>
> On 20 December 2016 at 18:42, <[email protected]> wrote:
>
> > Both CRS-X and NCS6k are powered by nPower X1e NPU.
> > And my understanding is that it's Homogeneous(Same PPE type) MPSoC
> > i.e. Symmetric MultiProcessing (SMP), much like all the chips out there
> > (used in ASR9k or MX and PTX, ...).
> > The difference I understand is in the instruction set that the PPE is
> > running.
> > And my guess is that threads on each PPE are using run to completion
> > scheduling.
> > Let me know your thoughts please.
> >
> > And by pipeline with regards to NPU design I understand pipelining of
> > arrays of PPEs where each array in the pipeline consists of PPEs dedicated
> > to a specific function(parse search modify). -like in ASR9k.
>
> Current gen ASR9k, EZchip, is like Trio, ALU FP or Huawei Solar, many
> identical cores, fully programmable, essentially you're only limited by
> time in what you can do. Where as NCS5k/Arista/Jericho, PTX are
> ASIC/pipelines, with much more specialised hardware with lot less
> flexibility, but what they do do, they do far more efficiently, which means
> denser boxes are pragmatic.
> Roughly speaking pipeline/ASIC is great for core, DC, in Edge you often may
> require richer features offered by NPU designs, and density isn't that
> crucial.
>
With regard to raw processing speed, I don't think it matters that much
whether it's SMP (a single PPE processes the packet head to completion) or a
pipeline (the packet head passes through a series of PPE stages, each
specialized for a different function, with a different instruction set).
I think what matters most is how much data the PPE gets (the size of the
packet head to be processed) and how many instructions it executes per packet
(the number of computations/lookups and the resulting memory accesses),
apart, obviously, from the clock rate and the number of threads per PPE.
A good example is QFP (ASR1K) versus QFA (CRS-3).
Same SMP architecture, but the QFP PPE gets whole packet bodies and executes a
massive instruction set on each, resulting in very limited pps performance,
whereas the QFA PPE gets only packet heads and executes a limited instruction
set, resulting in a massive improvement in pps performance.
Another good example is hyper-mode on the MX PFE: by reducing the instruction
set that each PPE executes on every packet head it processes, you gain some
extra pps performance.
What I'm trying to say is that it doesn't matter that much how the PPEs are
organized on the NPU chip (SMP, pipeline or even a SIMD architecture).
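
To put rough numbers on that, here is a back-of-the-envelope sketch in Python.
The model (pps = PPEs x clock / cycles per packet, with a pipeline limited by
its slowest stage) and every figure in it are illustrative assumptions, not
vendor data for any of the chips mentioned above. It also ignores memory
latency and per-PPE thread counts.

```python
CLOCK_HZ = 1_000_000_000  # assumed 1 GHz PPE clock (hypothetical figure)


def smp_pps(ppes, cycles_per_packet, clock_hz=CLOCK_HZ):
    """Run to completion: each PPE fully processes one packet head."""
    return ppes * clock_hz / cycles_per_packet


def pipeline_pps(stage_ppes, stage_cycles, clock_hz=CLOCK_HZ):
    """Pipeline of PPE arrays: the slowest stage sets the packet rate."""
    return min(n * clock_hz / c for n, c in zip(stage_ppes, stage_cycles))


# Hypothetical chip: 64 PPEs, 400 cycles of work per packet head.
full = smp_pps(64, 400)

# Same PPE count and same total work, split into 4 balanced stages.
piped = pipeline_pps([16, 16, 16, 16], [100, 100, 100, 100])

# Same SMP chip running a leaner per-packet code path (half the cycles).
lean = smp_pps(64, 200)

for label, rate in [("SMP, full feature set", full),
                    ("Balanced pipeline", piped),
                    ("SMP, lean feature set", lean)]:
    print(f"{label}: {rate / 1e6:.0f} Mpps")
```

In this toy model the balanced pipeline and the SMP layout land on the same
packet rate, while halving the per-packet work doubles it. An unbalanced
pipeline would be capped by its slowest stage.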
adam
Adam Vitkovsky
IP Engineer
T: 0333 006 5936
E: [email protected]
W: www.gamma.co.uk
_______________________________________________
juniper-nsp mailing list [email protected]
https://puck.nether.net/mailman/listinfo/juniper-nsp