Good morning Luke,
> > (to be fair, there were tools to force you to improve coverage by injecting
> > faults to your RTL, e.g. it would virtually flip an `&&` to an `||` and if
> > none of your tests signaled an error it would complain that your test
> > coverage sucked.)
>
> nice!
It should be possible for a tool to be developed to parse a Verilog RTL design,
then generate a new version of it with one change.
Then you could add some automation to run a set of testcases around mutated
variants of the design.
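As a rough illustration of the "one change per variant" idea above, here is a minimal Python sketch that produces mutants by swapping a single `&&` or `||`. It works on raw text rather than a parsed design, so it would also mutate operators inside comments or strings; the name `generate_mutants` and the sample design are made up for illustration, and a real tool would work on a proper parse tree.

```python
import re

# Operator swaps to try; each mutant applies exactly one swap at one location.
SWAPS = {"&&": "||", "||": "&&"}
OPERATOR = re.compile(r"&&|\|\|")


def generate_mutants(rtl_source):
    """Return one mutated copy of rtl_source per `&&` / `||` occurrence."""
    mutants = []
    for match in OPERATOR.finditer(rtl_source):
        replacement = SWAPS[match.group(0)]
        mutated = rtl_source[:match.start()] + replacement + rtl_source[match.end():]
        mutants.append(mutated)
    return mutants


# Hypothetical two-line design, only to show the shape of the output.
design = "assign grant = req && !busy;\nassign stall = busy || flush;\n"
for index, mutant in enumerate(generate_mutants(design)):
    print(f"// mutant {index}")
    print(mutant)
```

Each mutant differs from the original in exactly one operator, so a diverging output can be attributed to that one change.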
For example, it could create a "wrapper" module that connects to an unmutated
differently-named version of the design, and various mutated versions, wire all
their inputs together, then compare outputs.
If the testcase could trigger an output of a mutated version to be different
from the reference version, then we would consider that mutation covered by
that testcase.
Possibly that could be done with Verilog-2001 file writing code in the wrapper
module to dump out which mutations were covered, then a summary program could
just read in the generated file.
Or Verilog plugins (VPI modules) could be used as well (Icarus supports this,
that is how it implements all `$` functions).
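The summary program could be very small. The sketch below assumes, hypothetically, that the wrapper writes one decimal mutant ID per line each time a mutated copy's output diverges from the reference; that log format and the name `summarize_coverage` are assumptions for illustration, not the format of any existing tool.

```python
def summarize_coverage(total_mutants, log_lines):
    """Return (covered_ids, uncovered_ids) from the wrapper's log lines."""
    covered = set()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the dump
        mutant_id = int(line)
        if 0 <= mutant_id < total_mutants:
            covered.add(mutant_id)
    uncovered = sorted(set(range(total_mutants)) - covered)
    return sorted(covered), uncovered


# Example: four mutants, the testcases only ever disturbed mutants 0 and 2.
covered, uncovered = summarize_coverage(4, ["0\n", "2\n", "2\n"])
print("covered:", covered)
print("uncovered:", uncovered)
```

The uncovered list is the interesting part, since those are the mutations no testcase managed to expose at the outputs.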
A drawback is that just because an output is different does not mean the
testcase actually ***checks*** that output.
If the testcase does not itself detect the diverging output, then it is still
not properly covering that mutation.
The point of this is to check coverage of the tests.
Not sure how well this works with formal validation.
> > Synthesis in particular is a black box and each vendor keeps their
> > particular implementations and tricks secret.
>
> sigh. i think that's partly because they have to insert diodes, and buffers,
> and generally mess with the netlist.
>
> i was stunned to learn that in a 28nm ASIC, 50% of it is repeater-buffers!
Well, that surprises me as well.
On the other hand, smaller technologies consistently have lower raw output
current driving capability due to the smaller size, and as trace width goes
down and frequency goes up they stop acting like ideal 0-impedance traces and
start acting more like transmission lines.
So I suppose at some point something like that would occur and I should not
actually be surprised.
(Maybe I am more surprised that it reached that level at that technology size,
I would have thought 33% at 7nm.)
In the modules where we were doing manual netlist+layout, we used inverting
buffers instead (slightly smaller than non-inverting buffers; in most
technologies a non-inverting buffer is just an inverter followed by an
inverting buffer).
This was an advantage of manual design, since it looks like synthesis tools are
not willing to invert the contents of intermediate flip-flops even if using an
inverting buffer rather than a non-inverting one could give a theoretical
speed+size advantage (it looks like synthesis optimization starts at the output
of flip-flops and ends at their input, so a manual designer could achieve
slightly better performance if they were willing to invert an intermediate
flip-flop).
Another was that inverting latches were smaller in the technology we were using
than non-inverting latches, so it was perfectly natural for us to use an
inverting latch and an inverting buffer on those parts where we needed higher
fan-out (it was equivalent to a "custom" latch that had higher-than-normal
driving capability).
Scan chain test generation was impossible though, as those require flip-flops,
not latches.
Fortunately this was "just" deserialization of high-frequency low-width data
with no transformation of the data (that was done after the deserialization, at
lower clock speeds but higher data width, in pure RTL so flip-flops), so it was
judged acceptable that it would not be covered by scan chain, since scan chain
is primarily for testing combinational logic between flip-flops.
So we just had flip-flops at the input, and flip-flops at the output, and
forced all latches to pass-through mode, during scan mode.
We just needed to have enough coverage to uncover stuck-at faults (which was
still a pain, since additional test vectors slow down manufacturing so we had
to reduce the test vectors to the minimum possible) in non-scan-mode testing.
Man, making ASICs was tough.
>
> plus, they make an awful lot of money, it is good business.
>
> > Pointing some funding at the open-source Icarus Verilog might also fit, as
> > it lost its ability to do synthesis more than a decade ago due to inability
> > to maintain.
>
> ah i didn't know it could do synthesis at all! i thought it was simulation
> only.
Icarus was the only open-source synthesis tool I could find back then, and it
dropped synthesis capability fairly early due to maintenance burden (I never
managed to get the old version with synthesis compiled and never managed actual
synthesis on it, so my knowledge of it is theoretical).
There is an argument that open-source software is not truly open-source unless
it can be compiled by open-source compilers or executed by open-source
interpreters.
Similarly, I think open-source hardware RTL designs are not truly open-source
if there are no open-source synthesis tools that can synthesize it to netlist
and then lay it out.
Icarus can interpret most Verilog RTL designs, though.
However, at the time I left, I had already mandated that new code should use
`always_comb` and `always_ff` (previously I had mandated that new code should
use `always @*` for combinational logic) and was encouraging my subordinates to
use loops and `generate`.
Icarus did not support `always_comb` and `always_ff` at the time (though it
worked perfectly fine with loops and `generate`).
In addition, at the time, we (actually just me in that company haha) were
dabbling in object-oriented testing methodologies (which Icarus has no plans on
ever implementing, which is understandable since it is a massive increase in
complexity, it is much much harder than the scheduling shenanigans of
`always_comb` and the "just treat it as `always`" of `always_ff`).
(Particularly, you need object-oriented testbenches since SystemVerilog
includes a fuzz-testing framework to randomize fields of objects according to
certain engineer-provided constraints, and then you would use those object
fields to derive the test vectors your test framework would feed into the DUT.
This was a massive increase in code coverage for a largish up-front cost, but
once you built the test framework you could just dump various constraints on
your test specification objects.
I actually caught a few bugs that we would not have otherwise found with our
previous checklist-based testing methodology.)
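To give a feel for the constrained-random idea, here is a small Python stand-in. SystemVerilog does this natively with `rand` fields, `constraint` blocks and `randomize()`, backed by a real constraint solver; the sketch below only uses naive rejection sampling, and the field names and constraints are hypothetical.

```python
import random


def randomize(fields, constraints, rng, max_tries=10_000):
    """Draw field values uniformly until every constraint holds (rejection sampling)."""
    for _ in range(max_tries):
        candidate = {name: rng.randint(low, high) for name, (low, high) in fields.items()}
        if all(constraint(candidate) for constraint in constraints):
            return candidate
    raise RuntimeError("constraints look unsatisfiable")


# Hypothetical test specification: field name -> inclusive (low, high) range.
FIELDS = {"payload_len": (0, 255), "gap_cycles": (0, 15)}
CONSTRAINTS = [
    # Only word-aligned payload lengths.
    lambda spec: spec["payload_len"] % 4 == 0,
    # Non-empty payloads need at least two idle cycles before them.
    lambda spec: spec["gap_cycles"] >= 2 or spec["payload_len"] == 0,
]

rng = random.Random(2020)  # fixed seed so a failing test can be replayed
for _ in range(3):
    print(randomize(FIELDS, CONSTRAINTS, rng))
```

Each returned dictionary would then be turned into test vectors for the DUT; tightening or loosening the constraint list is all it takes to steer the tests somewhere else.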
(Unfortunately it turned out that it required a more expensive license and I
ended up hogging the only one we had of that more expensive license (which, if
I remember correctly, was the same license needed for formal verification of
netlist<->RTL equivalence) for this, which killed enthusiasm for this
technique, sigh, this is another argument for getting open-source hardware
design tools developed; not much sense in having open-source RTL for a crypto
device if you have to pay through the nose for a license just to synthesize it,
never mind the manufacturing cost.)
-----------------------
Another point to ponder is test modes.
In mass production you **need** test modes.
There will always be some number of manufacturing defects because even the
cleanest of cleanrooms *will* have a tiny amount of contaminants (what can go
wrong will go wrong).
Test modes are run in manufacturing to filter out chips with failing circuitry
due to contamination.
Now, a typical way of implementing test modes is to have a special command sent
over, say, the "normal" serial port interface of a chip, which then enters
various test modes to allow, say, scan chain testing.
Of course, scan chain testing is done by pushing test vectors into all
flip-flops, then pulsing the global clock once (often the test mode forces all
flip-flops onto the same clock), then pulling data from all flip-flops to
verify that all the circuitry works as designed.
The "pulling data from all flip-flops" is of course just another way of saying
that all mass-produced chips have a way of letting ***anyone*** exfiltrate data
from their flip-flops via test modes.
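A toy behavioral model makes the concern concrete. This is Python, not RTL, and it assumes a single chain with every flip-flop on it; the class and method names are invented for illustration.

```python
class ScanChain:
    """Toy scan chain: all flip-flops wired into one long shift register."""

    def __init__(self, length):
        self.flops = [0] * length

    def shift(self, scan_in):
        """One scan clock: push a bit in at the head, return the bit leaving the tail."""
        scan_out = self.flops[-1]
        self.flops = [scan_in] + self.flops[:-1]
        return scan_out

    def load(self, vector):
        """Shift a whole test vector in; vector[0] ends up in flops[0]."""
        for bit in reversed(vector):
            self.shift(bit)

    def capture(self, logic):
        """One functional clock pulse: every flop loads the logic's output."""
        self.flops = list(logic(self.flops))

    def unload(self):
        """Shift the whole chain out, tail first, as a tester (or attacker) would."""
        return [self.shift(0) for _ in range(len(self.flops))]


chain = ScanChain(4)
chain.load([1, 0, 1, 1])                          # push the test vector
chain.capture(lambda bits: [b ^ 1 for b in bits])  # stand-in logic: invert every bit
print(chain.unload())                             # pull the response back out
```

If `flops` holds key material instead of a test response, `unload()` hands it over just the same.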
Thus, for a secure environment, you need to ensure that test modes cannot be
entered after the device enters normal operation.
For example, you might have a dedicated pad which is normally pulled-down, but
if at reset it is pulled up, the device enters test mode.
If at reset the pad is pulled down, the device is in normal mode and even if
the pad is pulled up afterwards the device will not enter test mode.
This ensures that only reset data can be read from the device, without
possibility of exfiltration of sensitive (key material or midstate) data.
The pad should also not be exposed as a package pinout except perhaps on DS and
ES packages, and the pulldown resistor has to be on-chip.
As an additional precaution, we can also create a small secure memory (maybe
256 addressable octets would be more than enough).
It is possible to exempt flip-flops from scan chain generation (usually by
explicitly instantiating flip-flops in a separate module and telling
post-synthesis tools to exempt the module from scan chain synthesis).
This gives an extra layer of protection against test mode accessing sensitive
data; even if we manage to screw up test mode and it is possible to force reset
on the test mode circuit without resetting the rest of the design, sensitive
data is still out of the scan chain.
Of course, since they are not on scan, it is possible they have undetectable
manufacturing defects, so you would need to use some kind of ECC, or better
triple-redundancy best-of-three, to protect against manufacturing defects on
the non-scan flip-flops.
Fortunately non-scan flip-flops are often a good bit smaller than scan
flip-flops, so the redundancy is not so onerous.
Since the ECC / best-of-three circuit itself would need to be tested, you would
multiplex their inputs, in normal mode they get inputs from the non-scan-chain
flip-flops, in test mode they get inputs from separate scan-chain flip-flops,
so that the ECC / best-of-three circuit is testable at scan mode.
You would also need a separate test of the secure memory, this time running in
normal mode with a special test program in the CPU, just in case.
Finally, you would explicitly lay them out "distributed" around the chip, since
manufacturing defects tend to correlate in space (they are usually from dust,
and dust particles can be large relative to cell size), you do not want all
three of the best-of-three to have manufacturing defects.
For example, you could have a 256 x 8 non-scan-chain flip-flop module,
instantiate three of those, and explicitly place them in corners of the digital
area, then use a best-of-three circuit to resolve the "correct" value.
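The best-of-three resolution is a bitwise majority vote. A quick Python model of the logic (the function name is made up, and in hardware this would just be three AND gates and an OR gate per bit):

```python
def best_of_three(copy_a, copy_b, copy_c):
    """Bitwise majority vote over three redundant copies of a word."""
    return (copy_a & copy_b) | (copy_a & copy_c) | (copy_b & copy_c)


stored = 0xA5
# One copy has a defective bit 4; the other two outvote it.
print(hex(best_of_three(stored, stored, stored ^ 0x10)))
# Two copies are defective, but in different bits, so every bit still has a majority.
print(hex(best_of_three(stored ^ 0x01, stored ^ 0x80, stored)))
```

The vote only fails when the same bit is bad in two copies, which is why placing the copies far apart matters.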
The test mode circuit itself could ensure that the device enters test mode if
and only if the secure memory contains all 0 data after the test mode circuit
is reset.
For example, the 256 x 8 non-scan-chain flip-flop module could have a large OR
circuit that ORs all the flip-flops together and outputs a single `in_use` bit
that is the OR-reduction of all the flip-flop contents.
Then the test mode circuit gets the `in_use` outputs of the three secure
flip-flop modules, and if at reset any of them are `1` then it will refuse to
enter test mode even if the test mode pad is pulled high.
This ensures that even if an attacker is somehow able to reset *only* the test
mode circuit (this is basic engineering, always assume something will
go wrong), if the secure memory has any non-0 data (we presume it resets to 0),
the device will still not enter test mode.
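Put as a behavioral model (Python, with made-up function names, and modelling each secure memory copy as a plain list of words), the guard is:

```python
def in_use(memory_words):
    """OR-reduction: True if any bit of any word in the module is set."""
    return any(word != 0 for word in memory_words)


def may_enter_test_mode(testmode_pad, copies):
    """Decision taken when the test mode circuit comes out of reset."""
    # Test mode needs the pad pulled high AND every redundant copy still all-0.
    return bool(testmode_pad) and not any(in_use(words) for words in copies)


blank = [0] * 256
used = list(blank)
used[7] = 0x01  # a single stored bit is enough to block test mode

print(may_enter_test_mode(1, [blank, blank, blank]))
print(may_enter_test_mode(1, [blank, used, blank]))
```

Any one of the three copies reporting non-0 data is enough to refuse, so a defect or upset that clears one copy does not open the door.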
Of course, if the secure memory itself is accessible from the CPU, then it
remains possible that a CPU program is reading from the secure area, keeping
raw data in CPU registers, from which a test mode might be able to extract it if
the device is somehow forced into test mode even after normal mode.
You could redesign your implementations of field multiplication and SHA
midstate computation so that they directly read from the secure memory and
write to the secure memory without using any flip-flops along the way, and have
only the cryptographic circuit have access to the secure memory.
That way there is reduced possibility of intermediate flip-flops (that are
part of the scan chain) outside the secure memory holding sensitive key material
or midstate data.
You would need to use a custom bus with separate read and write addresses, and
non-pipelined unbuffered access, and since you want to distribute your secure
memory physically distant, that translates to wide and long buses (it might be
better to use 64 x 32 or 32 x 64 addressable memories, to increase what the
cryptographic circuit has access to per clock cycle) screwing with your layout,
and probably having to run the secure memory + crypto circuit at a ***much***
slower clock domain (but more secure is a good tradeoff for slowness).
Of course, that is a major design headache (the crypto circuit has to act
mostly as a reduced-functionality processor), so you might just want to have
the CPU directly access the secure memory and in early boot poke a `0x01` in
some part of the memory, in the hope that the `in_use` flag in the previous
paragraph is enough to suppress test modes from exfiltrating CPU registers.
Do note that with enough power-cycles and ESD noise you can put digital
circuitry into really weird and unexpected states (seen it happen, though
fairly hard to replicate, we had an ESD gun you could point at a chip to make
it go into weird states), so being extra paranoid about test modes is important.
What can go wrong will go wrong!
In particular with "`TESTMODE_PAD` is only checked at reset" you would have to
store `TESTMODE` in a non-scan flip-flop, and with enough targeted ESD that
flip-flop can be jostled, setting `TESTMODE` even after normal operation.
You might instead want to use, say, a byte pattern instead of a single bit to
represent `TESTMODE`, so the `TESTMODE` register has to have a specific value
such as `0xA5`, so that targeted ESD has to be very lucky in order to force
your device into test mode.
For example, since you need to check the `TESTMODE` pad at reset anyway, you
could do something like this:
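    // Module name is illustrative; the ports are those of the fragment itself.
    module testmode_ctrl (CLK, RESET_N, TESTMODE_PAD, IN_USE0, IN_USE1, IN_USE2, TESTMODE);
    input CLK, RESET_N, TESTMODE_PAD, IN_USE0, IN_USE1, IN_USE2;
    output reg TESTMODE;
    // Any secure memory copy holding non-0 data blocks test mode.
    wire in_use = IN_USE0 || IN_USE1 || IN_USE2;
    reg [7:0] testmode_ff;
    // 8'h00 = just reset, 8'hA5 = test mode, any other value holds until reset.
    wire [7:0] next_testmode_ff =
      (testmode_ff == 8'hA5 || testmode_ff == 8'h00) ?
        (TESTMODE_PAD && !in_use) ?   8'hA5 :
        /*otherwise*/                 8'h5A :
      /*otherwise*/                   testmode_ff ;
    always_ff @(posedge CLK, negedge RESET_N) begin
      if (!RESET_N) testmode_ff <= 8'h00;
      else          testmode_ff <= next_testmode_ff;
    end
    // TESTMODE is registered so that it never glitches.
    wire next_TESTMODE = (testmode_ff == 8'hA5);
    always_ff @(posedge CLK, negedge RESET_N) begin
      if (!RESET_N) TESTMODE <= 1'b0;
      else          TESTMODE <= next_TESTMODE;
    end
    endmodule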
Do note that the `TESTMODE` is a flip-flop, since you do ***not*** want
glitches on the `TESTMODE` signal line, it would be horribly unsafe to output
it from combinational circuitry directly, please do not do that.
Of course that flip-flop can instead be the target of ESD gunnery, but since
you need many clock pulses to read the scan chain, it should with good
probability also get set to `0` on the next clock pulse and leave test mode
(and probably crash the device as well until full reset, but this "fails safe"
since at least sensitive data cannot be extracted).
`TESTMODE` has no feedback, thus cannot be stuck in a state loop.
`testmode_ff` *can* be stuck in a state loop, but that is deliberate, as it
would "fail safe" if it gets a value other than `0xA5`, it would not enter test
mode (and if it enters `0xA5` it can easily leave test mode by either
`TESTMODE_PAD` or `in_use`).
(Sure, an attacker can try targeted ESD at the `TESTMODE` flip-flop repeatedly,
but this risks also flipping other scan flip-flops that contain the data that
is being extracted, so this might be sufficient protection in practice.)
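The fail-safe behavior described above can be checked with a small Python model of the two registers. This mirrors the `testmode_ff` / `TESTMODE` logic as described, but it is only a behavioral sketch: the constant and function names are invented, and an ESD upset is modelled simply as starting the simulation from an arbitrary register value.

```python
RESET_VALUE = 0x00   # value forced by RESET_N
TEST_VALUE = 0xA5    # the only value that means "test mode"
LOCKED_VALUE = 0x5A  # normal mode; only a reset leaves it


def next_testmode_ff(testmode_ff, testmode_pad, in_use):
    """Combinational next-state function for the 8-bit testmode_ff register."""
    if testmode_ff in (TEST_VALUE, RESET_VALUE):
        return TEST_VALUE if (testmode_pad and not in_use) else LOCKED_VALUE
    return testmode_ff  # every other value holds, so it stays out of test mode


def simulate(inputs, testmode_ff=RESET_VALUE, testmode=0):
    """Clock both registers once per (testmode_pad, in_use) pair; return TESTMODE after each edge."""
    trace = []
    for testmode_pad, in_use in inputs:
        # Both registers update on the same edge, so TESTMODE sees the old testmode_ff.
        testmode, testmode_ff = (
            int(testmode_ff == TEST_VALUE),
            next_testmode_ff(testmode_ff, testmode_pad, in_use),
        )
        trace.append(testmode)
    return trace


# Pad held high from reset: TESTMODE rises one clock after testmode_ff reaches 0xA5.
print(simulate([(1, 0)] * 3))
# Pad low on the first clock after reset: locked out, even if the pad goes high later.
print(simulate([(0, 0), (1, 0), (1, 0)]))
# An upset sets only the TESTMODE flop while locked: it clears on the next clock.
print(simulate([(0, 0)], testmode_ff=LOCKED_VALUE, testmode=1))
```

Note that in this model an upset of `testmode_ff` to exactly `0x00` behaves like a reset of the test mode circuit, which is the case the `in_use` check is there for.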
If you are really going to open-source the hardware design then the layout is
also open and attackers can probably target specific chip area for ESD pulse to
try a flip-flop upset, so you need to be extra careful.
Note as well that even closed-source "secure" elements can be
reverse-engineered (I used to do this in the IC design job as a junior
engineer, it was the sort of shitty brain-numbing work forced on new hires), so
security-by-obscurity does have a limit as well, it should be possible to try
to figure out the testmode circuitry on "secure" elements and try to get
targeted ESD upsets at flip-flops on the testmode circuit.
Test mode design is something of an arcane art, especially if you are trying to
build a security device, on the one hand you need to ensure you deliver devices
without manufacturing defects, on the other hand you need to ensure that the
test mode is not entered inadvertently by strange conditions.
In general, because test modes are such a pain to deal with securely, and are
an absolute necessity for mass production, you should assume that any "secure"
chip can be broken by physical access and shooting short-range ESD pulses at it
to try to get it into some test mode, unless it is openly designed to prevent
test mode from persisting after entering normal mode, as above.
(No idea how that ESD gun thing worked or what it was formally called, we just
called it the ESD gun, it was an amusing toy: you point it at the DUT and pull
the trigger and suddenly it would switch modes.
This of course was a bad thing, since you want to make sure that as much as
possible such upsets do not cause the chip to enter an irrecoverable mode, but
it was an amusing thing to do still.
We even had small amounts of flash memory containing register settings that we
would load into the settings registers periodically at the end of each display
frame to protect against this kind of ESD gun thing, since the flip-flops
backing the settings registers were vulnerable to it and we needed a way to
preserve the settings of the customer for the IC; the expected effect would be
to cause the display to flicker.)
Regards,
ZmnSCPxj
_______________________________________________
bitcoin-dev mailing list
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev