Hi,
Recently I have had access to a computer with Cadence RTL Compiler and the
TSMC 90nm standard cell libraries to play with, so out of curiosity I tried
synthesizing the LatticeMico32 core.
The most interesting result is that it is RIDICULOUSLY FAST. It nearly meets
timing at 800MHz (1250 ps clock period, with 3 ps of negative slack left after
optimization), which is 7-8 times its speed on a Virtex-4.
Power consumption at this frequency is only 29mW. The area is very small, with
only about 13K cells used (12554 in the area report), i.e. 0.081 square
millimeters.
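A quick back-of-the-envelope check of these figures, as a minimal Python sketch that is not part of the original attachments. It re-derives the numbers from the reports below and assumes RTL Compiler's units: picoseconds for the clock period and slack, nanowatts for power (as the power report states) and square micrometres for cell area (consistent with the 0.081 square millimeters quoted above). The -3 ps slack is the last value in the mapping log; the timing report itself is not attached.

```python
# Re-derive the headline numbers from the attached RTL Compiler reports.
# Assumed units: ps for period/slack, nW for power, um^2 for cell area.

PERIOD_PS = 1250.0          # define_clock -period 1250
WORST_NEG_SLACK_PS = -3.0   # last step ("area_down") of the mapping log
LEAKAGE_NW = 171522.302     # power report, leakage column
DYNAMIC_NW = 28388789.619   # power report, dynamic column
CELL_AREA_UM2 = 81171.0     # area report, cell area column


def freq_mhz(period_ps: float) -> float:
    """Convert a clock period in picoseconds to a frequency in MHz."""
    return 1e6 / period_ps


target_mhz = freq_mhz(PERIOD_PS)
# Negative slack means the critical path is longer than the period by that
# amount, so the fastest clock that closes timing has period (period - slack).
achievable_mhz = freq_mhz(PERIOD_PS - WORST_NEG_SLACK_PS)
total_mw = (LEAKAGE_NW + DYNAMIC_NW) / 1e6
leakage_share = LEAKAGE_NW / (LEAKAGE_NW + DYNAMIC_NW)
area_mm2 = CELL_AREA_UM2 / 1e6

print(f"target clock     : {target_mhz:.1f} MHz")
print(f"timing closes at : {achievable_mhz:.1f} MHz")
print(f"total power      : {total_mw:.2f} mW ({leakage_share:.1%} leakage)")
print(f"cell area        : {area_mm2:.3f} mm^2")
```

It prints 800.0 MHz, 798.1 MHz, 28.56 mW (0.6% leakage) and 0.081 mm^2: the critical path (through the multiplier, according to the log) needs about 1253 ps, so with this wire load model timing would close at roughly 798MHz.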
If I did not make a mistake using the synthesizer (it infers approximately
the same number of flip-flops as the FPGA implementation, 2135 registers here,
so it is probably correct) and if these results are real, they definitely make
me want to leave FPGAs and do ASICs instead :)
The LM32 configuration is the same as the one used on the ML401 board, except
that I disabled the caches: the synthesizer apparently does not support RAM
extraction and generated a mess of flip-flops instead.
These results were obtained from the gate-level netlist only, using a wire load
model. I have not tried to lay out the core yet.
Attached are the synthesis script and some reports from the tool.
Sébastien
============================================================
Generated by: Encounter(R) RTL Compiler v07.10-s021_1
Generated on: Dec 15 2009 12:05:26 AM
Module: lm32_top
Technology library: tcbn90gtc 110
Operating conditions: NCCOM (balanced_tree)
Wireload mode: segmented
============================================================
Instance  Cells  Cell Area  Net Area  Wireload
--------------------------------------------------------------------
lm32_top  12554      81171         0  TSMC32K_Lowk_Conservative (S)
(S) = wireload was automatically selected
============================================================
Generated by: Encounter(R) RTL Compiler v07.10-s021_1
Generated on: Dec 15 2009 12:05:27 AM
Module: lm32_top
Technology library: tcbn90gtc 110
Operating conditions: NCCOM (balanced_tree)
Wireload mode: segmented
============================================================
Clock Description
-----------------
Clock                                 Clock     Source    No of
Name         Period   Rise   Fall     Domain    Pin/Port  Registers
-------------------------------------------------------------------
ideal_clock  1250.0   0.0    625.0    domain_1  clk_i     2135
Clock Network Latency / Setup Uncertainty
-----------------------------------------
             Network      Network      Source       Source       Setup        Setup
Clock        Latency      Latency      Latency      Latency      Uncertainity Uncertainity
Name         Rise         Fall         Rise         Fall         Rise         Fall
------------------------------------------------------------------------------
ideal_clock  0.0          0.0          0.0          0.0          0.0          0.0
Clock Relationship (with uncertainity & latency)
-----------------------------------------------
From         To           R->R     R->F     F->R     F->F
--------------------------------------------------------------
ideal_clock  ideal_clock  1250.0   625.0    625.0    1250.0
# LM32 synthesis script for Cadence Encounter RTL Compiler
# Library and search-path setup (TSMC 90nm standard cells)
set_attribute lib_search_path /usr/cadence/extra_libraries/standard_cell/TSMC/tcbn90g_110a/Front_End/timing_power/tcbn90g_110a
set_attribute hdl_search_path rtl/
set_attribute stdout_log lm32.log
set_attribute library tcbn90gtc.lib
# Wireload selection table used to estimate net parasitics before layout
set_attribute wireload_selection WireAreaLowkCon
# Read the LM32 RTL sources as Verilog-2001
read_hdl -v2001 lm32_cpu.v
read_hdl -v2001 lm32_instruction_unit.v
read_hdl -v2001 lm32_decoder.v
read_hdl -v2001 lm32_load_store_unit.v
read_hdl -v2001 lm32_adder.v
read_hdl -v2001 lm32_addsub.v
read_hdl -v2001 lm32_logic_op.v
read_hdl -v2001 lm32_shifter.v
read_hdl -v2001 lm32_multiplier.v
read_hdl -v2001 lm32_mc_arithmetic.v
read_hdl -v2001 lm32_interrupt.v
read_hdl -v2001 lm32_ram.v
read_hdl -v2001 lm32_icache.v
read_hdl -v2001 lm32_dcache.v
read_hdl -v2001 lm32_top.v
# Build the design hierarchy under the top-level module
elaborate lm32_top
# Flatten the whole hierarchy into a single module
ungroup -flatten -all
# 1250 ps period on clk_i, i.e. an 800MHz target
define_clock -period 1250 -name ideal_clock [list clk_i]
# Map the design to standard cells
synthesize -to_mapped
# Write the reports
report timing > timing.rep
report design > design.rep
report summary > summary.rep
report area > area.rep
report gates > gates.rep
report clocks > clock.rep
report clocks -ideal > clockideal.rep
report clocks -generated > clockgen.rep
report nets > nets.rep
report power > power.rep
quit
============================================================
Generated by: Encounter(R) RTL Compiler v07.10-s021_1
Generated on: Dec 15 2009 12:05:40 AM
Module: lm32_top
Technology library: tcbn90gtc 110
Operating conditions: NCCOM (balanced_tree)
Wireload mode: segmented
============================================================
                    Leakage       Dynamic         Total
Instance  Cells   Power(nW)     Power(nW)     Power(nW)
----------------------------------------------------
lm32_top  12554  171522.302  28388789.619  28560311.921
Setting attribute of root '/': 'stdout_log' = lm32.log
Warning : Unusable clock gating integrated cell. [LBR-101]
: Clock gating integrated cell name: 'CKLNQD20'.
: To use the cell in clock gating, Set cell attribute 'dont_use' false in the library.
Warning : Unusable clock gating integrated cell. [LBR-101]
: Clock gating integrated cell name: 'CKLNQD24'.
Setting attribute of root '/': 'library' = tcbn90gtc.lib
Setting attribute of root '/': 'wireload_selection' = /libraries/tcbn90gtc/wireload_selections/WireAreaLowkCon
3'ha
|
Warning : Truncation in sized number. [VLOGPT-16]
: 3'ha in file 'rtl/lm32_cpu.v' on line 2064, column 8.
: The number of bits specified is larger than the number of declared bits, e.g. 3'b1001. In this case, the resulting number will be pruned to 3'b001 which may not be the intent of the user.
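The VLOGPT-16 warning above concerns Verilog's rule for sized literals: when the value needs more bits than the declared size, the leftmost bits are dropped. A minimal Python model of that rule (the helper names are hypothetical, for illustration only) shows that 3'ha, i.e. binary 1010, ends up as 3'b010 (decimal 2). Whether that matters for line 2064 of lm32_cpu.v cannot be told from the log alone.

```python
# Model of Verilog's rule for sized literals: a value wider than the declared
# size is truncated from the left, keeping only the low-order bits.


def sized_literal(width: int, value: int) -> int:
    """Value of a <width>'<base><value> literal after truncation (hypothetical helper)."""
    return value & ((1 << width) - 1)


def as_binary(width: int, value: int) -> str:
    """Render the truncated literal in Verilog binary notation (hypothetical helper)."""
    return f"{width}'b{sized_literal(width, value):0{width}b}"


print(as_binary(3, 0xA))     # 3'ha    -> 3'b010
print(as_binary(3, 0b1001))  # 3'b1001 -> 3'b001, the example in the warning
```

If the intended value really is 10, the literal needs at least 4 bits (4'ha).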
initial
|
Warning : Ignoring unsynthesizable construct. [VLOGPT-37]
: Initial in file 'rtl/lm32_cpu.v' on line 2702, column 7.
: The following constructs will be ignored:
- initial block
- final block
- program block
- property block
- sequence block
- covergroup
- gate drive strength
- system task enable
- reg declaration with initial value
- specify block.
if ((size_m === 2'b11) && (load_store_address_m[0] !== 1'b0))
|
Warning : Using synthesizable equivalent of non-synthesizable operator. [VLOGPT-107]
: Converting '===' to '==' in file 'rtl/lm32_load_store_unit.v' on line 785, column 21.
: Verilog operators === and !== are not synthesizable.
if ((size_m === 2'b11) && (load_store_address_m[0] !== 1'b0))
|
Warning : Using synthesizable equivalent of non-synthesizable operator. [VLOGPT-107]
: Converting '!==' to '!=' in file 'rtl/lm32_load_store_unit.v' on line 785, column 60.
if ((size_m === 2'b10) && (load_store_address_m[1:0] !== 2'b00))
|
Warning : Using synthesizable equivalent of non-synthesizable operator. [VLOGPT-107]
: Converting '===' to '==' in file 'rtl/lm32_load_store_unit.v' on line 787, column 21.
if ((size_m === 2'b10) && (load_store_address_m[1:0] !== 2'b00))
|
Warning : Using synthesizable equivalent of non-synthesizable operator. [VLOGPT-107]
: Converting '!==' to '!=' in file 'rtl/lm32_load_store_unit.v' on line 787, column 62.
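The VLOGPT-107 warnings come from Verilog's four-state logic: == and != return x when unknown (x) or high-impedance (z) bits make the comparison ambiguous, whereas === and !== compare x and z literally and always return 0 or 1. Gates only ever carry 0 and 1, and for such values the two operators agree, which is presumably why the tool can substitute one for the other. A minimal Python model of the IEEE 1364 rules follows; writing operands as equal-width strings of '0', '1', 'x' and 'z' is an assumption made for brevity (real Verilog also extends operands of unequal width).

```python
# Minimal model of Verilog four-state equality. Operands are equal-width
# strings of '0', '1', 'x', 'z' (MSB first); results are '1', '0' or 'x'.


def logical_eq(a: str, b: str) -> str:
    """Verilog '==': 'x' if x/z bits make the result ambiguous."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-width operands")
    unknown = False
    for p, q in zip(a, b):
        if p in "xz" or q in "xz":
            unknown = True      # this bit pair cannot be decided
        elif p != q:
            return "0"          # a known mismatch decides the result
    return "x" if unknown else "1"


def case_eq(a: str, b: str) -> str:
    """Verilog '===': x and z are compared literally, result is '1' or '0'."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-width operands")
    return "1" if a == b else "0"


for a, b in [("11", "11"), ("1x", "11"), ("0x", "11")]:
    print(a, b, logical_eq(a, b), case_eq(a, b))
```

For fully known operands (the first pair) both operators give the same answer; they only diverge in simulation, when an operand such as size_m still holds x or z bits.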
Elaborating top-level block 'lm32_top' from file 'rtl/lm32_top.v'.
Warning : Unreachable statements for case item. [CDFG-472]
: Case item 'default' in module 'lm32_cpu' in file 'rtl/lm32_cpu.v' on line 1531.
Warning : Removing unused register. [CDFG-508]
: Removing unused register 'x_result_sel_logic_x' in module 'lm32_cpu' in file 'rtl/lm32_cpu.v' on line 2229.
: A flip-flop or latch that was inferred for an unused signal or variable was removed. Use 'set_attribute hdl_preserve_unused_registers true /' to preserve the flip-flop or latch.
Warning : Removing unused register. [CDFG-508]
: Removing unused register 'eret_m' in module 'lm32_cpu' in file 'rtl/lm32_cpu.v' on line 2229.
Warning : Removing unused register. [CDFG-508]
: Removing unused register 'direction_m' in module 'lm32_cpu' in file 'rtl/lm32_cpu.v' on line 2229.
Done elaborating 'lm32_top'.
Warning : Unknown command. [TUI-501]
: Embedded command 'attribute' cannot be evaluated in file 'rtl/lm32_cpu.v' at line 595
Warning : Unknown command. [TUI-501]
: Embedded command 'attribute' cannot be evaluated in file 'rtl/lm32_cpu.v' at line 596
Warning : Unknown command. [TUI-501]
: Embedded command 'attribute' cannot be evaluated in file 'rtl/lm32_interrupt.v' at line 109
Deleting 30 sequential instances. They do not transitively
drive any primary outputs.
Mapping lm32_top to gates.
Info : Replacing a flip-flop with a logic constant 0. [GLO-12]
: The instance is 'cpu_instruction_unit_i_lock_o_reg'.
: This optimization was enabled by the root attribute 'optimize_constant_0_flops'.
Info : Replacing a flip-flop with a logic constant 0. [GLO-12]
: The instance is 'cpu_instruction_unit_i_cti_o_reg[2]'.
Info : Replacing a flip-flop with a logic constant 0. [GLO-12]
: The instance is 'cpu_instruction_unit_i_cti_o_reg[1]'.
Info : Replacing a flip-flop with a logic constant 0. [GLO-12]
: The instance is 'cpu_instruction_unit_i_cti_o_reg[0]'.
Info : Replacing a flip-flop with a logic constant 0. [GLO-12]
: The instance is 'cpu_instruction_unit_i_adr_o_reg[1]'.
Info : Replacing a flip-flop with a logic constant 0. [GLO-12]
: The instance is 'cpu_instruction_unit_i_adr_o_reg[0]'.
Info : Replacing a flip-flop with a logic constant 0. [GLO-12]
: The instance is 'cpu_load_store_unit_d_lock_o_reg'.
Info : Replacing a flip-flop with a logic constant 0. [GLO-12]
: The instance is 'cpu_mc_arithmetic_state_reg[2]'.
Info : Replacing a flip-flop with a logic constant 1. [GLO-13]
: The instance is 'cpu_load_store_unit_d_cti_o_reg[2]'.
: This optimization was enabled by the root attribute 'optimize_constant_1_flops'.
Info : Replacing a flip-flop with a logic constant 1. [GLO-13]
: The instance is 'cpu_load_store_unit_d_cti_o_reg[1]'.
Info : Replacing a flip-flop with a logic constant 1. [GLO-13]
: The instance is 'cpu_load_store_unit_d_cti_o_reg[0]'.
Deleting 11 sequential instances. They do not transitively
drive any primary outputs.
Global mapping target info
==========================
Cost Group 'default' target slack: 37 ps
Target path end-point (Pin: cpu_instruction_unit_pc_f_reg[23]/d)
Global mapping status
=====================
                   Worst
            Total    Neg
Operation    Area  Slack  Worst Path
-------------------------------------------------------------------------------
global_map  99190    -50  cpu_multiplier_multiplier_reg[1]/CP -->
                          cpu_multiplier_product_reg[24]/D
fine_map    87540    -40  cpu_multiplier_multiplier_reg[13]/CP -->
                          cpu_multiplier_product_reg[27]/D
area_map    86111    -46  cpu_multiplier_multiplier_reg[13]/CP -->
                          cpu_multiplier_product_reg[27]/D
area_map    85668    -43  cpu_multiplier_multiplier_reg[1]/CP -->
                          cpu_multiplier_product_reg[27]/D
area_map    85407    -43  cpu_operand_0_x_reg[13]/CP -->
                          cpu_operand_1_x_reg[31]/D
Incremental optimization status
===============================
                   Worst  - - DRC Totals - -
            Total    Neg    Max    Max
Operation    Area  Slack  Trans    Cap
-------------------------------------------------------------------------------
init_delay  85407    -43      0      0
  Path: cpu_operand_0_x_reg[13]/CP --> cpu_operand_1_x_reg[31]/D
incr_delay  86106    -28      0      0
  Path: cpu_multiplier_multiplier_reg[1]/CP --> cpu_multiplier_product_reg[27]/D
incr_delay  86546    -21      0      0
  Path: cpu_multiplier_multiplier_reg[15]/CP --> cpu_multiplier_product_reg[29]/D
incr_delay  86605    -21      0      0
  Path: cpu_operand_1_x_reg[6]/CP --> cpu_store_operand_x_reg[31]/D
incr_delay  86859    -17      0      0
  Path: cpu_multiplier_multiplier_reg[13]/CP --> cpu_multiplier_product_reg[29]/D
incr_delay  87013    -15      0      0
  Path: cpu_multiplier_multiplier_reg[3]/CP --> cpu_multiplier_product_reg[31]/D
incr_delay  87147    -13      0      0
  Path: cpu_multiplier_multiplier_reg[1]/CP --> cpu_multiplier_product_reg[31]/D
incr_delay  87173    -13      0      0
  Path: cpu_multiplier_muliplicand_reg[5]/CP --> cpu_multiplier_product_reg[29]/D
incr_delay  87257    -12      0      0
  Path: cpu_multiplier_multiplier_reg[5]/CP --> cpu_multiplier_product_reg[31]/D
incr_delay  87282    -11      0      0
  Path: cpu_multiplier_multiplier_reg[1]/CP --> cpu_multiplier_product_reg[31]/D
incr_delay  87331    -11      0      0
  Path: cpu_multiplier_multiplier_reg[13]/CP --> cpu_multiplier_product_reg[29]/D
init_drc    87331    -11      0      0
  Path: cpu_multiplier_multiplier_reg[13]/CP --> cpu_multiplier_product_reg[29]/D
init_area   87331    -11      0      0
  Path: cpu_multiplier_multiplier_reg[13]/CP --> cpu_multiplier_product_reg[29]/D
rem_buf     86375    -11      0      0
  Path: cpu_multiplier_multiplier_reg[13]/CP --> cpu_multiplier_product_reg[29]/D
rem_inv     85628    -11      0      0
  Path: cpu_multiplier_multiplier_reg[13]/CP --> cpu_multiplier_product_reg[27]/D
merge_bi    83020    -11      0      0
  Path: cpu_multiplier_multiplier_reg[7]/CP --> cpu_multiplier_product_reg[31]/D
glob_area   81874    -11      0      0
  Path: cpu_multiplier_multiplier_reg[23]/CP --> cpu_multiplier_product_reg[31]/D
area_down   81267    -11      0      0
  Path: cpu_multiplier_multiplier_reg[2]/CP --> cpu_multiplier_product_reg[30]/D
rem_buf     81161    -11      0      0
  Path: cpu_multiplier_multiplier_reg[2]/CP --> cpu_multiplier_product_reg[30]/D
rem_inv     81044    -11      0      0
  Path: cpu_multiplier_multiplier_reg[2]/CP --> cpu_multiplier_product_reg[30]/D
merge_bi    80942    -11      0      0
  Path: cpu_multiplier_multiplier_reg[2]/CP --> cpu_multiplier_product_reg[30]/D
Incremental optimization status
===============================
                   Worst  - - DRC Totals - -
            Total    Neg    Max    Max
Operation    Area  Slack  Trans    Cap
-------------------------------------------------------------------------------
init_delay  80942    -11      0      0
  Path: cpu_multiplier_multiplier_reg[2]/CP --> cpu_multiplier_product_reg[30]/D
incr_delay  81149     -9      0      0
  Path: cpu_multiplier_multiplier_reg[9]/CP --> cpu_multiplier_product_reg[31]/D
incr_delay  81484     -7      0      0
  Path: cpu_multiplier_multiplier_reg[1]/CP --> cpu_multiplier_product_reg[30]/D
incr_delay  81625     -6      0      0
  Path: cpu_operand_1_x_reg[1]/CP --> cpu_multiplier_muliplicand_reg[27]/D
incr_delay  81771     -4      0      0
  Path: cpu_multiplier_multiplier_reg[5]/CP --> cpu_multiplier_product_reg[29]/D
incr_delay  81823     -4      0      0
  Path: cpu_instruction_unit_pc_d_reg[2]/CP --> cpu_branch_target_x_reg[30]/D
incr_delay  81835     -3      0      0
  Path: cpu_multiplier_multiplier_reg[1]/CP --> cpu_multiplier_product_reg[30]/D
init_drc    81835     -3      0      0
  Path: cpu_multiplier_multiplier_reg[1]/CP --> cpu_multiplier_product_reg[30]/D
init_area   81835     -3      0      0
  Path: cpu_multiplier_multiplier_reg[1]/CP --> cpu_multiplier_product_reg[30]/D
rem_buf     81765     -3      0      0
  Path: cpu_multiplier_multiplier_reg[1]/CP --> cpu_multiplier_product_reg[30]/D
rem_inv     81696     -3      0      0
  Path: cpu_multiplier_multiplier_reg[21]/CP --> cpu_multiplier_product_reg[28]/D
merge_bi    81616     -3      0      0
  Path: cpu_multiplier_multiplier_reg[21]/CP --> cpu_multiplier_product_reg[28]/D
glob_area   81303     -3      0      0
  Path: cpu_multiplier_multiplier_reg[21]/CP --> cpu_multiplier_product_reg[28]/D
area_down   81171     -3      0      0
  Path: cpu_multiplier_multiplier_reg[21]/CP --> cpu_multiplier_product_reg[28]/D
Done mapping lm32_top
Synthesis succeeded.
Warning : Possible timing problems have been detected in this design. [TIM-11]
: The design is 'lm32_top'.
: Use 'report timing -lint' for more information.
_______________________________________________
http://lists.milkymist.org/listinfo.cgi/devel-milkymist.org
IRC: #milkym...@freenode
Webchat: www.milkymist.org/irc.html
Wiki: www.milkymist.org/wiki