[Gnucap-devel] gnucap profiling

gserdyuk Thu, 29 Jan 2009 15:07:43 -0800

Hello All,

This is part of a discussion regarding Gnucap profiling. It began in
private email, but we decided that it may be interesting to a wider audience.

[email protected] on 29-Jan-2009 wrote:
====================================================================

Fast MOSFET Model Implementation Proposal for Gnucap.
-----------------------------------------------------

This document was written while evaluating the idea of substituting the
complex MOSFET model with a much simpler one and using it for simulation
of digital circuits.

1. Code profiling
To understand where most of the time is spent, some profiling was done.
Profiling used Gnucap's own timers and tools such as gprof (GNU
profiler) [1], oprofile [2] and sysprof [3].

Tests were run on a BiCMOS circuit made of 400 MOSFETs, with simulated
durations of 20 and 200 ns, that is, around 40 and 400 periods.

Simulation time, steps, iterations and relative times are presented in
Table 1.

Table 1.        Gnucap Timers and Counters
------------------------------------------------------------------------------
value               sim time, sec         steps successful       steps total
Sim duration, ns    20        200         20         200          20    200
embedded model      49.92     742.93      1107       15158        1114  15220
bsim model          65.96     648.87      191        1856         353   3576
-------------------------------------------------------------------------------
Value               iterations            time per iter, sec      time/step,sec
Sim duration, ns    20        200         20         200          20    200
embedded model      6235      82995       0.008006   0.008952     0.044  0.048
bsim model          5207      50578       0.012668   0.012829     0.186  0.181
------------------------------------------------------------------------------

From the results it is visible that short profiling runs (20 ns) behave
like long ones (200 ns), so the short run can also be used for profiling.
The time per iteration is similar for the embedded and bsim models
(about 0.008 s vs. 0.013 s), and the number of iterations is comparable too.

Meanwhile, the number of steps differs significantly.

Open Question: Al - maybe you could comment on that in a few words -
why is there such a difference?

Profiling results (selected functions) are presented in Table 2 (measured
by system timer).

Table 2(a): Profiling results, Embedded Model
-------------------------------------------------------------------------------------
Model:                 embedded model 20ns               embedded model 200ns
sampling time          57.86                             779.595
                       samples   rel_time  abs_time      samples   rel_time  abs_time
gnucap                 95.50%    100%      55.2563       96.57%    100.00%   752.85
| sweep                93.59%    98%       54.15117      95.95%    99.36%    748.02
|| sim::solve          88.19%    92%       51.02673      90.75%    93.97%    707.48
||| sim::solve_equat   11.69%    12%       6.763834      11.97%    12.40%    93.317
||| sim::load matrix   11.31%    12%       6.543966      11.30%    11.70%    88.094
||| sim::advance_time  5%        5%        2.893         5.13%     5.31%     39.993
||| sim::eval._models  51.20%    54%       29.62432      57.48%    59.52%    448.11
|||| DEV_MOS..do_it    51.20%    54%       29.62432      53.71%    55.62%    418.72
||||| MOS8::tr_eval    28.70%    30%       16.60582      29.99%    31.06%    233.80
|||| CARD_LIST::do_tr  20.31%    21%       11.75137      21.14%    21.89%    164.80
-------------------------------------------------------------------------------------

Table 2(b): Profiling results, BSIM3 Model
-------------------------------------------------------------------------------------
Model:                 bsim3 20ns                        bsim3 200ns
sampling time          76.305                            693.67
                       samples   rel_time  abs_time      samples   rel_time  abs_time
gnucap                 95.60%    100.00%   72.94758      97.56%    100.00%   676.7445
| sweep                92.39%    96.64%    70.49819      96.90%    99.32%    672.1662
|| sim::solve          91.79%    96.01%    70.04036      96.20%    98.61%    667.3105
||| sim::solve_equat   9.61%     10.05%    7.332911      9.67%     9.91%     67.07789
||| sim::load matrix   18.60%    19.46%    14.19273      22.40%    22.96%    155.3821
||| sim::advance_time  0.26%     0.27%     0.198393      0.26%     0.27%     1.803542
||| sim::eval._model   61.37%    64.19%    46.82838      61.89%    63.44%    429.3124
|||| DEV_SPICE::do_tr  57.30%    59.94%    43.72277      57.25%    58.68%    397.1261
||||| BSIMload         44.40%    46.44%    33.87942      44.15%    45.25%    306.2553
-------------------------------------------------------------------------------------

2. Analysis
Embedded model
Sweep (the main simulation loop) takes >97% of the time, i.e. the
overhead related to data processing is small and can be neglected.

SIM::evaluate_models() takes somewhat more than 50% of the time (54% and
60% in Table 2(a)) regardless of the simulation length. That means that
even if we improve the model infinitely (so that evaluation time
becomes 0), the speed gain will be around a factor of two. This is the
best case; any real implementation, whatever it is, will still need
some computation.

The model has no internal nodes, so simplifying the model cannot gain
anything from a reduced node count.

Bsim model
For the BSIM model the considerations are much the same.
SIM::evaluate_models() takes >60% of the time. Most of that is spent in
BSIMload (45%), so the expected speedup is again around a factor of two,
as in the previous case.
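
The "around a factor of two" estimates above are an application of
Amdahl's law. A minimal sketch in Python, using the rel_time shares
from Table 2 as input (the function names are only for illustration
and are not part of Gnucap):

```python
def max_speedup(fraction):
    """Upper bound on overall speedup if the part that takes
    `fraction` of the run time were made infinitely fast."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return 1.0 / (1.0 - fraction)


def speedup(fraction, factor):
    """Overall speedup if the part that takes `fraction` of the
    run time becomes `factor` times faster (Amdahl's law)."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# rel_time shares taken from Table 2 (20 ns and 200 ns runs)
shares = {
    "embedded, eval models, 20 ns": 0.54,
    "embedded, eval models, 200 ns": 0.5952,
    "bsim3, eval models, 20 ns": 0.6419,
    "bsim3, BSIMload only, 20 ns": 0.4644,
}

for name, share in shares.items():
    print(f"{name}: at most {max_speedup(share):.2f}x, "
          f"{speedup(share, 10.0):.2f}x if that part gets 10x faster")
```

This bound assumes the step and iteration counts stay unchanged; it
says nothing about gains from fewer iterations or larger time steps.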

3. Implementation 
As the simplest implementation approach, I can propose just adding a
completely new, simplistic MOSFET model to Gnucap with its own parameter
set. Conversion of BSIM parameters to the parameters of this simplistic
model could be done outside the Gnucap code, at or before the netlist
generation step.

So the simulation flow will look like:
1. Convert BSIM parameters to MOS_SIMPLE model parameters.
2. Generate netlist with MOS_SIMPLE code.
3. Simulate.

As a next step it would be possible to embed the model transformation
into Gnucap (if Al considers that reasonable).

Open Question: Al - do you think it is possible to add model-conversion
capabilities to Gnucap, or should that be external to Gnucap?

This could give a 50% speedup or so. To improve speed further it is
necessary to change the computation model:
a) Implement different time steps in different parts of the circuit
(this can be done in the same "continuous time" computation model, but
is quite complex to implement AFAIK).
b) Switch to a clocked discrete-time model (unsure about that yet).
c) Use event-driven simulation (like IRSIM); that will require
"adapters" to plug the event-based part into the "continuous time"
part and back.

References
[1] Gprof home page. http://www.gnu.org/software/binutils/binutils.html,
http://www.cs.utah.edu/dept/old/texinfo/as/gprof_toc.html
[2] Oprofile home page. http://oprofile.sourceforge.net/news/
[3] Sysprof home page. http://www.daimi.au.dk/~sandmann/sysprof/

====================================================================

Al Davis <[email protected]> on 30-Jan-2009 wrote:

====================================================================

#1 -- code profiling -
 What version of gnucap (and the models) are you using?  There 
are some significant changes in the 12-23 snapshot.

This is serious.  If your tests were made with an older version, 
they are no longer relevant.

What the numbers tell me is that the time step control in the 
Spice BSIM model is inadequate.  

353 steps, 191 successful, tells me that there were 162 steps 
rejected.  Almost as many steps were rejected as accepted.  
Step rejections should be rare.  They usually indicate some 
kind of problem.  They always indicate wasted time.

In contrast, 1114 steps, 1107 successful, tells me that 7 steps 
were rejected.  That's less than 1%.  The "embedded" model uses 
the step control associated with the internal components, which 
is quite strict.  As a result, there were very few rejections.

The iteration count is even more revealing.  The "embedded" 
model gives you an average of 5.6 iterations per step.  
Considering that the algorithm needs one extra for checking, 
and defaults to one extra as insurance, that means it usually 
converges in 3 or 4 iterations.  That is good.  It means that on 
every time step only a minor trim is needed.

The BSIM model gives you an average of 14.7 iterations per step, or 27 
iterations per accepted step.  A per-step iteration count this 
high says that convergence is not easy.  Considering the number 
of rejections, and that the default settings allow 20 
iterations per step, that tells me that often it was failing to 
converge, then reducing the time step and trying again.  
Clearly, the time steps taken were too large.
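
The figures quoted above can be recomputed from the 20 ns columns of
Table 1. A small Python sketch (the Run class and its field names are
made up for this illustration):

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Counters for one 20 ns run, copied from Table 1."""
    accepted: int    # "steps successful"
    total: int       # "steps total"
    iterations: int  # "iterations"

    @property
    def rejected(self):
        return self.total - self.accepted

    @property
    def rejection_rate(self):
        return self.rejected / self.total

    @property
    def iters_per_step(self):
        return self.iterations / self.total

    @property
    def iters_per_accepted_step(self):
        return self.iterations / self.accepted


runs = {
    "embedded": Run(accepted=1107, total=1114, iterations=6235),
    "bsim": Run(accepted=191, total=353, iterations=5207),
}

for name, run in runs.items():
    print(f"{name}: {run.rejected} rejected "
          f"({100 * run.rejection_rate:.1f}% of all steps), "
          f"{run.iters_per_step:.2f} iterations/step, "
          f"{run.iters_per_accepted_step:.2f} iterations/accepted step")
```

The same class can be fed the 200 ns columns to check that the ratios
hold for the longer run.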

#2 -- analysis

If all you do is speed up model evaluation, you save 50%.  That 
is misleading.  If you try running with "option nobypass" you 
might see a bigger difference.  Using a simpler model should 
also reduce the iteration count and allow bigger time steps.

There already is a simple model.  It's called "level 1".

Aside from that, it would be interesting to know how the 
options "nobypass", "notraceload", "noincmode" change the 
results.

#3 -- implementation

Use "level 1".

You could make a new model with modelgen that is derived from 
level 1, that adds parameter mapping.

If you want something even simpler, start with level 1.  Make 
the capacitors linear.  Eliminate code related to "lambda", 
essentially setting lambda=0.  If you do that, add a fixed 
parallel resistor so you don't get "open circuit".  Simplify 
the diode.  A two-region piecewise linear model may work well.
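
To make the simplification concrete, here is a rough Python sketch of
the textbook square-law (level-1 style) drain current with lambda=0
plus a fixed parallel conductance. It is not Gnucap code; the function
name, the parameter names and the default values (vto, beta, g_par) are
assumptions chosen only for this illustration, and it covers an
n-channel device with vds >= 0 only.

```python
def drain_current(vgs, vds, vto=0.7, beta=1e-4, g_par=1e-9):
    """Square-law NMOS drain current with lambda = 0, for vds >= 0.

    vto   -- threshold voltage, V (illustrative value)
    beta  -- transconductance parameter, A/V^2 (illustrative value)
    g_par -- fixed drain-source conductance, S, which keeps the
             channel from looking like an open circuit in cutoff
    """
    if vds < 0.0:
        raise ValueError("this sketch only covers vds >= 0")
    vov = vgs - vto  # overdrive voltage
    if vov <= 0.0:
        i_channel = 0.0                                  # cutoff
    elif vds < vov:
        i_channel = beta * (vov * vds - 0.5 * vds * vds)  # linear region
    else:
        i_channel = 0.5 * beta * vov * vov                # saturation
    return i_channel + g_par * vds


# Sweep vds at a fixed gate voltage to see the three regions.
for vds in (0.0, 0.5, 1.0, 2.0):
    print(f"vgs=1.7 V, vds={vds} V: id={drain_current(1.7, vds):.3e} A")
```

With lambda=0 the saturation current does not depend on vds, so the
parallel conductance is the only thing giving the output a finite
slope there.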

Whether a model is embedded or a plugin has no impact on speed.  
All of the embedded models are designed as plugins.  Therefore, 
you should assume that all new models will be plugins.

There is overhead associated with the "spice-wrapper" which maps 
data structures.  It is probably not significant with a big 
model like a BSIM, but probably very significant with simple 
models like a level-1.

a) different time step in different parts of the circuit ... 

This is difficult and very experimental.  I don't know, without 
trying it, what the benefit would be.  The situation now is 
that although the time step is global, models see it as local, 
and iteration is local.  If part of the circuit takes many 
iterations and another part takes few, iteration on the part 
that takes few steps stops when there is local convergence.
So, you do get some of the expected benefit now.

When you have extra time steps, the iteration count per time 
step is usually reduced, because each step has a closer 
starting point.

b) clocked discrete time model???  ....  I'm not sure how that 
would help.

c) event driven ...  it already sort of is.

Other ideas ...

It should be significantly faster to use "Euler" 
differentiation.  For reduced accuracy fast simulations, Euler 
is preferred.  Euler time stepping ignores traditional 
truncation error, and becomes completely based on events and 
dv/dt.
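
For readers unfamiliar with the method, here is a small Python sketch
of one Euler-integrated capacitor, assuming "Euler" means the implicit
(backward) Euler rule as is usual in circuit simulators. It discharges
an RC circuit and is not taken from Gnucap; the function name and the
component values are only for illustration.

```python
import math


def backward_euler_rc(v0, r, c, h, steps):
    """Discharge of a capacitor C through a resistor R using the
    backward Euler rule with a fixed time step h.

    Each step solves  C * (v_new - v_old) / h = -v_new / R  for v_new.
    """
    v = v0
    history = [v]
    for _ in range(steps):
        v = v / (1.0 + h / (r * c))
        history.append(v)
    return history


r, c = 1e3, 1e-9   # 1 kOhm and 1 nF, so tau = 1 us
tau = r * c

# A small step tracks the exact solution exp(-t/tau) closely ...
fine = backward_euler_rc(1.0, r, c, tau / 1000.0, 1000)
print("fine step,   v(tau):", fine[-1], "exact:", math.exp(-1.0))

# ... while a very large step is inaccurate but still decays smoothly.
coarse = backward_euler_rc(1.0, r, c, 100.0 * tau, 3)
print("coarse step, v:", coarse)
```

The coarse run shows the trade-off described above: the result stays
stable at any step size, and accuracy is what gets given up.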

"Gear" doesn't work right with the spice-wrapper.  It is fine 
anywhere else.  This will be fixed.  I think it substitutes 
Euler now.

Even if "Gear" did work there, it would not be the best choice 
for high speed simulations of digital circuits.  Euler is.
====================================================================

Al Davis <[email protected]> on 3-Jan-2008 wrote:

====================================================================
> Aside from that, it would be interesting to know how the
> options "nobypass", "notraceload", "noincmode" change the
> results.

Doing this will make it slower, maybe by a lot.  It would be 
interesting to see by how much.
====================================================================

-- 
Best regards,
 gserdyuk                          mailto:[email protected]

_______________________________________________
Gnucap-devel mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnucap-devel
