Hello All,

This is part of a discussion regarding Gnucap profiling. It began in private email, but we decided that it may be of interest to a wider audience.

[email protected] on 29-Jan-2009 wrote:
====================================================================
Fast MOSFET Model Implementation Proposal for Gnucap.
-----------------------------------------------------

This document was written while evaluating the idea of replacing the complex MOSFET model with a much simpler one for simulation of digital circuits.

1. Code profiling

To understand where most of the time is spent, some profiling was done, using Gnucap's own timers and tools such as gprof (GNU profiler) [1], oprofile [2], and sysprof [3]. Tests were made on a BICMOS circuit of 400 MOSFETs, with simulation times of 20 and 200 ns, that is, around 40 and 400 periods. Simulation time, steps, iterations, and relative times are presented in Table 1.

Table 1. Gnucap Timers and Counters
------------------------------------------------------------------------------
Value               sim time, sec       steps successful    steps total
Sim duration, ns    20        200       20        200       20        200
embedded model      49.92     742.93    1107      15158     1114      15220
bsim model          65.96     648.87    191       1856      353       3576
------------------------------------------------------------------------------
Value               iterations          time per iter, sec  time/step, sec
Sim duration, ns    20        200       20        200       20        200
embedded model      6235      82995     0.008006  0.008952  0.044     0.048
bsim model          5207      50578     0.012668  0.012829  0.186     0.181
------------------------------------------------------------------------------

From the results it is visible that the short profiling run (20 ns) behaves like the long one (200 ns), so the short run can also be used for profiling. The time per iteration is similar for the embedded and bsim models, and the number of iterations is comparable too. The number of steps, however, differs significantly.

Open Question: Al - maybe you could comment briefly on why there is such a difference?

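The derived columns of Table 1 can be rechecked from the raw counters, and the same counters also give the number of rejected steps and the iterations per step. A minimal Python sketch, with the values copied from Table 1; the dictionary layout and the helper name `derived` are illustrative assumptions, not part of Gnucap:

```python
# Raw counters copied from Table 1 (20 ns and 200 ns runs).
# The dictionary layout and the helper name are illustrative assumptions,
# not part of Gnucap.
TABLE1 = {
    ("embedded", 20):  {"time": 49.92,  "accepted": 1107,  "total": 1114,  "iters": 6235},
    ("embedded", 200): {"time": 742.93, "accepted": 15158, "total": 15220, "iters": 82995},
    ("bsim", 20):      {"time": 65.96,  "accepted": 191,   "total": 353,   "iters": 5207},
    ("bsim", 200):     {"time": 648.87, "accepted": 1856,  "total": 3576,  "iters": 50578},
}


def derived(row):
    """Return metrics derived from one row of raw counters."""
    rejected = row["total"] - row["accepted"]
    return {
        "rejected": rejected,                                  # steps thrown away
        "reject_ratio": rejected / row["total"],               # fraction of all steps
        "iters_per_step": row["iters"] / row["total"],         # per attempted step
        "iters_per_accepted": row["iters"] / row["accepted"],  # per accepted step
        "time_per_iter": row["time"] / row["iters"],           # seconds
        "time_per_step": row["time"] / row["total"],           # seconds
    }


if __name__ == "__main__":
    for (model, ns), row in TABLE1.items():
        d = derived(row)
        print(f"{model:9s} {ns:4d} ns  rejected={d['rejected']:5d} "
              f"({d['reject_ratio']:.1%})  "
              f"iters/step={d['iters_per_step']:.1f}  "
              f"time/iter={d['time_per_iter']:.6f} s")
```

For the 20 ns runs this gives 7 rejected steps for the embedded model and 162 for the bsim model, which is where the difference in step counts shows up most clearly.
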
Profiling results (selected functions) are presented in Table 2 (measured by system timer).

Table 2(a): Profiling results, Embedded Model
-------------------------------------------------------------------------------------
Model:                  embedded model 20ns             embedded model 200ns
sampling time           57.86                           779.595
                        samples   rel_time  abs_time    samples   rel_time  abs_time
gnucap                  95.50%    100%      55.2563     96.57%    100.00%   752.85
| sweep                 93.59%    98%       54.15117    95.95%    99.36%    748.02
|| sim::solve           88.19%    92%       51.02673    90.75%    93.97%    707.48
||| sim:solve_equat     11.69%    12%       6.763834    11.97%    12.40%    93.317
||| sim:load matrix     11.31%    12%       6.543966    11.30%    11.70%    88.094
||| sim::advance_time   5%        5%        2.893       5.13%     5.31%     39.993
||| sim::eval._models   51.20%    54%       29.62432    57.48%    59.52%    448.11
|||| DEV_MOS..do_it     51.20%    54%       29.62432    53.71%    55.62%    418.72
||||| MOS8::tr_eval     28.70%    30%       16.60582    29.99%    31.06%    233.80
|||| CARD_LIST::do_tr   20.31%    21%       11.75137    21.14%    21.89%    164.80
-------------------------------------------------------------------------------------

Table 2(b): Profiling results, BSIM3 Model
-------------------------------------------------------------------------------------
Model:                  bsim3 20ns                      Bsim3 200ns
sampling time           76.305                          693.67
                        samples   rel_time  abs_time    samples   rel_time  abs_time
gnucap                  95.60%    100.00%   72.94758    97.56%    100.00%   676.7445
| sweep                 92.39%    96.64%    70.49819    96.90%    99.32%    672.1662
|| sim:: solve          91.79%    96.01%    70.04036    96.20%    98.61%    667.3105
||| sim:: solve_equat   9.61%     10.05%    7.332911    9.67%     9.91%     67.07789
||| sim::load matrix    18.60%    19.46%    14.19273    22.40%    22.96%    155.3821
||| sim::advance_time   0.26%     0.27%     0.198393    0.26%     0.27%     1.803542
||| sim::eval._model    61.37%    64.19%    46.82838    61.89%    63.44%    429.3124
||||DEV_SPICE::do_tr    57.30%    59.94%    43.72277    57.25%    58.68%    397.1261
||||| BSIMload          44.40%    46.44%    33.87942    44.15%    45.25%    306.2553
-------------------------------------------------------------------------------------

2. Analysis

Embedded model

Sweep (main simulation loop) takes >97% of the time, i.e. the overhead related to data processing is small enough to be neglected.
SIM::evaluate_models() takes around 50% of the time, regardless of the simulation length. That means that even if the model were improved infinitely (evaluation time = 0), the speed gain would be around a factor of two. That is the best case; any real implementation, whatever it is, will still need some computation. The model has no internal nodes, so simplifying it cannot gain anything from a reduction in node count.

Bsim model

For the BSIM model the considerations are much the same. SIM::evaluate_models() takes >60% of the time. Within it, the most significant share goes to BSIMload (45%), so the expected speedup is again around 2 times, as in the previous case.

3. Implementation

As the simplest implementation approach, I propose adding a completely new simplistic MOSFET model to Gnucap, with its own parameter set. Conversion of BSIM parameters to the parameters of this simplistic model could be done outside the Gnucap code, at or before the netlist generation step. The simulation flow would then look like:

1. Convert BSIM parameters to MOS_SIMPLE model parameters.
2. Generate netlist with MOS_SIMPLE code.
3. Simulate.

As a next step it would be possible to embed the model transformation into Gnucap (if Al considers that reasonable).

Open Question: Al - do you think it is possible to add model-conversion capabilities to Gnucap, or should that be external to Gnucap?

This could give a 50% speedup or so. To improve speed further, it is necessary to change the computation model:

a) Implement different time steps in different parts of the circuit (this can be done in the same "continuous time" computation model, but is quite complex to implement, AFAIK).
b) Switch to a clocked discrete time model (unsure yet about that).
c) Use event-driven simulation (like IRSIM); that will require "adapters" to plug the event-based part into "continuous time" and back.

References

[1] Gprof home page.
    http://www.gnu.org/software/binutils/binutils.html,
    http://www.cs.utah.edu/dept/old/texinfo/as/gprof_toc.html
[2] Oprofile home page.
    http://oprofile.sourceforge.net/news/
[3] Sysprof home page.
    http://www.daimi.au.dk/~sandmann/sysprof/
====================================================================

Al Davis <[email protected]> on 30-Jan-2009 wrote:
====================================================================
#1 -- code profiling -

What version of gnucap (and the models) are you using? There are some significant changes in the 12-23 snapshot. This is serious. If your tests were made with an older version, they are no longer relevant.

What the numbers tell me is that the time step control in the Spice BSIM model is inadequate. 353 steps, 191 successful, tells me that there were 162 steps rejected. Almost as many steps were rejected as accepted. Step rejections should be rare. They usually indicate some kind of problem. They always indicate wasted time.

In contrast, 1114 steps, 1107 successful, tells me that 7 steps were rejected. That's less than 1%. The "embedded" model uses the step control associated with the internal components, which is quite strict. As a result, there were very few rejections.

The iteration count is even more revealing. The "embedded" model gives you an average of 5.6 iterations per step. Considering that the algorithm needs one extra for checking, and defaults to one extra as insurance, that means it usually converges in 3 or 4 steps. That is good. It means that on every time step only a minor trim is needed.

The BSIM model gives you an average of 14.7 iterations per step, or 27 iterations per accepted step. A per-step iteration count this high says that convergence is not easy. Considering the number of rejections, and that the default settings allow 20 iterations per step, that tells me that often it was failing to converge, then reducing the time step and trying again. Clearly, the time steps taken were too large.

#2 -- analysis

If all you do is speed up model evaluation, you save 50%. That is misleading.
If you try running with "option nobypass" you might see a bigger difference. Using a simpler model should also reduce the iteration count and allow bigger time steps.

There already is a simple model. It's called "level 1".

Aside from that, it would be interesting to know how the options "nobypass", "notraceload", "noincmode" change the results.

#3 -- implementation

Use "level 1". You could make a new model with modelgen that is derived from level 1, that adds parameter mapping.

If you want something even simpler, start with level 1. Make the capacitors linear. Eliminate code related to "lambda", essentially setting lambda=0. If you do that, add a fixed parallel resistor so you don't get "open circuit". Simplify the diode. A two-region piecewise linear model may work well.

Whether a model is embedded or a plugin has no impact on speed. All of the embedded models are designed as plugins. Therefore, you should assume that all new models will be plugins.

There is overhead associated with the "spice-wrapper" which maps data structures. It is probably not significant with a big model like a BSIM, but probably very significant with simple models like a level-1.

a) different time step in different parts of the circuit ...

This is difficult and very experimental. I don't know, without trying it, what the benefit would be. The situation now is that although the time step is global, models see it as local, and iteration is local. If part of the circuit takes many iterations and another part takes few, iteration on the part that takes few steps stops when there is local convergence. So, you do get some of the expected benefit now. When you have extra time steps, the iteration count per time step is usually reduced, because each step has a closer starting point.

b) clocked discrete time model??? ....

I'm not sure how that would help.

c) event driven ...

it already sort of is.

Other ideas ...

It should be significantly faster to use "Euler" differentiation.
For reduced accuracy fast simulations, Euler is preferred. Euler time stepping ignores traditional truncation error, and becomes completely based on events and dv/dt.

"Gear" doesn't work right with the spice-wrapper. It is fine anywhere else. This will be fixed. I think it substitutes Euler now. Even if "Gear" did work there, it would not be the best choice for high speed simulations of digital circuits. Euler is.
====================================================================

Al Davis <[email protected]> on 3-Jan-2008 wrote:
====================================================================
> Aside from that, it would be interesting to know how the
> options "nobypass", "notraceload", "noincmode" change the
> results.

Doing this will make it slower, maybe by a lot. It would be interesting to see by how much.
====================================================================

--
Best regards,
 gserdyuk mailto:[email protected]
