Problem: sugar starts up slowly on the xo

Tests were ran on an xo after a clean joyride-2181 install plus
sucrose's activities.

A graph of the ending part of the boot process can be found at:
http://dev.laptop.org/~rlucchese/boot/boot.stats.svg
(http://dev.laptop.org/~rlucchese/boot/boot.stats)

It was obtained by running:
$ picker -t 75 -o /tmp/boot.stats &
$ grapher -i boot.stats -c10

The sampling period begins in `start()' in haldaemon's init script and
lasts 75 seconds.

The following tab shows cpu usage of the 10 processes that took more cpu
time during the sampling time:
$ grapher -i boot.stats -c13 -r cpu

 tot%   ps%     cmdline
 -----------------------
        27.7    /bin/sh /usr/bin/sugar
 44.6   16.8    jffs2_gcd_mtd0
 51.7   7.1     /usr/bin/env python /usr/bin/sugar-activity journal...
 56.2   4.5     python /usr/bin/datastore-service
 60.6   4.4     picker -t75 -o /tmp/boot.stats
 64.3   3.8     python /usr/sbin/rainbow-daemon --daemon
 68.0   3.6     xinit /usr/bin/olpc-session -- /usr/bin/X -fp built-ins...
 71.3   3.3     /bin/sh /usr/bin/olpc-session
 74.1   2.8     python /usr/bin/sugar-shell-service
 76.9   2.8     python /usr/bin/sugar-presence-service
 79.7   2.8     python /usr/sbin/rainbow-daemon --daemon
 81.2   1.5     hald
 82.6   1.4     ohmd

Renicing jffs2_gcd_mtd0 to 19 in haldaemon's init script slightly
speeds-up the last
part of the boot:
$ cat /home/olpc/.boot_time.prev  (clean joyride)
56.27

$ cat /home/olpc/.boot_time
54.14

Note however that when renicing jffs2_gcd_mtd0, boot timings `became
less deterministic, sometimes being slower than the `clean install'
case. Stopwatch timings (from boot loader to fully redrawn shell):
clean install : 1m34.0s
renice trick  : 1m30.2s +[0. to 4.5 sec]

Would it be possible to make DS, sugar-presence-service and
sugar-shell-service system services and thus decoupling them from the
shell's start-up?



The next tab shows the 10 processes that hit more major faults during
the sampling time:

$ grapher -i boot.stats -m6 -r faults

 minflt majflt  cmdline
 -----------------------
 15363  154     /bin/sh /usr/bin/sugar
 3122   57      python /usr/sbin/rainbow-daemon --daemon
 1829   51      xinit /usr/bin/olpc-session -- /usr/bin/X -fp built-ins -wr
-aut
 3576   38      python /usr/bin/datastore-service
 498    37      ohmd
 1270   16      hald

A strace of `dbus-launch ... sugar-shell' in /usr/bin/sugar can be found
at:
http://dev.laptop.org/~rlucchese/boot/dbus-launch_sugar-shell.strace

A script was run that cat'ed all files (5700+) that sugar tries to open
on boot. (http://dev.laptop.org/~rlucchese/boot/cat_sugar_startup_files)

$ time sh cat_sugar_startup_files
real 0m16.9s
user 0m1.0s
sys  0m12.2s

Timings vary a lot when repeating the test (+-~30%).
`top' shows that the remaining time goes to io-wait.

Why don't those files get cached when repeating the test ?



cProfile statistics (KCG format) for sugar-shell (start-up only):
http://dev.laptop.org/~rlucchese/boot/cProfile-shell

Functions ordered by self-time (%):
 32.2   gtk.main()
 19.6   send_message_with_reply_and_block of dbus
 12.8   block of dbus.lowlevel.PendingCall
  7.3   grab of sugar._sugarext.KeyGrabber
  3.    _add_weight of sugar/shell/view/home/grid.py
  2.7  render_cairo of rsvg.Handle
  1.12 __getitem__ of sugar/shell/view/home/grid.py
---------------------------------------------------
 78.7 %
 
Perhaps some dbus calls can be made asynchronous; this is taking 30+% of
start-up time.

It's curious that `grab` takes so much time: I don't see anything
obvious looking at the code.

Note that the layout for the favorite's view was set to the ring type;
why is the shell computing weights in this case?


High level tasks/functions ordered by total-time:
 67.3   gtk.main()
 33.6   __init__ of sugar/view/Shell.py
 16.8   _start_journal_idle of sugar/view/Shell.py
 11.6   __init__ shell/view/keyhandler.py
  7.5   add sugar/shell/view/home/spreadlayout.py

Note that 45% of Shell.__init__ and the 89% of _start_journal_idle
went to blocking dbus calls.



The next tab shows start-up timings extracted from the statistics
gathered with picker (the sampler may have missed processes lasting
shorter than the sampling period):

http://dev.laptop.org/~rlucchese/boot/sugar_xo_boot_timings

$ grapher -i boot.stats  -r timings,cpu | `a bit of selection

 tot%   ps%     start   length  cmdline
 --------------------------------------
 16.9   16.8    0.00    74.99   jffs2_gcd_mtd0
  ..
 36.5   3.8     12.44   62.55   python /usr/sbin/rainbow-daemon --daemon
  ..
 37.8   0.1     18.88   56.11   /usr/sbin/olpc-dm
  ..
 38.0   0.0     19.32   55.67   /bin/sh /usr/bin/startx /usr/bin/...
  ..
 41.7   3.6     19.87   55.12   xinit /usr/bin/olpc-session -- ...
  ..
 46.0   3.3     24.27   50.72   /bin/sh /usr/bin/olpc-session
  ..
 74.0   27.7    28.24   46.75   /bin/sh /usr/bin/sugar
  ..
 75.1   0.4     49.15   25.84   matchbox-window-manager -use_titlebar...
  ..
 77.9   2.8     49.25   25.74   python /usr/bin/sugar-presence-service
  ..
 81.2   2.8     53.86   21.13   python /usr/bin/sugar-shell-service
  ..
 85.7   4.5     58.89   16.10   python /usr/bin/datastore-service
  ..
 92.8   7.1     62.86   12.13   usr/bin/env python /usr/... (journal)



Accurate shell's start-up timings were taken by instrumentation:
---first launch (normal boot)---
/usr/bin/sugar                              : (1217317352.992) --  0.
entering main() in shell/main.py            : (1217317368.162) -- 15.2
before entering gtk's mainloop shell/main.py: (1217317375.464) -- 22.5

---second launch (ctrl+alt+erase)---
/usr/bin/sugar                              : (1217317725.747) --  0.
entering main() in shell/main.py            : (1217317734.939) --  9.2
before entering gtk's mainloop shell/main.py: (1217317741.023) -- 15.3



The datastore service and the journal will be profiled in separate
sessions.

Thanks,

riccardo

_______________________________________________
Devel mailing list
[email protected]
http://lists.laptop.org/listinfo/devel

Reply via email to