Hi folks. I measured the time overhead of the HDLCD device in an ARM Linux
boot. The root filesystem is one I made with buildroot, and it runs m5 exit
in an early-ish boot script, after some essential services have started but
before it brings up any UI. DPRINTF output shows that 9-10 frames are
rendered (20Hz display refresh rate) during the boot when the HDLCD is
enabled. I ran each configuration 3 times to settle out disk caching of the
gem5 binary, etc.

Non-caching atomic CPU with HDLCD disabled in the device tree:

real    3m13.576s
user    3m13.251s
sys     0m0.155s

real    3m9.975s
user    3m9.663s
sys     0m0.160s

real    3m9.743s
user    3m9.436s
sys     0m0.150s

Average real time: 3m11.098s


Non-caching atomic CPU with HDLCD enabled, frame capture disabled:

real    4m0.399s
user    4m0.037s
sys     0m0.166s

real    3m56.798s
user    3m56.471s
sys     0m0.137s

real    4m2.224s
user    3m59.596s
sys     0m2.171s

Average real time: 3m59.807s


Non-caching atomic CPU with HDLCD and frame capture enabled:

real    3m59.711s
user    3m59.332s
sys     0m0.180s

real    4m3.611s
user    4m3.242s
sys     0m0.167s

real    3m58.932s
user    3m58.574s
sys     0m0.153s

Average real time: 4m0.751s


Non-caching atomic CPU with HDLCD and frame capture enabled, VNC attached:

real    3m57.977s
user    3m57.484s
sys     0m0.276s

real    3m58.452s
user    3m57.963s
sys     0m0.268s

real    3m57.840s
user    3m57.327s
sys     0m0.275s

Average real time: 3m58.090s


Non-caching atomic CPU with HDLCD and frame capture enabled, VNC attached
and VNC frame capture enabled:

real    3m56.129s
user    3m55.679s
sys     0m0.251s

real    3m56.902s
user    3m56.476s
sys     0m0.227s

real    3m58.611s
user    3m58.139s
sys     0m0.214s

Average real time: 3m57.214s


So, it looks like enabling the HDLCD device at all adds about 49 seconds to
the boot (3m11s vs. 4m0s on average). That is roughly 25% more wall-clock
time, or about 20% of the HDLCD-enabled run. Enabling VNC and the various
frame capture mechanisms doesn't seem to make much of any difference, since
the number of frames is small and there isn't *that* much work in converting
framebuffer memory into an image or sending it to VNC. If other aspects of
the simulation were faster (a binary translating CPU, for instance), the
number of frames would go up, and these other mechanisms might start
mattering more. The generic overhead would go up as well, though, so it
would probably still dominate.
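
For anyone who wants to check the arithmetic, here is a small Python sketch
that recomputes the averages and the relative overhead from the "real" times
listed above. The configuration names are labels made up for this sketch, not
gem5 options, and nothing here is part of gem5 itself.

```python
import re

def to_seconds(stamp):
    """Convert a `time`-style stamp such as '3m13.576s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

# "real" times copied from the runs above. The keys are hypothetical
# labels chosen for this sketch only.
REAL_TIMES = {
    "hdlcd_off": ["3m13.576s", "3m9.975s", "3m9.743s"],
    "hdlcd_on": ["4m0.399s", "3m56.798s", "4m2.224s"],
    "hdlcd_capture": ["3m59.711s", "4m3.611s", "3m58.932s"],
    "hdlcd_capture_vnc": ["3m57.977s", "3m58.452s", "3m57.840s"],
    "hdlcd_capture_vnc_capture": ["3m56.129s", "3m56.902s", "3m58.611s"],
}

def mean_seconds(stamps):
    """Average a list of time stamps, in seconds."""
    seconds = [to_seconds(s) for s in stamps]
    return sum(seconds) / len(seconds)

averages = {name: mean_seconds(stamps) for name, stamps in REAL_TIMES.items()}
baseline = averages["hdlcd_off"]

for name, avg in averages.items():
    extra = avg - baseline
    # Overhead relative to the HDLCD-disabled run, and as a share of this run.
    print(f"{name:27s} {avg:8.3f}s {extra:+8.3f}s "
          f"{100 * extra / baseline:5.1f}% of baseline "
          f"{100 * extra / avg:5.1f}% of this run")
```

The two percentage columns differ only in the denominator: the first divides
the extra time by the HDLCD-disabled average, the second by the average of
the configuration being measured.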

I'm not sure what exactly the HDLCD is doing that's so inefficient that it
takes almost a full minute more (about 49 seconds) to handle 9-10 frames'
worth of image data. The resolution is only HD, so there isn't a huge amount
of data to gather/process. I imagine there must be some low-hanging fruit
for efficiency improvements, which I'm hoping to find in the near future.
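
To put that in per-frame terms, here is a rough Python sketch using the
averages reported above. The frame geometry (1920x1080 at 4 bytes per pixel)
is an assumption made purely for illustration, since the measurements only
say "HD"; the frame counts and refresh rate come from the DPRINTF output
mentioned earlier.

```python
BASELINE_S = 191.098  # average real time with HDLCD disabled (3m11.098s)
HDLCD_ON_S = 239.807  # average real time with HDLCD enabled (3m59.807s)
REFRESH_HZ = 20       # display refresh rate from the DPRINTF output

# ASSUMPTION: "HD" taken as 1920x1080 with 32-bit pixels. Not measured.
WIDTH, HEIGHT, BYTES_PER_PIXEL = 1920, 1080, 4

extra_s = HDLCD_ON_S - BASELINE_S
frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL

def host_seconds_per_frame(frames):
    """Extra host wall-clock time attributed to each rendered frame."""
    return extra_s / frames

for frames in (9, 10):
    simulated_s = frames / REFRESH_HZ  # simulated time the frames span
    total_mb = frames * frame_bytes / 1e6
    print(f"{frames} frames: {simulated_s:.2f}s simulated, "
          f"{host_seconds_per_frame(frames):.2f}s extra host time per frame, "
          f"{total_mb:.1f} MB of pixel data under the assumed geometry")
```

Under those assumptions each frame costs around 5 seconds of extra host time
for a few megabytes of pixel data, which is why the cost looks out of
proportion to the amount of data involved.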

Note also that this is with the non-caching atomic CPU, which triggers less
accurate but more efficient behavior in the HDLCD model. With the regular
atomic CPU, the overhead would be even worse.

Gabe
_______________________________________________
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org