I defer to Stafford's judgement, but please don't feel an obligation to
continue exactly where I left off. If you want to work with FuseSoC
only, that should be fine.
A bit of advice in advance -- I didn't write a script to batch run tests
on FuseSoC. I instead ran each of the binaries individually by invoking
``fusesoc run``.
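If you'd rather not run every binary by hand, the loop could be
scripted along these lines. This is only a sketch: I never wrote such a
script, and the command template is a placeholder that you would
replace with whatever ``fusesoc run`` invocation already works for a
single binary. The names build_commands and run_all are made up for
illustration.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(template, binaries):
    """Expand a command template once per binary.

    `template` is a hypothetical string containing {binary} where the
    path to the benchmark binary should go. It must be filled in with a
    command line that is already known to work for one binary.
    """
    return [
        shlex.split(template.format(binary=shlex.quote(str(b))))
        for b in binaries
    ]


def run_all(template, bench_dir, dry_run=True):
    """Run (or just print) the expanded command for each file in bench_dir."""
    binaries = sorted(p for p in Path(bench_dir).iterdir() if p.is_file())
    results = {}
    for binary, cmd in zip(binaries, build_commands(template, binaries)):
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
            results[binary.name] = None
        else:
            # Record the exit status so failed runs are easy to spot.
            results[binary.name] = subprocess.run(cmd).returncode
    return results
```

Leaving dry_run=True first lets you check the expanded command lines
before anything is actually simulated.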
On 10/03/2025 06:09, Idzwan Nizam wrote:
Would it be OK, and would it count as progress, if only FuseSoC were
used?
On 10/3/2025 1:02 am, el 01 wrote:
Hello,
Hopefully this makes its way onto the mailing list; my previous email
didn't.
Stafford's previous email basically covered what I did last summer.
I've been dealing with some health issues and haven't been able to
consistently document my progress; really sorry about the lack of
documentation.
I left off at trying to figure out some (perhaps superficial)
differences between measured cycle counts when running a benchmark on
LiteX and FuseSoC, two different 'build systems' for the HDL design.
This doesn't directly address the discrepancy between marocchino and
mor1kx, but was a step along the way.
The build systems bundle the OpenRISC core with some other necessary
hardware (e.g. simulated memory, peripherals, etc.) and build either a
binary for simulation on your computer, or something which can be put
on an FPGA.
When running binaries from the Embench benchmarking suite on the same
processor core / simulation engine (and only changing the build
system), there are some tests which have substantially different cycle
counts.
I've attached some initial data I gathered; it seems like there are
some substantial differences in the cycles required to execute some
instructions on LiteX.
I'm also not 100% sure whether the measured cycle counts are completely
accurate, as the debug / trace parts of LiteX and FuseSoC are
somewhat different.
Another minor thing that I wanted to address was some inefficiency in
running LiteX simulations. Because of the way that the Embench testing
script for LiteX works (see
https://github.com/hhe07/litex-esp/blob/main/sim.py -- from what I
remember this is stuff that you can copy into your Embench install
folder to enable compatibility), I think the CPU and some of the
supporting software are rebuilt every time a different benchmark is
run, which wastes a lot of time.
As for where this fits into the larger issue of the performance
discrepancy between mor1kx and marocchino: in my experience, I spent a
lot of time trying to figure out the tools and determining whether
what I wanted to do was already a feature of a tool or something I
needed to work out myself. So, I'd recommend trying to understand the
tooling and perhaps doing some practice tasks around it. YMMV, though.
I know I haven't really made this problem better due to poor
documentation on my part, so please email if you're unsure about
something that I did. I'll try to reply ASAP.
As for the attached files:
- profile.ods includes analysis on cycle counts per instruction for
one test, I think nettle_sha256. This is for the mor1kx CPU.
- results.ods includes cycle counts for all Embench tests run on both
FuseSoC and LiteX, and calculated percent differences between the two.
- nettle-mor1kx-{fusesoc, litex}-trace-prof include the outputs of
cycle counts from the analysis scripts I wrote (basically same as
profile.ods), as well as some additional information on the PCs of the
start/end of critical sections in the code, and how many cycles they
took to execute.
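For anyone redoing the comparison in results.ods, the percent
differences can be recomputed roughly as below. This is a sketch and
not the spreadsheet's actual formula: it assumes the FuseSoC count is
the baseline, and the function names and the threshold parameter are
made up for illustration.

```python
def percent_difference(fusesoc_cycles, litex_cycles):
    """Percent change of the LiteX count relative to the FuseSoC count.

    Using FuseSoC as the baseline is an assumption; swap the arguments
    if the other convention is wanted.
    """
    if fusesoc_cycles == 0:
        raise ValueError("baseline cycle count must be non-zero")
    return 100.0 * (litex_cycles - fusesoc_cycles) / fusesoc_cycles


def compare(counts, threshold=0.0):
    """Rank benchmarks by how far apart the two build systems are.

    `counts` maps a benchmark name to (fusesoc_cycles, litex_cycles).
    Returns (name, percent) pairs whose absolute difference is at least
    `threshold`, largest difference first.
    """
    diffs = [
        (name, percent_difference(fusesoc, litex))
        for name, (fusesoc, litex) in counts.items()
    ]
    diffs = [(name, pct) for name, pct in diffs if abs(pct) >= threshold]
    return sorted(diffs, key=lambda item: abs(item[1]), reverse=True)
```

Sorting by absolute difference puts the tests that diverge most between
the two build systems at the top, which is where I'd start looking.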
~ Leo
On 02/03/2025 21:11, Idzwan Nizam Jamal Abdul Nasir wrote:
Hi,
I am interested in the OpenRISC Benchmarking and Performance
improvements task listed as one of the project ideas for Google Summer
of Code. I am unable to participate in GSoC, but I would like to
contribute to the task gradually as I acquire skills in digital logic
and computer architecture.
Is the task still open? I would be glad if you could point me in the
right direction, such as documentation I should read or tools I should
be familiar with. Any guidance is welcome and greatly appreciated.
Thank you.