I defer to Stafford's judgement, but please don't feel an obligation to continue exactly where I left off. If you want to work with FuseSoC only, that should be fine.

A bit of advice in advance -- I didn't write a script to batch-run tests on FuseSoC. Instead, I ran each of the binaries individually by invoking ``fusesoc run``.
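For anyone who wants to automate this, here is a minimal batch-run sketch in Python. It calls ``fusesoc run`` once per benchmark binary and saves each run's output to its own log. The core name (mor1kx-generic), the sim target and the --elf_load parameter are hypothetical placeholders; check the .core file of the system you are simulating for the real names.

```python
"""Batch-run benchmark binaries under FuseSoC, one ``fusesoc run`` per binary.

Sketch only: CORE, TARGET and ELF_PARAM are hypothetical placeholders;
take the real values from the .core file of the system being simulated.
"""
import subprocess
from pathlib import Path

CORE = "mor1kx-generic"   # hypothetical system core name
TARGET = "sim"            # hypothetical simulation target in that core
ELF_PARAM = "--elf_load"  # hypothetical core parameter selecting the program


def build_command(binary: Path) -> list[str]:
    """Return the fusesoc command line that simulates one benchmark binary."""
    return ["fusesoc", "run", f"--target={TARGET}", CORE, f"{ELF_PARAM}={binary}"]


def run_all(binaries: list[Path]) -> dict[str, int]:
    """Run each binary in turn; return {benchmark name: fusesoc exit code}."""
    results = {}
    for binary in binaries:
        cmd = build_command(binary)
        print("running:", " ".join(cmd), flush=True)
        # Keep each run's simulator output (including cycle counts) in its own log.
        log_path = binary.parent / (binary.name + ".fusesoc.log")
        with log_path.open("w") as log:
            proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
        results[binary.name] = proc.returncode
    return results
```

Usage would be something like run_all([Path("bd/src/crc32/crc32"), Path("bd/src/nettle-sha256/nettle-sha256")]), where the paths are placeholders for wherever your Embench build put the binaries. The cycle counts still have to be pulled out of each log afterwards.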

On 10/03/2025 06:09, Idzwan Nizam wrote:
Would it be OK, and would it still count as progress, if only FuseSoC were used?

On 10/3/2025 1:02 am, el 01 wrote:
Hello,

Hopefully this makes its way onto the mailing list; my previous email didn't.

Stafford's previous email basically covered what I did last summer. I've been dealing with some health issues and haven't been able to consistently document my progress; really sorry about the lack of documentation.

I left off trying to figure out some (perhaps superficial) differences between the measured cycle counts when running a benchmark on LiteX and on FuseSoC, two different 'build systems' for the HDL design. This doesn't directly address the discrepancy between marocchino and mor1kx, but it was a step along the way.

The build systems bundle the OpenRISC core with some other necessary hardware (e.g. simulated memory, peripherals, etc.) and build either a binary for simulation on your computer or a bitstream that can be loaded onto an FPGA.

When running binaries from the Embench benchmarking suite on the same processor core / simulation engine (and only changing the build system), there are some tests which have substantially different cycle counts.
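To put a number on "substantially different", one option is to compute, per benchmark, the percent difference of the LiteX cycle count relative to FuseSoC and sort the largest differences to the top. The sketch below assumes FuseSoC as the baseline and uses an arbitrary 5% threshold; both are illustrative choices, not necessarily how the percentages in the attached results were calculated.

```python
def percent_difference(fusesoc_cycles: int, litex_cycles: int) -> float:
    """Percent difference of LiteX relative to FuseSoC, taken as the baseline."""
    if fusesoc_cycles <= 0:
        raise ValueError("baseline cycle count must be positive")
    return 100.0 * (litex_cycles - fusesoc_cycles) / fusesoc_cycles


def flag_outliers(
    results: dict[str, tuple[int, int]], threshold_pct: float = 5.0
) -> list[tuple[str, float]]:
    """Return (benchmark, % difference) pairs above threshold_pct, largest first.

    results maps benchmark name -> (FuseSoC cycles, LiteX cycles).
    threshold_pct = 5.0 is an arbitrary illustrative cut-off.
    """
    diffs = [(name, percent_difference(fs, lx)) for name, (fs, lx) in results.items()]
    outliers = [(name, d) for name, d in diffs if abs(d) > threshold_pct]
    return sorted(outliers, key=lambda item: abs(item[1]), reverse=True)
```

With placeholder numbers (not measurements), flag_outliers({"a": (1000, 1100), "b": (1000, 1010), "c": (1000, 800)}) keeps "c" (-20%) and "a" (+10%) and drops "b" (+1%). Sorting by absolute value puts the benchmarks most worth investigating first, whichever direction they differ in.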

Some initial data I gathered is attached. It seems that some instructions take a substantially different number of cycles to execute on LiteX.

I'm also not 100% sure whether the measured cycle counts are completely accurate, as the debug / trace parts of the LiteX and FuseSoC setups are somewhat different.

Another minor thing I wanted to address was some inefficiency in running LiteX simulations. Because of the way the Embench testing script for LiteX works (see https://github.com/hhe07/litex-esp/blob/main/sim.py; from what I remember, this is something you copy into your Embench install folder to enable compatibility), I think the CPU and some of the supporting software are rebuilt every time a different benchmark is run, which wastes a lot of time.
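One way around the rebuild overhead is to key each build on its configuration and reuse the output directory whenever the configuration hasn't changed, so only the benchmark binary differs between runs. This is a minimal sketch, assuming the actual LiteX build can be wrapped in a callback (the hypothetical build argument below); it does not reflect how sim.py is currently structured.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def config_key(config: dict) -> str:
    """Stable short hash of the build configuration (CPU, variant, options...)."""
    blob = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:16]


def cached_build(
    config: dict, cache_root: Path, build: Callable[[dict, Path], None]
) -> Path:
    """Build into cache_root/<key> once; later calls with the same config reuse it.

    build is a hypothetical callback that performs the real LiteX build
    into the directory it is given.
    """
    out_dir = cache_root / config_key(config)
    marker = out_dir / ".build-complete"
    if not marker.exists():
        out_dir.mkdir(parents=True, exist_ok=True)
        build(config, out_dir)
        marker.touch()  # written only after build() returns without raising
    return out_dir
```

The marker file means an interrupted build is redone on the next call instead of being reused half-finished. Sorting the JSON keys makes the key independent of the order options were specified in.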

As for where this fits into the larger issue of the performance discrepancy between mor1kx and marocchino: in my experience, I spent a lot of time figuring out the tools and determining whether what I wanted to do was already a feature of a tool or something I'd have to work out myself. So I'd recommend trying to understand the tooling first, perhaps by doing some practice tasks with it. YMMV, though.

I know my poor documentation hasn't made this problem any easier, so please email me if you're unsure about anything I did. I'll try to reply ASAP.

As for the attached files:
- profile.ods includes an analysis of cycle counts per instruction for one test (I think nettle_sha256). This is for the mor1kx CPU.
- results.ods includes cycle counts for all Embench tests run on both FuseSoC and LiteX, along with the calculated percent differences between the two.
- nettle-mor1kx-{fusesoc, litex}-trace-prof include the cycle-count output of the analysis scripts I wrote (basically the same as profile.ods), plus some additional information on the PCs at the start/end of critical sections in the code and how many cycles each took to execute.
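For anyone reproducing these per-instruction numbers without the original scripts, the general idea can be sketched as follows: charge each retired instruction the cycles since the previous retirement, aggregate by mnemonic, and measure the cycles between a start PC and an end PC. The trace line format ('<cycle> <pc in hex> <mnemonic> ...') is a hypothetical simplification; the real FuseSoC and LiteX traces differ from each other and would each need their own parser. This is not the analysis script described above.

```python
from collections import defaultdict
from typing import Iterable

Record = tuple[int, int, str]  # (retire cycle, PC, mnemonic)


def parse_trace(lines: Iterable[str]) -> list[Record]:
    """Parse hypothetical trace lines of the form '<cycle> <pc-hex> <mnemonic> ...'."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        records.append((int(fields[0]), int(fields[1], 16), fields[2]))
    return records


def cycles_per_mnemonic(records: list[Record]) -> dict[str, tuple[int, int, float]]:
    """Map mnemonic -> (count, total cycles, average cycles).

    Each instruction is charged the cycles elapsed since the previous
    retirement, so the very first record in the trace is not counted.
    """
    stats: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for (prev_cycle, _, _), (cycle, _, mnemonic) in zip(records, records[1:]):
        stats[mnemonic][0] += 1
        stats[mnemonic][1] += cycle - prev_cycle
    return {m: (n, total, total / n) for m, (n, total) in stats.items()}


def region_cycles(records: list[Record], start_pc: int, end_pc: int) -> list[int]:
    """Cycles from each retirement at start_pc to the next retirement at end_pc."""
    spans, begin = [], None
    for cycle, pc, _ in records:
        if pc == start_pc and begin is None:
            begin = cycle
        elif pc == end_pc and begin is not None:
            spans.append(cycle - begin)
            begin = None
    return spans
```

Because the same functions run on both traces once they are parsed into the same record format, any remaining difference in the per-mnemonic averages points at the build system (or the trace itself) rather than the analysis.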


~ Leo

On 02/03/2025 21:11, Idzwan Nizam Jamal Abdul Nasir wrote:
Hi,

I am interested in the OpenRISC Benchmarking and Performance Improvements task listed as one of the project ideas for Google Summer of Code. I am unable to participate in GSoC, but I would like to contribute to the task gradually as I acquire skills in digital logic and computer architecture.

Is the task still open? I would be glad if you could point me in the right direction, such as documentation I should read or tools I should be familiar with. Any guidance is welcome and greatly appreciated. Thank you.
