Re: Performance improvements of Marocchino implementation

el 01 Tue, 11 Mar 2025 13:00:46 -0700

I defer to Stafford's judgement, but please don't feel an obligation to continue exactly where I left off. If you want to work with FuseSoC only, that should be fine.

A bit of advice in advance -- I didn't write a script to batch run tests on FuseSoC. I instead ran each of the binaries individually by invoking ``fusesoc run``.
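
For anyone who wants to batch this, a minimal sketch of a runner might look like the following. It is an illustration only, not a script that was used for any results in this thread. The function name `run_batch` and the `{elf}` placeholder are hypothetical, and the arguments that follow ``fusesoc run`` depend on the core and target, so they are left for the caller to supply in `command_template`.

```python
import subprocess
from pathlib import Path


def run_batch(binaries, command_template, timeout=None):
    """Run one command per binary and collect the results.

    binaries         -- iterable of paths to the benchmark binaries
    command_template -- argv list; "{elf}" in any argument is replaced by the
                        binary path (hypothetical convention for this sketch)
    Returns {binary file name: (return code, captured stdout)}.
    """
    results = {}
    for binary in binaries:
        # Substitute the binary path into every argument containing "{elf}".
        argv = [arg.replace("{elf}", str(binary)) for arg in command_template]
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        results[Path(binary).name] = (proc.returncode, proc.stdout)
    return results
```

The template would start with `"fusesoc", "run"` followed by whatever options your setup normally needs, with `"{elf}"` wherever the binary path goes. Results are keyed by file name, so binaries with the same name in different directories would overwrite each other.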


On 10/03/2025 06:09, Idzwan Nizam wrote:

Would it be OK, and count as progress, if only FuseSoC were used?

On 10/3/2025 1:02 am, el 01 wrote:
Hello,
Hopefully this makes its way onto the mailing list; my previous email didn't.
Stafford's previous email basically covered what I did last summer. I've been dealing with some health issues and haven't been able to consistently document my progress; really sorry about the lack of documentation.
I left off at trying to figure out some (perhaps superficial) differences between measured cycle counts when running a benchmark on LiteX and FuseSoC, two different 'build systems' for the HDL design. This doesn't directly address the discrepancy between marocchino and mor1kx, but was a step along the way.
The build systems bundle the OpenRISC core with some other necessary hardware (e.g. simulated memory, peripherals, etc.) and build either a binary for simulation on your computer, or something which can be put on an FPGA.
When running binaries from the Embench benchmarking suite on the same processor core / simulation engine (and only changing the build system), there are some tests which have substantially different cycle counts.
Some initial data I gathered is attached. It seems like there are some substantial differences in the cycles required to execute some instructions on LiteX.
I'm also not 100% sure whether the measured cycle counts are completely accurate, as the debug / trace parts of LiteX and FuseSoC are somewhat different.
Another minor thing that I wanted to address was some inefficiency in running LiteX simulations. Because of the way that the Embench testing script for LiteX works (see https://github.com/hhe07/litex-esp/blob/main/sim.py -- from what I remember this is stuff that you can copy into your Embench install folder to enable compatibility), I think the CPU and some of the supporting software is rebuilt every time a different benchmark is run, which wastes a lot of time.
As for where this fits into the larger issue of the performance discrepancy between mor1kx and marocchino, (in my opinion / experience) I spent a lot of time trying to figure out the tools and determining if what I wanted to do was a feature of a tool or something I needed to figure out. So, I'd recommend trying to understand the tooling and perhaps doing some practice tasks around it. YMMV, though.
I know I haven't really made this problem better due to poor documentation on my part, so please email if you're unsure about something that I did. I'll try to reply ASAP.
As for the attached files:
- profile.ods includes analysis on cycle counts per instruction for one test, I think nettle_sha256. This is for the mor1kx CPU.
- results.ods includes cycle counts for all Embench tests run on both FuseSoC and LiteX, and calculated percent differences between the two.
- nettle-mor1kx-{fusesoc, litex}-trace-prof include the outputs of cycle counts from the analysis scripts I wrote (basically same as profile.ods), as well as some additional information on the PCs of the start/end of critical sections in the code, and how many cycles they took to execute.
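The kind of per-test comparison in results.ods could be reproduced with something like the sketch below. This is not the formula taken from the spreadsheet: it assumes the percent difference is measured relative to the FuseSoC cycle count, and the function names and the dict-of-cycle-counts input shape are hypothetical.

```python
def percent_difference(fusesoc_cycles, litex_cycles):
    """Percent difference of the LiteX count relative to the FuseSoC count.

    Assumption: FuseSoC is used as the baseline; the spreadsheet may differ.
    """
    return 100.0 * (litex_cycles - fusesoc_cycles) / fusesoc_cycles


def compare(fusesoc, litex):
    """Compare two {test name: cycle count} mappings.

    Only tests present in both mappings are compared. Returns a list of
    (name, fusesoc cycles, litex cycles, percent difference) tuples, sorted
    so the largest absolute difference comes first.
    """
    rows = []
    for name in sorted(fusesoc.keys() & litex.keys()):
        diff = percent_difference(fusesoc[name], litex[name])
        rows.append((name, fusesoc[name], litex[name], diff))
    rows.sort(key=lambda row: abs(row[3]), reverse=True)
    return rows
```

Sorting by absolute difference puts the tests that diverge most between the two build systems at the top, which are probably the ones worth tracing first.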
~ Leo

On 02/03/2025 21:11, Idzwan Nizam Jamal Abdul Nasir wrote:
Hi,
I am interested in the OpenRISC Benchmarking and Performance improvements task listed as one of the project ideas in Google Summer of Code. I am unable to participate in GSOC but I would like to contribute to the task gradually as I acquire skills in digital logic and computer architecture.
Is the task still open? I would be glad if you could point me in the right direction, such as documentation I should read or tools I have to be familiar with. Any guidance is welcome and greatly appreciated. Thank you.
