Hello,

Hopefully this makes its way onto the mailing list; my previous email didn't.
Stafford's previous email basically covered what I did last summer. I've been dealing with some health issues and haven't been able to consistently document my progress; really sorry about the lack of documentation.
I left off trying to figure out some (perhaps superficial) differences between the cycle counts measured when running a benchmark on LiteX and on FuseSoC, two different 'build systems' for the HDL design. This doesn't directly address the discrepancy between marocchino and mor1kx, but it was a step along the way.
The build systems bundle the OpenRISC core with some other necessary hardware (e.g. simulated memory, peripherals, etc.) and build either a binary for simulation on your computer, or something which can be put on an FPGA.
When running binaries from the Embench benchmarking suite on the same processor core / simulation engine (and only changing the build system), there are some tests which have substantially different cycle counts.
Some initial data I gathered is attached. It seems that some instructions take a substantially different number of cycles to execute on LiteX.
I'm also not 100% sure whether the measured cycle counts are completely accurate, as the debug / trace parts of LiteX and FuseSoC are somewhat different.
Another minor thing that I wanted to address was some inefficiency in running LiteX simulations. The Embench testing script for LiteX is at https://github.com/hhe07/litex-esp/blob/main/sim.py (from what I remember, this is stuff that you can copy into your Embench install folder to enable compatibility). Because of the way that script works, I think the CPU and some of the supporting software are rebuilt every time a different benchmark is run, which wastes a lot of time.
As for where this fits into the larger issue of the performance discrepancy between mor1kx and marocchino, (in my opinion / experience) I spent a lot of time trying to figure out the tools and determining if what I wanted to do was a feature of a tool or something I needed to figure out. So, I'd recommend trying to understand the tooling and perhaps doing some practice tasks around it. YMMV, though.
I know I haven't really made this problem better due to poor documentation on my part, so please email if you're unsure about something that I did. I'll try to reply ASAP.
As for the attached files:
- profile.ods includes analysis on cycle counts per instruction for one test, I think nettle_sha256. This is for the mor1kx CPU.
- results.ods includes cycle counts for all Embench tests run on both FuseSoC and LiteX, and calculated percent differences between the two.
- nettle-mor1kx-{fusesoc, litex}-trace-prof include the outputs of cycle counts from the analysis scripts I wrote (basically same as profile.ods), as well as some additional information on the PCs of the start/end of critical sections in the code, and how many cycles they took to execute.
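For anyone who wants to redo this kind of analysis, here is a minimal sketch of the two calculations described above: average cycles per instruction mnemonic, and percent difference between two cycle counts. It is not the actual analysis script. The trace format (a list of `(mnemonic, cycles)` pairs), the function names, and the choice to express the difference relative to the first value are all assumptions made for illustration, and the numbers are made up.

```python
from collections import defaultdict


def average_cycles(trace):
    """Return the mean cycle count per mnemonic.

    `trace` is a hypothetical format: an iterable of (mnemonic, cycles) pairs,
    one per retired instruction.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for mnemonic, cycles in trace:
        totals[mnemonic] += cycles
        counts[mnemonic] += 1
    return {m: totals[m] / counts[m] for m in totals}


def percent_difference(reference, other):
    """Percent difference of `other` relative to `reference` (an assumed convention)."""
    return 100.0 * (other - reference) / reference


# Made-up trace, only to show the shape of the input.
example_trace = [("l.addi", 1), ("l.sw", 1), ("l.addi", 2), ("l.sw", 3)]
averages = average_cycles(example_trace)

for mnemonic in sorted(averages):
    print(f"{mnemonic}: {averages[mnemonic]:.6f}")

# Made-up total cycle counts for one benchmark on two build systems.
print(f"{percent_difference(200, 250):.1f}%")
```

Keeping the two steps separate means the same averaging function can be run on a trace from either build system before the results are compared.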
~ Leo

On 02/03/2025 21:11, Idzwan Nizam Jamal Abdul Nasir wrote:
Hi, I am interested in the OpenRISC Benchmarking and Performance improvements task listed as one of the project ideas in Google Summer of Code. I am unable to participate in GSOC, but I would like to contribute to the task gradually as I acquire skills in digital logic and computer architecture. Is the task still open? I would be glad if you could point me in the right direction, such as documentation I should read or tools I have to be familiar with. Any guidance is welcome and greatly appreciated. Thank you.
profile.ods
Description: application/vnd.oasis.opendocument.spreadsheet
results.ods
Description: application/vnd.oasis.opendocument.spreadsheet
jal: 1.000000
jump: 1.000000
l.add: 1.066557
l.addi: 1.381047
l.and: 1.015590
l.andi: 1.117647
l.bf: 1.589264
l.bnf: 1.000000
l.jr: 1.750131
l.lbz: 1.000000
l.lhz: 1.000000
l.lwz: 1.148739
l.movhi: 1.004103
l.nop: 1.000000
l.or: 1.003138
l.ori: 1.018006
l.sb: 2.550000
l.sfgtu: 1.593730
l.sll: 1.006010
l.srl: 1.067661
l.sub: 1.000000
l.sw: 1.384018
l.xor: 1.012799
l.xori: 1.000000
nettle update pc: 45f0 -> 45f8 56 -> 189
nettle write digest: 4600 -> 4608 191 -> 10521

jal: 3.624573
jump: 1.040984
l.add: 2.005511
l.addi: 1.331264
l.and: 1.948052
l.andi: 2.000000
l.bf: 1.137044
l.bnf: 2.000000
l.jr: 3.000789
l.lbz: 1.041667
l.lhz: 1.625000
l.lwz: 2.438729
l.movhi: 5.815331
l.nop: 5.398569
l.or: 2.064235
l.ori: 2.064082
l.sb: 2.542857
l.sfgtu: 1.212507
l.sll: 2.140017
l.srl: 2.075332
l.sub: 2.000000
l.sw: 3.226664
l.xor: 2.095170
l.xori: 3.500000
nettle_sha256.update pc: 4000352c: -> 40003530: 59 -> 554
nettle_sha256.write digest: 4000353c: -> 40003540: 558 -> 12034
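
To see which instructions differ most between the two listings above, one option is to rank them by the ratio of the second listing's value to the first's. The following is a minimal sketch using a handful of values copied from the listings. The variable names are hypothetical, and the listings are referred to only as "first" and "second" because they are not labelled by build system here.

```python
# A few per-instruction averages copied from the two listings above.
first_listing = {
    "jal": 1.000000,
    "l.movhi": 1.004103,
    "l.nop": 1.000000,
    "l.sb": 2.550000,
    "l.bf": 1.589264,
}
second_listing = {
    "jal": 3.624573,
    "l.movhi": 5.815331,
    "l.nop": 5.398569,
    "l.sb": 2.542857,
    "l.bf": 1.137044,
}

# Only compare mnemonics present in both listings.
common = first_listing.keys() & second_listing.keys()
ratios = {m: second_listing[m] / first_listing[m] for m in common}

# Largest ratio first: these are the instructions that differ most.
ranked = sorted(ratios, key=ratios.get, reverse=True)

for mnemonic in ranked:
    print(f"{mnemonic}: x{ratios[mnemonic]:.2f}")
```

A ratio well above 1 marks an instruction that is slower in the second listing, and a ratio below 1 (such as `l.bf` here) marks one that is faster.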
