On 8/9/23 07:02, Paul Koning wrote:


On Aug 9, 2023, at 2:32 AM, Alexander Monakov <amona...@ispras.ru> wrote:


On Tue, 8 Aug 2023, Jeff Law wrote:

If the compiler can identify a CRC and collapse it down to a table or clmul,
that's a major win and such code does exist in the real world. That was the
whole point behind the Fedora experiment -- to determine if these things are
showing up in the real world or if this is just a benchmarking exercise.

Can you share the results of the experiment and give your estimate of what
sort of real-world improvement is expected? I already listed the popular
FOSS projects where CRC performance is important: the Linux kernel and
a few compression libraries. Those projects do not use a bitwise CRC loop,
except sometimes for table generation on startup (which needs less time
than a page fault that may be necessary to bring in a hardcoded table).
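For reference, the "bitwise CRC loop" being discussed is the one-bit-per-iteration form below. This is a minimal sketch (reflected CRC-32, polynomial 0xEDB88320, zlib-style init/final XOR), not any particular project's code:

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Bitwise (one bit per iteration) reflected CRC-32, poly 0xEDB88320.
   This is the slow loop a compiler pass would recognize and replace
   with a table lookup or a carry-less multiply sequence. */
uint32_t crc32_bitwise(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;                       /* zlib-style initial value */
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)-(int)(crc & 1));
    }
    return ~crc;                      /* final XOR */
}
```

With these conventions, crc32_bitwise(0, "123456789", 9) yields the standard check value 0xCBF43926.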

For those projects that need a better CRC, why is the chosen solution
to optimize it in the compiler instead of offering them a library they
could use with any compiler?

Was there any thought given to embedded projects that use bitwise CRC
precisely because they have little space to spare for a hardcoded table?

Or those that use smaller tables -- for example, the classic VAX microcode 
approach with a 16-entry table, doing CRC 4 bits at a time.
Yup. I think we settled on 8 bits at a time for the table variant. It seemed like a good tradeoff between table size and speed.
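For concreteness, the 4-bits-at-a-time variant mentioned above can be sketched with a 16-entry (64-byte) table instead of the 256-entry (1 KiB) byte-at-a-time table; this is an illustrative sketch using the reflected CRC-32 polynomial 0xEDB88320, not the VAX microcode itself:

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Reflected CRC-32, 4 bits per step: 16-entry table (64 bytes)
   versus the 1 KiB table of the byte-at-a-time version. */
static uint32_t crc32_nibble_table[16];

static void crc32_init_table(void)
{
    for (uint32_t n = 0; n < 16; n++) {
        uint32_t c = n;
        for (int k = 0; k < 4; k++)   /* run each nibble through 4 bit steps */
            c = (c >> 1) ^ (0xEDB88320u & (uint32_t)-(int)(c & 1));
        crc32_nibble_table[n] = c;
    }
}

uint32_t crc32_4bit(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        /* two table lookups per byte: low nibble, then high nibble */
        crc = (crc >> 4) ^ crc32_nibble_table[crc & 0xF];
        crc = (crc >> 4) ^ crc32_nibble_table[crc & 0xF];
    }
    return ~crc;
}
```

It produces the same result as the bitwise loop (check value 0xCBF43926 for "123456789") at two lookups per byte, which is the size/speed middle ground between the bitwise and byte-table variants.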


I agree that this seems an odd thing to optimize.  CRC is a well-known CPU hog
with well-established efficient solutions, and it's hard to see why anyone who
needs good performance would fail to understand and apply that knowledge.
As I've said, what started us down this path was Coremark. But what convinced me that this was useful beyond juicing benchmark data was finding the various implementations in the wild.

Jeff
