On Sun, 30 Nov 2025 23:41:36 GMT, Piotr Tarsa <[email protected]> wrote:
>> Hi all, >> >> Please find my latest changes: I improved mixed insertion sort >> (and therefore the whole sorting), added description of sorting logic >> and created code template to reduce the duplication. Also I >> added a generation script. >> >> @DougLea Doug Lea >> @jatin-bhateja, @jbhateja Jatin Bhateja >> @tarsa Piotr Tarsa >> >> Please look at the latest sources. > > @VladimirIaroslavski: the organization (code partitioning) of the template > looks good overall. iiuc the template + generator together are more than > twice shorter than the original full java code, so the gains are definitely > there. one small extra suggestion (not well thought out so you can ignore > it): maybe it would be beneficial to apply such templating to test code too, > if that makes sense? > > i have one suggestion to test out, i.e. speeding up counting in case the > store-to-load forwarding (or memory disambiguation in general) in the cpu is > not good enough. the idea is described here: > https://fastcompression.blogspot.com/2014/09/counting-bytes-fast-little-trick-from.html > and in short it's about replacing one counter array with multiple counter > arrays that are modified one after another. the problem only occurs when > frequently modifying same counters (i.e. same memory locations) back-to-back, > e.g. if you're making histogram of data which has runs of the same value or > run of a very few different values (modern cpus perform multiple operations > at once, so they can try to count multiple values in parallel). in short: > it's all explained in the article. some extra theory as to what is the > mechanism in the cpu that's relevant to the counting optimisation is > https://en.wikipedia.org/wiki/Load-Hit-Store and > https://en.wikipedia.org/wiki/Memory_disambiguation#Store_to_load_forward ing but it's probably enough to just test out the idea without going too deep into theory. using multiple counter arrays of course adds some constant overhead (for allocation at the beginning and merging counters at the end of counting), so it would be beneficial for big enough arrays only. @tarsa I think we can introduce template to test class as well. I will return very soon. I will look at idea to use multiple counter arrays, will see the results. Thank you for suggestions! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27411#issuecomment-3599009439
