Thanks, Gabe! I think the slow option is probably the right way to go for now. It will match other things like our slow integer division implementation ;). There's a lot in our x86 implementation that needs to be improved if we want to match the actual latencies of instructions in modern systems (https://www.agner.org/optimize/instruction_tables.pdf).
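For anyone following along, here is a minimal Python sketch of what a slow, functional-only PSHUFB would have to compute for the 128-bit form. It is based on my reading of the instruction's documented behavior, not on gem5 code. The function name `pshufb` and the use of `bytes` for register values are made up for illustration.

```python
def pshufb(dest: bytes, control: bytes) -> bytes:
    """Functional model of 128-bit PSHUFB (illustrative, not gem5 code).

    dest    -- the 16 original bytes of the destination register
    control -- the 16 shuffle-control bytes from the source operand
    """
    if len(dest) != 16 or len(control) != 16:
        raise ValueError("expected two 16-byte operands")
    out = bytearray(16)
    for i, c in enumerate(control):
        # Bit 7 of the control byte set: the result byte is zeroed.
        # Otherwise the low 4 bits pick a byte of the original dest.
        out[i] = 0 if c & 0x80 else dest[c & 0x0F]
    return bytes(out)
```

Note that any result byte can come from anywhere in the original 128-bit destination, so producing even one 64-bit chunk of the result needs both halves of that register plus the matching half of the control register.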
BTW, for PSHUFB, it's just one uop and one cycle latency according to those tables (for modern AMD and Intel parts at least).

Cheers,
Jason

On Tue, Oct 13, 2020 at 5:45 PM Gabe Black <[email protected]> wrote:

> Yeah, it's a bit tricky. One of the tricky parts is that a single
> instruction, no matter how artificially complex you want to make it, can
> only do one load or one store to memory. There are some ways around that
> like subverting the CPU by using the threadcontext and doing, for instance,
> functional accesses, but that's probably prone to breakage and is *very*
> unrealistic.
>
> Another option could be to just write a really inefficient implementation
> in microcode which does all the register twiddling manually. That would be
> very slow and clunky compared to the real deal, but it would fit pretty
> easily into the existing system and would be a workable stop gap to just
> get by some problematic instructions you need for functional purposes. I
> would strongly suggest putting a warn_once microop in there to let people
> know not to expect realistic performance if you go that route.
>
> On Tue, Oct 13, 2020 at 8:27 AM Jason Lowe-Power <[email protected]>
> wrote:
>
>> Hi Gabe,
>>
>> Thanks for the info! This is a bit helpful. Although, I'm still not sure
>> what the next steps would be or how to even start on (1) or (2) that you
>> listed.
>>
>> Is it possible to focus on functional correctness first and then work on
>> the timing correctness? The problem is that modern applications assume
>> these SSE instructions exist. Specifically, we can't get Ubuntu to boot
>> with systemd enabled without SSE instructions (specifically PSHUFB).
>>
>> From your description here, it sounds like this could be weeks+ of
>> development effort for us. Do we think this is worth it? There are like 4
>> cases of this instruction executing in the workloads we've looked at so
>> far. Getting the timing *perfect* doesn't seem important.
>>
>> As you say, we need to overhaul the x86 SIMD instructions completely.
>> However, this is a months-long project for someone very familiar with gem5.
>> It's infeasible to do that right now with our current resources.
>> Additionally, since I am not an expert on this code, we would really need
>> someone like you, Gabe, to mentor whoever is working on this project.
>>
>> Cheers,
>> Jason
>>
>> On Mon, Oct 12, 2020 at 10:34 PM Gabe Black via gem5-dev <
>> [email protected]> wrote:
>>
>>> Hi Hoa. The shuffle microop is implemented in
>>> arch/x86/isa/microops/mediaop.isa. It looks like you'll need to do three
>>> things to implement PSHUFB.
>>>
>>> 1. Figure out a realistic way to get all three register operands into
>>> the instruction. The current version takes a destination register, and both
>>> halves of one of the source registers, and then finally takes the 8 bit
>>> immediate value in the ext field of the microop. With larger registers
>>> which won't fit into two 64 bit slices, that scheme won't work for the
>>> source operands. You'll need to figure out how to get all the information
>>> you need into a single instruction, or in other words all the data you'll
>>> need to generate one 64 bit chunk of the destination. It looks like instead
>>> of always passing the shuffle instructions in an 8 bit immediate, the
>>> PSHUFB takes yet another register to hold that value. You'll need to figure
>>> out how to get that in there too.
>>> 2. The shuffle microop seems to assume there are two flavors of
>>> behavior, size = 8, and otherwise. You may need to add more logic to handle
>>> additional situations this new instruction needs.
>>> 3. Actually implement the PSHUFB macroop using the revised version of
>>> shuffle.
>>>
>>> The two important requirements for this sort of modification are that
>>> the microops still behave realistically and are constrained.
>>> If they could just do whatever whacky, one-off thing a particular
>>> instruction needed, then there wouldn't be much value in microops; we
>>> could (sort of) just do everything with regular instructions directly.
>>> That's not entirely true, but the idea is true. Also we need to make sure
>>> not to break any current instructions, so the existing uses of shuffle
>>> need to either work as is, or be updated to continue working with the
>>> revised version of shuffle.
>>>
>>> Note that unlike many of the other microops gem5 uses, I actually made
>>> up the ones that implement SSE. I went through all the SSE instructions
>>> that existed at that time and made up a set of microops which seemed
>>> realistic and could implement all the instructions. The instructions have
>>> changed over time (AVX didn't exist then for instance), and so we may need
>>> to update those microops to match the new instructions. Whatever we do
>>> though, we still need to make things realistic, not break any existing
>>> instructions, and not just hack in a magic microop that does what we need
>>> in this one case without considering the design as a whole.
>>>
>>> Gabe
>>>
>>> On Mon, Oct 12, 2020 at 10:08 PM Hoa Nguyen via gem5-dev <
>>> [email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I've been trying to implement the PSHUFB instruction and I need some
>>>> help with this. While I found some documentation about this
>>>> instruction, as well as a similar(?) instruction implemented in
>>>> gem5 (PSHUFD), I don't know how to implement PSHUFB in gem5.
>>>>
>>>> I saw that PSHUFD is broken down into 3 microops, two of which are
>>>> `shuffle` instructions. I don't really understand this shuffle
>>>> instruction and am not able to find any documentation about it. I
>>>> wonder whether PSHUFB could also be broken into shuffle instructions.
>>>>
>>>> Any help or suggestions would be appreciated!
>>>>
>>>> Regards,
>>>> Hoa Nguyen
>>>> _______________________________________________
>>>> gem5-dev mailing list -- [email protected]
>>>> To unsubscribe send an email to [email protected]
>>>> %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s
>>>>
