Thanks, Gabe! I think the slow option is probably the right way to go for
now. It will match other things like our slow integer division
implementation ;). There's a lot in our x86 implementation that needs to be
improved if we want to match the actual latencies of instructions in modern
systems (https://www.agner.org/optimize/instruction_tables.pdf).

BTW, for PSHUFB, it's just one uop and one cycle latency according to those
tables (for modern AMD and Intel parts at least).
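
For reference, here is a minimal sketch of the functional behavior a slow
microcode version would have to reproduce for the 128-bit form of PSHUFB. It
is plain Python, not gem5 code, and the names pshufb_chunk64 and pshufb128 are
hypothetical. It builds the result one 64-bit chunk at a time, which shows why
each chunk needs both halves of the data register plus 64 bits of the control
register.

```python
MASK64 = (1 << 64) - 1


def pshufb_chunk64(src_lo, src_hi, ctl):
    """Compute one 64-bit chunk of a 128-bit PSHUFB result.

    src_lo, src_hi: the two 64-bit halves of the data register.
    ctl: the 64 bits of the control register that belong to this chunk.
    (Hypothetical helper for illustration, not a gem5 microop.)
    """
    src = (src_hi << 64) | src_lo
    out = 0
    for i in range(8):
        sel = (ctl >> (8 * i)) & 0xFF
        if sel & 0x80:
            # High bit of the control byte set: the result byte is zeroed.
            byte = 0
        else:
            # Otherwise the low 4 bits select one of the 16 source bytes.
            byte = (src >> (8 * (sel & 0x0F))) & 0xFF
        out |= byte << (8 * i)
    return out


def pshufb128(data, ctl):
    """128-bit PSHUFB assembled from two 64-bit chunks."""
    lo, hi = data & MASK64, data >> 64
    res_lo = pshufb_chunk64(lo, hi, ctl & MASK64)
    res_hi = pshufb_chunk64(lo, hi, ctl >> 64)
    return (res_hi << 64) | res_lo


if __name__ == "__main__":
    data = int.from_bytes(bytes(range(0x10, 0x20)), "little")
    reverse = int.from_bytes(bytes(range(15, -1, -1)), "little")
    # Control bytes 15..0 reverse the byte order of the data register.
    print(pshufb128(data, reverse).to_bytes(16, "little").hex())
```

Note that a control byte can select any of the 16 data bytes, so neither chunk
can be computed from only one half of the data register.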

Cheers,
Jason

On Tue, Oct 13, 2020 at 5:45 PM Gabe Black <[email protected]> wrote:

> Yeah, it's a bit tricky. One of the tricky parts is that a single
> instruction, no matter how artificially complex you want to make it, can
> only do one load or one store to memory. There are some ways around that
> like subverting the CPU by using the threadcontext and doing, for instance,
> functional accesses, but that's probably prone to breakage and is *very*
> unrealistic.
>
> Another option could be to just write a really inefficient implementation
> in microcode which does all the register twiddling manually. That would be
> very slow and clunky compared to the real deal, but it would fit pretty
> easily into the existing system and would be a workable stopgap to just
> get by some problematic instructions you need for functional purposes. I
> would strongly suggest putting a warn_once microop in there to let people
> know not to expect realistic performance if you go that route.
>
> On Tue, Oct 13, 2020 at 8:27 AM Jason Lowe-Power <[email protected]>
> wrote:
>
>> Hi Gabe,
>>
>> Thanks for the info! This is a bit helpful, although I'm still not sure
>> what the next steps would be or how to even start on (1) or (2) that you
>> listed.
>>
>> Is it possible to focus on functional correctness first and then work on
>> the timing correctness? The problem is that modern applications assume
>> these SSE instructions exist. Specifically, we can't get Ubuntu to boot
>> with systemd enabled without SSE instructions (specifically PSHUFB).
>>
>> From your description here, it sounds like this could be weeks+ of
>> development effort for us. Do we think this is worth it? There are like 4
>> cases of this instruction executing in the workloads we've looked at so
>> far. Getting the timing *perfect* doesn't seem important.
>>
>> As you say, we need to overhaul the x86 SIMD instructions completely.
>> However, this is a months-long project for someone very familiar with gem5.
>> It's infeasible to do that right now with our current resources.
>> Additionally, since I am not an expert on this code, we would really need
>> someone like you, Gabe, to mentor whoever is working on this project.
>>
>> Cheers,
>> Jason
>>
>> On Mon, Oct 12, 2020 at 10:34 PM Gabe Black via gem5-dev <
>> [email protected]> wrote:
>>
>>> Hi Hoa. The shuffle microop is implemented in
>>> arch/x86/isa/microops/mediaop.isa. It looks like you'll need to do three
>>> things to implement PSHUFB.
>>>
>>> 1. Figure out a realistic way to get all three register operands into
>>> the instruction. The current version takes a destination register, and both
>>> halves of one of the source registers, and then finally takes the 8 bit
>>> immediate value in the ext field of the microop. With larger registers
>>> which won't fit into two 64 bit slices, that scheme won't work for the
>>> source operands. You'll need to figure out how to get all the information
>>> you need into a single instruction, or in other words all the data you'll
>>> need to generate one 64 bit chunk of the destination. It looks like instead
>>> of always passing the shuffle instructions in an 8 bit immediate, the
>>> PSHUFB takes yet another register to hold that value. You'll need to figure
>>> out how to get that in there too.
>>> 2. The shuffle microop seems to assume there are two flavors of
>>> behavior, size = 8, and otherwise. You may need to add more logic to handle
>>> additional situations this new instruction needs.
>>> 3. Actually implement the PSHUFB macroop using the revised version of
>>> shuffle.
>>>
>>> The two important requirements for this sort of modification are that
>>> the microops still behave realistically and are constrained. If they could
>>> just do whatever wacky, one-off thing a particular instruction needed,
>>> then there wouldn't be much value in microops; we could (sort of) just do
>>> everything with regular instructions directly. That's not entirely true,
>>> but the idea holds. Also we need to make sure not to break any current
>>> instructions, so the existing uses of shuffle need to either work as is, or
>>> be updated to continue working with the revised version of shuffle.
>>>
>>> Note that unlike many of the other microops gem5 uses, I actually made
>>> up the ones that implement SSE. I went through all the SSE instructions
>>> that existed at that time and made up a set of microops which seemed
>>> realistic and could implement all the instructions. The instructions have
>>> changed over time (AVX didn't exist then for instance), and so we may need
>>> to update those microops to match the new instructions. Whatever we do
>>> though, we still need to make things realistic, not break any existing
>>> instructions, and not just hack in a magic microop that does what we need
>>> in this one case without considering the design as a whole.
>>>
>>> Gabe
>>>
>>> On Mon, Oct 12, 2020 at 10:08 PM Hoa Nguyen via gem5-dev <
>>> [email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I've been trying to implement the PSHUFB instruction and I need some
>>>> help with it. While I found some documentation about this
>>>> instruction, as well as a similar(?) instruction already implemented in
>>>> gem5 (PSHUFD), I don't know how to implement PSHUFB in gem5.
>>>>
>>>> I saw that PSHUFD is broken down into 3 microops, two of which are
>>>> `shuffle` instructions. I don't really understand this shuffle instruction
>>>> and am not able to find any documentation about it. I wonder whether
>>>> PSHUFB could also be broken into shuffle instructions.
>>>>
>>>> Any help or suggestions would be appreciated!
>>>>
>>>> Regards,
>>>> Hoa Nguyen
>>>> _______________________________________________
>>>> gem5-dev mailing list -- [email protected]
>>>> To unsubscribe send an email to [email protected]