[AMD Public Use] Microops are proprietary. However, the *number* of microops for a given architecture can be determined with a benchmark executing the instruction billions/trillions of times and comparing the ratio of microops to instructions using performance counters. People have done this already. Try searching for "Agner Fog Instruction Tables."
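
A minimal sketch of the ratio calculation described above, assuming the micro-op and instruction counts have already been read from hardware performance counters by some external tool. The function names, the two-run baseline subtraction, and the counter readings are all hypothetical and only illustrate the arithmetic.

```python
def uops_per_instruction(uops_retired: int, instructions_retired: int) -> float:
    """Average micro-ops per retired instruction for one benchmark run."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return uops_retired / instructions_retired


def uops_of_target(with_target, baseline, target_executions: int) -> float:
    """Micro-ops attributable to one execution of the instruction under test.

    with_target and baseline are (uops_retired, instructions_retired) pairs
    from two runs of the same loop, with and without the target instruction.
    Subtracting the baseline removes the loop overhead from the estimate.
    """
    if target_executions <= 0:
        raise ValueError("target_executions must be positive")
    extra_uops = with_target[0] - baseline[0]
    return extra_uops / target_executions


# Made-up counter readings, for illustration only.
baseline = (2_000_000_000, 2_000_000_000)
with_target = (4_000_000_000, 3_000_000_000)
print(uops_per_instruction(*with_target))
print(uops_of_target(with_target, baseline, 1_000_000_000))
```

Running the loop a very large number of times, as suggested above, keeps measurement noise small relative to the counts being compared.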
-Matt

-----Original Message-----
From: Zhengrong Wang via gem5-dev <[email protected]>
Sent: Tuesday, June 2, 2020 2:53 PM
To: gem5 Developer List <[email protected]>
Cc: Zhengrong Wang <[email protected]>
Subject: [gem5-dev] Re: Add AVX512 Support?

Thanks for the answer. Looking forward to the new feature!

*王 钲 荣*

Zhengrong Wang
Computer Science Department
University of California, Los Angeles
California, USA
90024

Work Email: [email protected]
Mobile :+1 310-447-4568

abarredo via gem5-dev <[email protected]> wrote on Tue, Jun 2, 2020 at 1:23 AM:

> Hi Zhengrong,
>
> I remember you! I have received a few emails regarding our simulator
> since I presented the paper. My idea was to reply to all of them once
> I had submitted some patches to the official repo.
>
> Regarding your questions:
>
> 1) It depends on the instruction's addressing mode. For example:
>
> # VADDPD
> ## ZMM
> def macroop VADDPD512_ZMM_ZMM_ZMM {
>     vaddfp vectorReg, vectorReg2, vectorRegm, size=8, vsize=512
> };
>
> def macroop VADDPD512_ZMM_ZMM_M {
>     vldfpevex vectorAux1, seg, sib, "DISPLACEMENT", dataSize=8, vsize=512, vdata=512
>     vaddfp vectorReg, vectorReg2, vectorAux1, size=8, vsize=512
> };
>
> def macroop VADDPD512_ZMM_ZMM_P {
>     rdip t7
>     vldfpevex vectorAux1, seg, riprel, "DISPLACEMENT", dataSize=8, vsize=512, vdata=512
>     vaddfp vectorReg, vectorReg2, vectorAux1, size=8, vsize=512
> };
>
> Those macro instructions represent three different addressing modes.
> The first one does not require a memory access, so just one
> micro-instruction is needed.
> This is just a brief example; this instruction has many encodings,
> depending on the operand size (128, 256 and 512 bits) and on the
> masking. In my implementation, I consider all these possibilities.
>
> 2) The register dependence is handled as in other ISAs. The AVX512 ISA
> contains 32 512-bit vector registers (if I remember well), so they are
> accessed as specified by the instruction's indices.
> The register dependence is tracked in the rename stage of the O3 CPU
> model.
>
> As we have mentioned to Jason, I'm currently working on the last
> publication of my PhD. My idea is to finish it this summer and start
> with the gem5 changes in September.
>
> Regards,
> Adrián
>
> On 1/6/20 20:17, Zhengrong Wang via gem5-dev wrote:
> > Hi Adrián,
> >
> > Yeah we have met at HPCA and I went to your presentation. I think
> > your implementation is probably more complete and robust. If you
> > plan to contribute it, it would be great! I do have a few questions:
> >
> > 1. How are the instructions broken into microops? e.g. a "vaddps" is
> > decoded into a single microop?
> > 2. If there is a single 512-bit register, how do you handle register
> > dependence? Are there new register read/write APIs for these
> > instructions?
> >
> > Thanks!
> >
> > abarredo via gem5-dev <[email protected]> wrote on Mon, Jun 1, 2020 at 10:20 AM:
> >
> >> Hi,
> >>
> >> I'm Adrián, I have extended the x86 ISA with the newest SIMD
> >> extensions (AVX, AVX2 and AVX512).
> >> As you know, the SSE implementation is inefficient (a 128-bit
> >> register operation is modeled as two 64-bit scalar operations).
> >> If we plan to add support for the AVX and AVX512 ISAs, the first
> >> thing to do is to implement a proper vector register file for the
> >> x86 ISA. When I did it, SVE had not been released, so I did it from
> >> scratch. Here, we could follow one of these options:
> >>
> >> 1) Reuse the SVE implementation of the vector register file.
> >> 2) Create a new one, as I did.
> >>
> >> Doing the first option means having compatibility with Arm's SVE
> >> instructions, but also means re-implementing all my
> >> micro-instructions, which would take a long time.
> >> However, my current implementation of the x86 SIMD
> >> micro-instructions is not clean, so I'll have to spend time making
> >> it more efficient anyway.
> >>
> >> In my own vector register file, I allocate 512 bits for each
> >> register. Then, every SSE, AVX and AVX512 instruction operates
> >> depending on the instruction's vector size. This implementation has
> >> been tested on several applications (from the ParVec benchmark
> >> suite, among others) and closely follows Intel's description in
> >> their official manual.
> >> This simulator has been employed in our paper, published in HPCA2020.
> >>
> >> Adrián
> >>
> >> On 1/6/20 18:32, Miquel Moreto wrote:
> >>> Hi Zhengrong and Jason,
> >>>
> >>> Let me CC Adrian Barredo, the PhD student that implemented AVX
> >>> instructions in our gem5 simulation infrastructure. Since he did
> >>> all the hard work, I believe it is better that he answers your
> >>> questions. :-)
> >>>
> >>> Best regards,
> >>>
> >>> --- Miquel
> >>>
> >>> On 1/6/20 17:14, Jason Lowe-Power wrote:
> >>>> Hey Zhengrong,
> >>>>
> >>>> Thanks for getting started on this! I've also cc'd Miquel at BSC
> >>>> who has implemented many of the x86 vector instructions. Miquel,
> >>>> it would be great to get your input here!
> >>>>
> >>>> As far as getting input from AMD folks... I think this is going
> >>>> to be a tough thing for them to weigh in on due to IP issues.
> >>>> This is getting a bit too close to their products :). They can
> >>>> correct me if I'm wrong!
> >>>>
> >>>> To answer your questions:
> >>>>
> >>>>     - Design of the vector register file. My implementation
> >>>>       directly follows the SSE instructions to minimize the work.
> >>>>       Is there any better way to do this?
> >>>>
> >>>> I agree with Gabe. This is the best approach for now.
> >>>>
> >>>>     - If I am going to merge my code, what is a good submission plan?
> >>>>       I am thinking about first committing the skeleton code with
> >>>>       a simple 'vaddps' instruction, and then for other
> >>>>       instructions.
> >>>>
> >>>> That sounds good to me. If you think the whole set of changes
> >>>> should be reviewed together or there's no way to split things
> >>>> apart and still be understandable, we can create a feature branch
> >>>> for you. That said, since this is mostly just adding one
> >>>> instruction and doesn't touch too much outside of the ISA
> >>>> implementation, just breaking it up that way will probably work.
> >>>>
> >>>>     - Testing: This is probably the most important one. Currently,
> >>>>       I manually test my code by simulating small programs. What
> >>>>       is the best way to write tests for new instructions? Should
> >>>>       I try unit testing or binary testing?
> >>>>
> >>>> If you could submit your programs to the gem5-resources repo, we
> >>>> can build the binaries and then distribute them for anyone to use
> >>>> for testing.
> >>>> I think that works well.
> >>>>
> >>>> Cheers,
> >>>> Jason
> >>>>
> >>>> On Sun, May 31, 2020 at 10:14 PM Gabe Black via gem5-dev
> >>>> <[email protected]> wrote:
> >>>>
> >>>> https://docs.google.com/document/d/1O_u_Xq14TgreYThuZcbM3kuXFCrKvaFHA2O9poCeHSk/edit#heading=h.r067bn3rmydo
> >>>>
> >>>> On Sun, May 31, 2020 at 9:31 PM Zhengrong Wang via gem5-dev
> >>>> <[email protected]> wrote:
> >>>>
> >>>> > Hi Gabe,
> >>>> >
> >>>> > Thanks for your reply.
> >>>> > For the vector register file, I agree it is probably a better
> >>>> > idea to stick with the current approach; at least it does not
> >>>> > require changing the SSE instructions. I could not find your
> >>>> > plan to redesign the register handling mechanism. If you could
> >>>> > provide a link, I would be interested in taking a look to better
> >>>> > understand the philosophy behind the design.
> >>>> >
> >>>> > Let's hear from AMD first as they have more insight into the
> >>>> > microops. If everything turns out well, I can start to refactor
> >>>> > the code into smaller commits and add tests for that.
> >>>> >
> >>>> > Gabe Black via gem5-dev <[email protected]> wrote on Sun, May 31, 2020 at 7:44 PM:
> >>>> >
> >>>> > > Hi Sean. I'm not aware of anyone working on AVX-512, but it
> >>>> > > would be nice if the AMD folks could chime in and confirm
> >>>> > > that. The x86 microcode was originally based off of the
> >>>> > > microcode for the K6 as described in a patent. The floating
> >>>> > > point parts of that patent were very vague and hand wavy, so I
> >>>> > > more or less made up the initial part. It would be nice for
> >>>> > > the AMD folks to chime in here too, as far as what's realistic
> >>>> > > for the design of the microops.
> >>>> > >
> >>>> > > As far as testing, we don't have a great scheme for testing
> >>>> > > individual instructions right now, but that would be really
> >>>> > > valuable to have in the long run.
> >>>> > > I've thought a bit about how that might work, but I don't
> >>>> > > have a plan at the moment. The best thing to do right now is
> >>>> > > probably to have small programs that execute the instructions
> >>>> > > in question and print their inputs/outputs and/or check that
> >>>> > > the outputs are correct. I think our testing framework has a
> >>>> > > way to check that program output matches a golden reference,
> >>>> > > and that could be used to delegate correctness checking to the
> >>>> > > framework. Bobby can probably give more details here.
> >>>> > >
> >>>> > > As far as the registers, my preference for now is to do what
> >>>> > > you did and treat each 64 bit chunk as its own register.
> >>>> > > There are real drawbacks to this approach, but the existing
> >>>> > > solution to them, a vector register file, has other, in my
> >>>> > > opinion more serious, drawbacks. A while ago I put together a
> >>>> > > manifesto about how I'd want to redo the whole register
> >>>> > > handling mechanism in gem5, but unfortunately I haven't had
> >>>> > > time to actually implement very much of it. By treating larger
> >>>> > > registers as groups of smaller registers, you'd be consistent
> >>>> > > with the rest of the x86 code as it stands right now. That,
> >>>> > > and the fact that I think that's the lesser of two evils,
> >>>> > > makes that my preferred way to go.
> >>>> > >
> >>>> > > As far as submitting code, there are instructions on the gem5
> >>>> > > website for creating and submitting reviews. We use gerrit,
> >>>> > > and so in addition to the instructions we provide, you should
> >>>> > > be able to find pretty good/complete instructions out on the
> >>>> > > internet to explain the mechanism of sending out a review.
> >>>> > > For this or any other change, you'd want to break up your
> >>>> > > work into logical chunks where everything works before and
> >>>> > > after any given change, and then send them out (perhaps all
> >>>> > > together in a series) for review. Exactly how to break things
> >>>> > > up is up to you, but my opinion is that each change should be
> >>>> > > logically complete but also about one thing. That makes it
> >>>> > > easier for a reviewer to wrap their head around what you're
> >>>> > > doing and how it works without having to untangle multiple
> >>>> > > things going on at once, or having to merge multiple reviews
> >>>> > > together in their head to see the whole change they're
> >>>> > > reviewing. If there are lots of related small changes (many
> >>>> > > individual instructions for instance) it might make sense to
> >>>> > > do one or two by themselves first, and then once the kinks are
> >>>> > > worked out to do a larger change with the rest, applying the
> >>>> > > pattern from the earlier reviews.
> >>>> > >
> >>>> > > Gabe
> >>>> > >
> >>>> > > On Sun, May 31, 2020 at 4:18 PM Sean Wong via gem5-dev
> >>>> > > <[email protected]> wrote:
> >>>> > >
> >>>> > > > Hello,
> >>>> > > >
> >>>> > > > This is my first time posting here, so apologies if I made
> >>>> > > > any mistakes.
> >>>> > > >
> >>>> > > > The last time I checked the develop branch, gem5 did not
> >>>> > > > yet support AVX512, and searching the mailing list I do not
> >>>> > > > see any plan for it. Is there any ongoing development to
> >>>> > > > support that? If not, I am happy to contribute my code.
> >>>> > > > During my research, I have developed partial support for
> >>>> > > > AVX512 (and AVX-256 as a by-product), which I hope would be
> >>>> > > > useful for others.
> >>>> > > >
> >>>> > > > My implementation so far is a straightforward extension of
> >>>> > > > the existing SSE instructions. To summarize it:
> >>>> > > >
> >>>> > > > - Like the SSE implementation, the 512-bit register is
> >>>> > > >   broken into eight 64-bit sub-registers. This may not be a
> >>>> > > >   good design. Any suggestions are welcome.
> >>>> > > > - Unlike the SSE implementation, most of the instructions
> >>>> > > >   are broken into a single microop. For example, a 512-bit
> >>>> > > >   'vaddps' is decoded into one 'vaddf' microop instead of
> >>>> > > >   eight.
> >>>> > > > - Currently, it supports common arithmetic instructions
> >>>> > > >   (add, mul, etc.) and basic data movement (load, store,
> >>>> > > >   mov, extract, insert, etc.).
> >>>> > > > - No support for masking.
> >>>> > > >
> >>>> > > > If you guys are interested, I am willing to clean up my
> >>>> > > > code and submit it for review. I may need some guidance on:
> >>>> > > >
> >>>> > > > - Design of the vector register file. My implementation
> >>>> > > >   directly follows the SSE instructions to minimize the
> >>>> > > >   work. Is there any better way to do this?
> >>>> > > > - If I am going to merge my code, what is a good submission
> >>>> > > >   plan? I am thinking about first committing the skeleton
> >>>> > > >   code with a simple 'vaddps' instruction, and then for
> >>>> > > >   other instructions.
> >>>> > > > - Testing: This is probably the most important one.
> >>>> > > >   Currently, I manually test my code by simulating small
> >>>> > > >   programs. What is the best way to write tests for new
> >>>> > > >   instructions? Should I try unit testing or binary testing?
> >>>> > > >
> >>>> > > > Thank you for reading this long post. Any feedback is
> >>>> > > > welcome.
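
To make the register layout in the original post concrete, here is a rough sketch, assuming a 512-bit register stored as eight 64-bit chunks with two float32 lanes per chunk. It is plain Python rather than gem5 code, and every name in it (pack_zmm, unpack_zmm, add_chunk, vaddps512) is hypothetical. The single vaddps512 call stands in for the "one microop" case; calling add_chunk eight times separately would correspond to the SSE-style "eight microops" case.

```python
import struct

CHUNKS_PER_ZMM = 8    # 512 bits / 64 bits per sub-register
FLOATS_PER_CHUNK = 2  # two float32 lanes fit in one 64-bit chunk


def pack_zmm(values):
    """Pack 16 float32 lanes into eight 64-bit integers (the sub-registers)."""
    if len(values) != CHUNKS_PER_ZMM * FLOATS_PER_CHUNK:
        raise ValueError("expected 16 float32 lanes")
    return [
        int.from_bytes(struct.pack("<2f", values[2 * i], values[2 * i + 1]), "little")
        for i in range(CHUNKS_PER_ZMM)
    ]


def unpack_zmm(chunks):
    """Inverse of pack_zmm: eight 64-bit integers back to 16 lanes."""
    lanes = []
    for chunk in chunks:
        lanes.extend(struct.unpack("<2f", chunk.to_bytes(8, "little")))
    return lanes


def add_chunk(a, b):
    """Packed float32 add on one 64-bit chunk (two lanes)."""
    a0, a1 = struct.unpack("<2f", a.to_bytes(8, "little"))
    b0, b1 = struct.unpack("<2f", b.to_bytes(8, "little"))
    return int.from_bytes(struct.pack("<2f", a0 + b0, a1 + b1), "little")


def vaddps512(src1, src2):
    """One hypothetical microop that adds all eight chunks in a single step."""
    return [add_chunk(a, b) for a, b in zip(src1, src2)]


a = pack_zmm([float(i) for i in range(16)])
b = pack_zmm([0.5] * 16)
print(unpack_zmm(vaddps512(a, b)))
```

A small program like this, printing its results for comparison against a known-good output, is also the style of test suggested earlier in the thread.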
_______________________________________________
gem5-dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]
