Thanks for the answer. Looking forward to the new feature!

*王 钲 荣*
Zhengrong Wang
Computer Science Department
University of California, Los Angeles
California, USA
90024

Work Email: [email protected]
Mobile: +1 310-447-4568

On Tue, Jun 2, 2020 at 1:23 AM, abarredo via gem5-dev <[email protected]> wrote:

> Hi Zhengrong,
>
> I remember you! I have received a few emails regarding our simulator
> since I presented the paper. My idea was to reply to all
> of them once I had submitted some patches to the official repo.
>
> Regarding your questions:
>
> 1) It depends on the instruction's addressing mode. For example:
>
> # VADDPD
> ## ZMM
> def macroop VADDPD512_ZMM_ZMM_ZMM {
>     vaddfp vectorReg, vectorReg2, vectorRegm, size=8, vsize=512
> };
>
> def macroop VADDPD512_ZMM_ZMM_M {
>     vldfpevex vectorAux1, seg, sib, "DISPLACEMENT", dataSize=8, vsize=512, vdata=512
>     vaddfp vectorReg, vectorReg2, vectorAux1, size=8, vsize=512
> };
>
> def macroop VADDPD512_ZMM_ZMM_P {
>     rdip t7
>     vldfpevex vectorAux1, seg, riprel, "DISPLACEMENT", dataSize=8, vsize=512, vdata=512
>     vaddfp vectorReg, vectorReg2, vectorAux1, size=8, vsize=512
> };
>
> Those macro instructions represent three different addressing modes. The
> first one does not require a memory access, so just one
> micro-instruction is needed.
> This is just a brief example; this instruction has many encodings,
> depending on the operand size (128, 256 and 512 bits) and on the
> masking. In my implementation, I consider all these possibilities.
>
> 2) The register dependence is handled as in other ISAs. The AVX512 ISA
> contains 32 512-bit vector registers (if I remember well), so they are
> accessed as specified by the instruction's indices.
> The register dependence is tracked in the rename stage of the O3 CPU
> model.
>
> As we have mentioned to Jason, I'm currently working on the last
> publication of my PhD. My idea is to finish it this summer and start
> with the gem5 changes in September.
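The three macroops quoted above differ only in the micro-ops that fetch the third operand before the add. A minimal Python sketch of that expansion follows. It is not gem5 code: the `MicroOp` class, the `expand_vaddpd512` function and the mode labels are hypothetical, and only the micro-op mnemonics and operand names are taken from the quoted listing.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MicroOp:
    """One micro-op: a mnemonic plus operand names (illustrative only)."""
    mnemonic: str
    operands: Tuple[str, ...]


def expand_vaddpd512(mode: str) -> List[MicroOp]:
    """Return the micro-op sequence for a 512-bit VADDPD.

    `mode` is a hypothetical label for the addressing mode:
      "reg"    -> ZMM, ZMM, ZMM (no memory access)
      "mem"    -> ZMM, ZMM, M   (SIB-addressed memory operand)
      "riprel" -> ZMM, ZMM, P   (RIP-relative memory operand)
    """
    if mode == "reg":
        # Register-only form: a single add micro-op is enough.
        return [MicroOp("vaddfp", ("vectorReg", "vectorReg2", "vectorRegm"))]
    if mode == "mem":
        # Load the memory operand into a temporary vector register first.
        prologue = [MicroOp("vldfpevex", ("vectorAux1", "seg", "sib"))]
    elif mode == "riprel":
        # Read the instruction pointer, then load relative to it.
        prologue = [
            MicroOp("rdip", ("t7",)),
            MicroOp("vldfpevex", ("vectorAux1", "seg", "riprel")),
        ]
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    # Both memory forms end with the same add, reading the temporary.
    return prologue + [MicroOp("vaddfp", ("vectorReg", "vectorReg2", "vectorAux1"))]


if __name__ == "__main__":
    for mode in ("reg", "mem", "riprel"):
        print(mode, [op.mnemonic for op in expand_vaddpd512(mode)])
```

The sketch ignores operand size and masking, which the message says multiply the number of encodings.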
>
> Regards,
> Adrián
>
> On 1/6/20 20:17, Zhengrong Wang via gem5-dev wrote:
> > Hi Adrián,
> >
> > Yeah we have met at HPCA and I went to your presentation. I think your
> > implementation is probably more complete and robust. If you plan to
> > contribute it, it would be great! I do have a few questions:
> >
> > 1. How are the instructions broken into microops? e.g. a "vaddps" is
> > decoded into a single microop?
> > 2. If there is a single 512-bit register, how do you handle register
> > dependence? Are there new register read/write APIs for these instructions?
> >
> > Thanks!
> >
> > *王 钲 荣*
> > Zhengrong Wang
> >
> > On Mon, Jun 1, 2020 at 10:20 AM, abarredo via gem5-dev <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> I'm Adrián, I have extended the x86 ISA with the newest SIMD
> >> extensions (AVX, AVX2 and AVX512).
> >> As you know, the SSE implementation is inefficient (a 128-bit register
> >> operation is modeled as two 64-bit scalar operations).
> >> If we plan to add support for the AVX and AVX512 ISAs, the first thing to do
> >> is to implement a proper vector register file for the x86 ISA. When I
> >> did it, SVE had not been released, so I did it from scratch. Here, we
> >> could follow one of these options:
> >>
> >> 1) Reuse the SVE implementation of the vector register file.
> >> 2) Create a new one, as I did.
> >>
> >> Doing the first option means having compatibility with Arm's SVE
> >> instructions, but also means re-implementing all my micro-instructions,
> >> which would take long.
> >> However, my current x86 SIMD micro-instruction implementation is not
> >> clean, so I'll have to spend time making it more efficient anyway.
> >>
> >> In my own vector register file, I perform a memory allocation of 512 bits
> >> for all the registers.
> >> Then, every SSE, AVX and AVX512 instruction operates
> >> depending on the instruction's vector size. This implementation has been
> >> tested on several applications (from the ParVec benchmark suite among
> >> others)
> >> and closely follows Intel's description in their official manual. This
> >> simulator has been employed in our paper, published in HPCA2020.
> >>
> >> Adrián
> >>
> >> On 1/6/20 18:32, Miquel Moreto wrote:
> >>> Hi Zhengrong and Jason,
> >>>
> >>> Let me CC Adrian Barredo, the PhD student that implemented AVX
> >>> instructions in our gem5 simulation infrastructure. Since he did all
> >>> the hard work, I believe it is better that he answers your questions. :-)
> >>>
> >>> Best regards,
> >>>
> >>> --- Miquel
> >>>
> >>> On 1/6/20 17:14, Jason Lowe-Power wrote:
> >>>> Hey Zhengrong,
> >>>>
> >>>> Thanks for getting started on this! I've also cc'd Miquel at BSC who
> >>>> has implemented many of the x86 vector instructions. Miquel, it would
> >>>> be great to get your input here!
> >>>>
> >>>> As far as getting input from AMD folks... I think this is going to be
> >>>> a tough thing for them to weigh in on due to IP issues. This is
> >>>> getting a bit too close to their products :). They can correct me if
> >>>> I'm wrong!
> >>>>
> >>>> To answer your questions:
> >>>>
> >>>>     - Design of the vector register file. My implementation directly
> >>>>     follows the SSE instructions to minimize the work. Is there any
> >>>>     better way to do this?
> >>>>
> >>>> I agree with Gabe. This is the best approach for now.
> >>>>
> >>>>     - If I am going to merge my code, what is a good submission plan?
> >>>>     I am thinking about first committing the skeleton code with a
> >>>>     simple 'vaddps' instruction, and then for other instructions.
> >>>>
> >>>> That sounds good to me.
> >>>> If you think the whole set of changes should
> >>>> be reviewed together or there's no way to split things apart and
> >>>> still be understandable, we can create a feature branch for you. That
> >>>> said, since this is mostly just adding one instruction and doesn't
> >>>> touch too much outside of the ISA implementation, just breaking it up
> >>>> that way will probably work.
> >>>>
> >>>>     - Testing: This is probably the most important one. Currently, I
> >>>>     manually test my code by simulating small programs. What is the
> >>>>     best way to write tests for new instructions? Should I try unit
> >>>>     testing for binary testing?
> >>>>
> >>>> If you could submit your programs to the gem5-resources repo, we can
> >>>> build the binaries and then distribute them for anyone to use for
> >>>> testing.
> >>>> I think that works well.
> >>>>
> >>>> Cheers,
> >>>> Jason
> >>>>
> >>>> On Sun, May 31, 2020 at 10:14 PM Gabe Black via gem5-dev
> >>>> <[email protected]> wrote:
> >>>>
> >>>> https://docs.google.com/document/d/1O_u_Xq14TgreYThuZcbM3kuXFCrKvaFHA2O9poCeHSk/edit#heading=h.r067bn3rmydo
> >>>>
> >>>> On Sun, May 31, 2020 at 9:31 PM Zhengrong Wang via gem5-dev
> >>>> <[email protected]> wrote:
> >>>>
> >>>> > Hi Gabe,
> >>>> >
> >>>> > Thanks for your reply. For the vector register file, I agree it is
> >>>> > probably a better idea to stick with the current approach; at least it
> >>>> > does not require changing the SSE instructions. I could not find your
> >>>> > plan to redesign the register handling mechanism. If you could provide
> >>>> > a link I would be interested to take a look to have a better
> >>>> > understanding of the philosophy behind the design.
> >>>> >
> >>>> > Let's hear from AMD first as they have more insights about the microop.
> >>>> > If everything turns out well, I can start to refactor the code into
> >>>> > smaller commits and add tests for that.
> >>>> >
> >>>> > *王 钲 荣*
> >>>> > Zhengrong Wang
> >>>> >
> >>>> > On Sun, May 31, 2020 at 7:44 PM, Gabe Black via gem5-dev
> >>>> > <[email protected]> wrote:
> >>>> >
> >>>> > > Hi Sean. I'm not aware of anyone working on AVX-512, but it would be nice
> >>>> > > if the AMD folks could chime in and confirm that. The x86 microcode was
> >>>> > > originally based off of the microcode for the K6 as described in a patent.
> >>>> > > The floating point parts of that patent were very vague and hand wavy, so I
> >>>> > > more or less made up the initial part. It would be nice for the AMD folks
> >>>> > > to chime in here too, as far as what's realistic for the design of the
> >>>> > > microops.
> >>>> > >
> >>>> > > As far as testing, we don't have a great scheme for testing individual
> >>>> > > instructions right now, but that would be really valuable to have in the
> >>>> > > long run. I've thought a bit about how that might work, but I don't have a
> >>>> > > plan at the moment. The best thing to do right now is probably to have
> >>>> > > small programs that execute the instructions in question and print their
> >>>> > > inputs/outputs and/or check that the outputs are correct. I think our
> >>>> > > testing framework has a way to check that program output matches a golden
> >>>> > > reference, and that could be used to delegate correctness checking to the
> >>>> > > framework.
> >>>> > > Bobby can probably give more details here.
> >>>> > >
> >>>> > > As far as the registers, my preference for now is to do what you did and
> >>>> > > treat each 64 bit chunk as its own register. There are real drawbacks to
> >>>> > > this approach, but the existing solution to them, a vector register file,
> >>>> > > has other, in my opinion more serious, drawbacks. A while ago I put
> >>>> > > together a manifesto about how I'd want to redo the whole register handling
> >>>> > > mechanism in gem5, but unfortunately I haven't had time to actually
> >>>> > > implement very much of it. By treating larger registers as groups of
> >>>> > > smaller registers, you'd be consistent with the rest of the x86 code as it
> >>>> > > stands right now. That, and the fact that I think that's the lesser of two
> >>>> > > evils, makes that my preferred way to go.
> >>>> > >
> >>>> > > As far as submitting code, there are instructions on the gem5 website for
> >>>> > > creating and submitting reviews. We use gerrit, and so in addition to the
> >>>> > > instructions we provide, you should be able to find pretty good/complete
> >>>> > > instructions out on the internet to explain the mechanism of sending out a
> >>>> > > review. For this or any other change, you'd want to break up your work into
> >>>> > > logical chunks where everything works before and after any given change,
> >>>> > > and then send them out (perhaps all together in a series) for review.
> >>>> > > Exactly how to break things up is up to you, but my opinion is that each
> >>>> > > change should be logically complete but also about one thing.
> >>>> > > That makes it
> >>>> > > easier for a reviewer to wrap their head around what you're doing and how
> >>>> > > it works without having to untangle multiple things going on at once, or
> >>>> > > having to merge multiple reviews together in their head to see the whole
> >>>> > > change they're reviewing. If there are lots of related small changes (many
> >>>> > > individual instructions for instance) it might make sense to do one or two
> >>>> > > by themselves first, and then once the kinks are worked out to do a larger
> >>>> > > change with the rest, applying the pattern from the earlier reviews.
> >>>> > >
> >>>> > > Gabe
> >>>> > >
> >>>> > > On Sun, May 31, 2020 at 4:18 PM Sean Wong via gem5-dev
> >>>> > > <[email protected]> wrote:
> >>>> > >
> >>>> > > > Hello,
> >>>> > > >
> >>>> > > > This is my first time posting here, so apologies if I made any mistakes.
> >>>> > > >
> >>>> > > > The last time I checked the develop branch, gem5 did not yet support
> >>>> > > > AVX512. And searching the mailing list I do not see any plan for that. Is
> >>>> > > > there any ongoing development to support that? If not, I am happy to
> >>>> > > > contribute my code. During my research, I have developed partial support
> >>>> > > > for AVX512 (and AVX-256 as a by-product), which I hope would be useful for
> >>>> > > > others.
> >>>> > > >
> >>>> > > > My implementation so far is a straightforward extension to the existing SSE
> >>>> > > > instructions. To summarize it:
> >>>> > > >
> >>>> > > > - Like the SSE implementation, the 512-bit register is broken into eight
> >>>> > > > 64-bit sub-registers. This may not be a good design. Any suggestions are
> >>>> > > > welcome.
> >>>> > > > - Unlike the SSE implementation, most of the instructions are broken into a
> >>>> > > > single microop. For example, a 512-bit 'vaddps' is decoded into one 'vaddf'
> >>>> > > > microop instead of eight.
> >>>> > > > - Currently, it supports common arithmetic instructions (add, mul, etc.)
> >>>> > > > and basic data movement (load, store, mov, extract, insert, etc.).
> >>>> > > > - No support for masking.
> >>>> > > >
> >>>> > > > If you guys are interested, I am willing to clean up my code and submit it
> >>>> > > > for review. I may need some guidance on:
> >>>> > > >
> >>>> > > > - Design of the vector register file. My implementation directly follows
> >>>> > > > the SSE instructions to minimize the work. Is there any better way to do
> >>>> > > > this?
> >>>> > > > - If I am going to merge my code, what is a good submission plan? I am
> >>>> > > > thinking about first committing the skeleton code with a simple 'vaddps'
> >>>> > > > instruction, and then for other instructions.
> >>>> > > > - Testing: This is probably the most important one. Currently, I manually
> >>>> > > > test my code by simulating small programs. What is the best way to write
> >>>> > > > tests for new instructions? Should I try unit testing for binary testing?
> >>>> > > >
> >>>> > > > Thank you for reading this long post. Any feedback is welcome.
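The post above stores a 512-bit register as eight 64-bit sub-registers and decodes a wide add into a single micro-op. A minimal Python sketch of that idea follows, under stated assumptions: all names (`new_reg`, `packed_add_f64`, the bit-conversion helpers) are hypothetical, it uses double-precision lanes so that one lane fills one 64-bit chunk, and it leaves chunks above the vector size untouched rather than modelling any particular upper-lane rule.

```python
import struct
from typing import List

CHUNK_BITS = 64
CHUNKS_PER_REG = 512 // CHUNK_BITS  # eight sub-registers per 512-bit register


def f64_to_bits(value: float) -> int:
    """Reinterpret a double as its raw 64-bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def bits_to_f64(bits: int) -> float:
    """Reinterpret a raw 64-bit pattern as a double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def new_reg() -> List[int]:
    """A 512-bit register modelled as eight zeroed 64-bit chunks."""
    return [0] * CHUNKS_PER_REG


def packed_add_f64(dst: List[int], a: List[int], b: List[int], vsize: int) -> None:
    """Add packed doubles in the low `vsize` bits of a and b into dst.

    One call stands in for one micro-op that walks all the chunks it needs,
    instead of one micro-op per 64-bit chunk.
    """
    if vsize not in (128, 256, 512):
        raise ValueError("vsize must be 128, 256 or 512")
    for i in range(vsize // CHUNK_BITS):
        dst[i] = f64_to_bits(bits_to_f64(a[i]) + bits_to_f64(b[i]))


if __name__ == "__main__":
    a = [f64_to_bits(1.5)] * CHUNKS_PER_REG
    b = [f64_to_bits(2.0)] * CHUNKS_PER_REG
    dst = new_reg()
    packed_add_f64(dst, a, b, vsize=256)
    print([bits_to_f64(chunk) for chunk in dst])
```

Because each chunk is its own register in this model, a dependence tracker would see eight sources and eight destinations for a 512-bit operation, which is the trade-off the replies in the thread discuss.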
> >>>> > > >
> >>>> > > > *王 钲 荣*
> >>>> > > > Zhengrong Wang
> >>>> > > > _______________________________________________
> >>>> > > > gem5-dev mailing list -- [email protected]
> >>>> > > > To unsubscribe send an email to [email protected]
> >>>
> >>> WARNING / LEGAL TEXT: This message is intended only for the use of the
> >>> individual or entity to which it is addressed and may contain
> >>> information which is privileged, confidential, proprietary, or exempt
> >>> from disclosure under applicable law.
> >>> If you are not the intended
> >>> recipient or the person responsible for delivering the message to the
> >>> intended recipient, you are strictly prohibited from disclosing,
> >>> distributing, copying, or in any way using this message. If you have
> >>> received this communication in error, please notify the sender and
> >>> destroy and delete any copies you may have received.
> >>>
> >>> http://www.bsc.es/disclaimer
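The testing approach suggested in the thread (small programs that print their inputs and outputs, compared against a golden reference) comes down to a text comparison. A minimal Python sketch of that comparison step follows. It is not gem5's testing framework: `diff_against_golden` and the sample strings are hypothetical, and how the program output is captured from a simulation run is left out.

```python
import difflib
from typing import List


def diff_against_golden(actual: str, golden: str) -> List[str]:
    """Return unified-diff lines between golden and actual output.

    An empty list means the outputs match. Trailing whitespace on each line
    and leading/trailing blank lines are ignored (an assumption of this
    sketch, not a documented rule of any framework).
    """
    actual_lines = [line.rstrip() for line in actual.strip().splitlines()]
    golden_lines = [line.rstrip() for line in golden.strip().splitlines()]
    return list(
        difflib.unified_diff(
            golden_lines, actual_lines,
            fromfile="golden", tofile="actual", lineterm="",
        )
    )


if __name__ == "__main__":
    # Hypothetical output of a small test program exercising a vector add.
    golden = "in0: 1.0 2.0\nin1: 3.0 4.0\nout: 4.0 6.0\n"
    good = "in0: 1.0 2.0\nin1: 3.0 4.0\nout: 4.0 6.0\n"
    bad = "in0: 1.0 2.0\nin1: 3.0 4.0\nout: 4.0 0.0\n"
    print("match" if not diff_against_golden(good, golden) else "mismatch")
    for line in diff_against_golden(bad, golden):
        print(line)
```

Returning the diff rather than a boolean lets a failing instruction test show which printed lane was wrong.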
