Hi Adrián,

Yes, we met at HPCA, and I attended your presentation. I think your implementation is probably more complete and robust than mine. If you plan to contribute it, that would be great! I do have a few questions:

1. How are the instructions broken into microops? For example, is a "vaddps" decoded into a single microop?
2. If there is a single 512-bit register, how do you handle register dependences? Are there new register read/write APIs for these instructions?

Thanks!

*王 钲 荣*
Zhengrong Wang
Computer Science Department
University of California, Los Angeles
California, USA 90024
Work Email: [email protected]
Mobile: +1 310-447-4568

abarredo via gem5-dev <[email protected]> wrote on Mon, Jun 1, 2020 at 10:20 AM:

> Hi,
>
> I'm Adrián. I have extended the x86 ISA with the newest SIMD
> extensions (AVX, AVX2 and AVX512).
> As you know, the SSE implementation is inefficient (a 128-bit register
> operation is modeled as two 64-bit scalar operations).
> If we plan to add support for the AVX and AVX512 ISAs, the first thing to do
> is to implement a proper vector register file for the x86 ISA. At the
> time I did it, SVE had not been released, so I did it from scratch.
> Here, we could follow one of these options:
>
> 1) Reuse the SVE implementation of the vector register file.
> 2) Create a new one, as I did.
>
> Choosing the first option means having compatibility with Arm's SVE
> instructions, but it also means re-implementing all my micro-instructions,
> which would take a long time.
> However, my current implementation of the x86 SIMD micro-instructions is
> not clean, so I'll have to spend time making it more efficient anyway.
>
> In my own vector register file, I allocate 512 bits of memory for
> every register. Then, every SSE, AVX and AVX512 instruction operates
> according to the instruction's vector size. This implementation has been
> tested on several applications (from the ParVec benchmark suite, among
> others) and closely follows Intel's description in their official manual.
> This simulator was used in our paper, published at HPCA 2020.
>
> Adrián
>
> On 1/6/20 18:32, Miquel Moreto wrote:
> >
> > Hi Zhengrong and Jason,
> >
> > Let me CC Adrian Barredo, the PhD student that implemented AVX
> > instructions in our gem5 simulation infrastructure. Since he did all
> > the hard work, I believe it is better that he answers your questions. :-)
> >
> > Best regards,
> >
> > --- Miquel
> >
> > On 1/6/20 17:14, Jason Lowe-Power wrote:
> >> Hey Zhengrong,
> >>
> >> Thanks for getting started on this! I've also cc'd Miquel at BSC who
> >> has implemented many of the x86 vector instructions. Miquel, it would
> >> be great to get your input here!
> >>
> >> As far as getting input from AMD folks... I think this is going to be
> >> a tough thing for them to weigh in on due to IP issues. This is
> >> getting a bit too close to their products :). They can correct me if
> >> I'm wrong!
> >>
> >> To answer your questions:
> >>
> >>     - Design of the vector register file. My implementation directly
> >>     follows the SSE instructions to minimize the work. Is there any
> >>     better way to do this?
> >>
> >> I agree with Gabe. This is the best approach for now.
> >>
> >>     - If I am going to merge my code, what is a good submission plan?
> >>     I am thinking about first committing the skeleton code with a
> >>     simple 'vaddps' instruction, and then for other instructions.
> >>
> >> That sounds good to me. If you think the whole set of changes should
> >> be reviewed together or there's no way to split things apart and
> >> still be understandable, we can create a feature branch for you. That
> >> said, since this is mostly just adding one instruction and doesn't
> >> touch too much outside of the ISA implementation, just breaking it up
> >> that way will probably work.
> >>
> >>     - Testing: This is probably the most important one. Currently, I
> >>     manually test my code by simulating small programs. What is the
> >>     best way to write tests for new instructions?
> >>     Should I try unit testing or binary testing?
> >>
> >> If you could submit your programs to the gem5-resources repo, we can
> >> build the binaries and then distribute them for anyone to use for
> >> testing. I think that works well.
> >>
> >> Cheers,
> >> Jason
> >>
> >> On Sun, May 31, 2020 at 10:14 PM Gabe Black via gem5-dev
> >> <[email protected]> wrote:
> >>
> >>     https://docs.google.com/document/d/1O_u_Xq14TgreYThuZcbM3kuXFCrKvaFHA2O9poCeHSk/edit#heading=h.r067bn3rmydo
> >>
> >>     On Sun, May 31, 2020 at 9:31 PM Zhengrong Wang via gem5-dev
> >>     <[email protected]> wrote:
> >>
> >> > Hi Gabe,
> >> >
> >> > Thanks for your reply. For the vector register file, I agree it is
> >> > probably a better idea to stick with the current approach; at least it
> >> > does not require changing the SSE instructions. I could not find your
> >> > plan to redesign the register handling mechanism. If you could provide
> >> > a link, I would be interested to take a look to get a better
> >> > understanding of the philosophy behind the design.
> >> >
> >> > Let's hear from AMD first, as they have more insight into the microops.
> >> > If everything turns out well, I can start to refactor the code into
> >> > smaller commits and add tests for that.
> >> >
> >> > *王 钲 荣*
> >> >
> >> > Zhengrong Wang
> >> > Computer Science Department
> >> > University of California, Los Angeles
> >> > California, USA
> >> > 90024
> >> >
> >> > Work Email: [email protected]
> >> > Mobile: +1 310-447-4568
> >> >
> >> > Gabe Black via gem5-dev <[email protected]> wrote on Sun, May 31, 2020 at 7:44 PM:
> >> >
> >> > > Hi Sean. I'm not aware of anyone working on AVX-512, but it would be
> >> > > nice if the AMD folks could chime in and confirm that.
> >> > > The x86 microcode was originally based off of the microcode for the
> >> > > K6 as described in a patent. The floating point parts of that patent
> >> > > were very vague and hand wavy, so I more or less made up the initial
> >> > > part. It would be nice for the AMD folks to chime in here too, as far
> >> > > as what's realistic for the design of the microops.
> >> > >
> >> > > As far as testing, we don't have a great scheme for testing individual
> >> > > instructions right now, but that would be really valuable to have in
> >> > > the long run. I've thought a bit about how that might work, but I
> >> > > don't have a plan at the moment. The best thing to do right now is
> >> > > probably to have small programs that execute the instructions in
> >> > > question and print their inputs/outputs and/or check that the outputs
> >> > > are correct. I think our testing framework has a way to check that
> >> > > program output matches a golden reference, and that could be used to
> >> > > delegate correctness checking to the framework. Bobby can probably
> >> > > give more details here.
> >> > >
> >> > > As far as the registers, my preference for now is to do what you did
> >> > > and treat each 64 bit chunk as its own register. There are real
> >> > > drawbacks to this approach, but the existing solution to them, a
> >> > > vector register file, has other, in my opinion more serious,
> >> > > drawbacks. A while ago I put together a manifesto about how I'd want
> >> > > to redo the whole register handling mechanism in gem5, but
> >> > > unfortunately I haven't had time to actually implement very much of
> >> > > it. By treating larger registers as groups of smaller registers, you'd
> >> > > be consistent with the rest of the x86 code as it stands right now.
> >> > > That, and the fact that I think that's the lesser of two evils, makes
> >> > > that my preferred way to go.
> >> > >
> >> > > As far as submitting code, there are instructions on the gem5 website
> >> > > for creating and submitting reviews. We use Gerrit, and so in addition
> >> > > to the instructions we provide, you should be able to find pretty
> >> > > good/complete instructions out on the internet to explain the
> >> > > mechanism of sending out a review. For this or any other change, you'd
> >> > > want to break up your work into logical chunks where everything works
> >> > > before and after any given change, and then send them out (perhaps all
> >> > > together in a series) for review. Exactly how to break things up is up
> >> > > to you, but my opinion is that each change should be logically
> >> > > complete but also about one thing. That makes it easier for a reviewer
> >> > > to wrap their head around what you're doing and how it works without
> >> > > having to untangle multiple things going on at once, or having to
> >> > > merge multiple reviews together in their head to see the whole change
> >> > > they're reviewing. If there are lots of related small changes (many
> >> > > individual instructions for instance) it might make sense to do one or
> >> > > two by themselves first, and then once the kinks are worked out to do
> >> > > a larger change with the rest, applying the pattern from the earlier
> >> > > reviews.
> >> > >
> >> > > Gabe
> >> > >
> >> > > On Sun, May 31, 2020 at 4:18 PM Sean Wong via gem5-dev
> >> > > <[email protected]> wrote:
> >> > >
> >> > > > Hello,
> >> > > >
> >> > > > This is my first time posting here, so apologies if I made any
> >> > > > mistakes.
> >> > > >
> >> > > > The last time I checked the develop branch, gem5 did not yet support
> >> > > > AVX512.
> >> > > > Searching the mailing list, I do not see any plan for that. Is
> >> > > > there any ongoing development to support it? If not, I am happy to
> >> > > > contribute my code. During my research, I have developed partial
> >> > > > support for AVX512 (and AVX-256 as a by-product), which I hope would
> >> > > > be useful for others.
> >> > > >
> >> > > > My implementation so far is a straightforward extension to the
> >> > > > existing SSE instructions. To summarize it:
> >> > > >
> >> > > > - Like the SSE implementation, the 512-bit register is broken into
> >> > > > eight 64-bit sub-registers. This may not be a good design. Any
> >> > > > suggestions are welcome.
> >> > > > - Unlike the SSE implementation, most of the instructions are decoded
> >> > > > into a single microop. For example, a 512-bit 'vaddps' is decoded
> >> > > > into one 'vaddf' microop instead of eight.
> >> > > > - Currently, it supports common arithmetic instructions (add, mul,
> >> > > > etc.) and basic data movement (load, store, mov, extract, insert,
> >> > > > etc.).
> >> > > > - No support for masking.
> >> > > >
> >> > > > If you guys are interested, I am willing to clean up my code and
> >> > > > submit it for review. I may need some guidance on:
> >> > > >
> >> > > > - Design of the vector register file. My implementation directly
> >> > > > follows the SSE instructions to minimize the work. Is there any
> >> > > > better way to do this?
> >> > > > - If I am going to merge my code, what is a good submission plan? I
> >> > > > am thinking about first committing the skeleton code with a simple
> >> > > > 'vaddps' instruction, and then for other instructions.
> >> > > > - Testing: This is probably the most important one. Currently, I
> >> > > > manually test my code by simulating small programs.
> >> > > > What is the best way to write tests for new instructions? Should I
> >> > > > try unit testing or binary testing?
> >> > > >
> >> > > > Thank you for reading this long post. Any feedback is welcome.
> >> > > >
> >> > > > *王 钲 荣*
> >> > > >
> >> > > > Zhengrong Wang
> >> > > > Computer Science Department
> >> > > > University of California, Los Angeles
> >> > > > California, USA
> >> > > > 90024
> >> > > >
> >> > > > Work Email: [email protected]
> >> > > > Mobile: +1 310-447-4568
> >> > > > _______________________________________________
> >> > > > gem5-dev mailing list -- [email protected]
> >> > > > To unsubscribe send an email to [email protected]
> >>
> >
> > WARNING / LEGAL TEXT: This message is intended only for the use of the
> > individual or entity to which it is addressed and may contain
> > information which is privileged, confidential, proprietary, or exempt
> > from disclosure under
> > applicable law. If you are not the intended
> > recipient or the person responsible for delivering the message to the
> > intended recipient, you are strictly prohibited from disclosing,
> > distributing, copying, or in any way using this message. If you have
> > received this communication in error, please notify the sender and
> > destroy and delete any copies you may have received.
> >
> > http://www.bsc.es/disclaimer
>
> http://bsc.es/disclaimer
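
To make the register discussion in this thread more concrete, here is a minimal Python sketch of the "eight 64-bit sub-registers" layout and of the two decode choices for a 512-bit 'vaddps' (eight microops versus one). It is not gem5 code: the flat sub-register numbering and the names `chunk_ids`, `decode_vaddps`, `add_chunk` and `execute` are assumptions made up for illustration. The point it tries to show is that a single microop over sub-registers has to name all sixteen source chunks as operands, which is where the register-dependence question comes from.

```python
import struct

REG_BITS = 512
CHUNK_BITS = 64
NUM_CHUNKS = REG_BITS // CHUNK_BITS  # 8 sub-registers per 512-bit register
LANES_PER_CHUNK = CHUNK_BITS // 32   # 2 float32 lanes per 64-bit chunk


def pack_chunks(lanes):
    """Split 16 float32 lanes into eight 64-bit integer sub-registers."""
    assert len(lanes) == NUM_CHUNKS * LANES_PER_CHUNK
    chunks = []
    for i in range(NUM_CHUNKS):
        pair = lanes[i * LANES_PER_CHUNK:(i + 1) * LANES_PER_CHUNK]
        # Pack two little-endian float32 values, reread them as one uint64.
        (chunk,) = struct.unpack("<Q", struct.pack("<2f", *pair))
        chunks.append(chunk)
    return chunks


def unpack_chunks(chunks):
    """Inverse of pack_chunks: recover the float32 lanes."""
    lanes = []
    for chunk in chunks:
        lanes.extend(struct.unpack("<2f", struct.pack("<Q", chunk)))
    return lanes


def chunk_ids(reg_index):
    """Hypothetical flat numbering of the sub-registers behind one register."""
    base = reg_index * NUM_CHUNKS
    return list(range(base, base + NUM_CHUNKS))


def decode_vaddps(dest, src1, src2, single_microop):
    """Return one (dest_ids, src_ids) tuple per microop."""
    d, a, b = chunk_ids(dest), chunk_ids(src1), chunk_ids(src2)
    if single_microop:
        # One microop that lists all 8 + 8 source chunks as operands.
        return [(d, a + b)]
    # SSE-style: one microop per 64-bit chunk, two source operands each.
    return [([d[i]], [a[i], b[i]]) for i in range(NUM_CHUNKS)]


def add_chunk(x, y):
    """Packed-single add on one 64-bit chunk (two float32 lanes)."""
    xl = struct.unpack("<2f", struct.pack("<Q", x))
    yl = struct.unpack("<2f", struct.pack("<Q", y))
    packed = struct.pack("<2f", xl[0] + yl[0], xl[1] + yl[1])
    return struct.unpack("<Q", packed)[0]


def execute(microops, regfile):
    """Run microops against a flat list of 64-bit sub-registers."""
    for dests, srcs in microops:
        half = len(srcs) // 2  # first half: src1 chunks, second half: src2
        for dst, x, y in zip(dests, srcs[:half], srcs[half:]):
            regfile[dst] = add_chunk(regfile[x], regfile[y])


if __name__ == "__main__":
    regfile = [0] * (4 * NUM_CHUNKS)  # four 512-bit registers
    regfile[0:8] = pack_chunks([float(i) for i in range(16)])
    regfile[8:16] = pack_chunks([0.5] * 16)
    for single in (True, False):
        uops = decode_vaddps(2, 0, 1, single_microop=single)
        execute(uops, regfile)
        print(len(uops), "microop(s),", len(uops[0][1]), "source operands each")
```

Both decodings write the same values; they differ only in how many microops are issued and how many operands each one carries. A dedicated vector register file, as Adrián describes, would instead expose each 512-bit register as a single operand.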
