[gem5-dev] Re: Add AVX512 Support?

Poremba, Matthew via gem5-dev Mon, 08 Jun 2020 14:40:20 -0700

[AMD Public Use]

Microops are proprietary. However, the *number* of microops for a given 
architecture can be determined with a benchmark executing the instruction 
billions/trillions of times and comparing the ratio of microops to instructions 
using performance counters. People have done this already. Try searching for 
"Agner Fog Instruction Tables."
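A minimal sketch of the ratio described above, assuming you have already read two counter deltas (micro-ops and instructions) for a loop that repeats the instruction under test. The function name, the parameter names, and the optional baseline subtraction for loop overhead are assumptions made for illustration; they do not correspond to any particular performance-counter tool.

```python
def uops_per_instruction(uops, instructions, base_uops=0, base_instructions=0):
    """Estimate micro-ops per instruction from performance-counter deltas.

    uops / instructions           -- counter deltas for the benchmark loop
    base_uops / base_instructions -- optional deltas for the same loop without
                                     the instruction under test (assumption:
                                     used here to subtract loop overhead)
    """
    measured = instructions - base_instructions
    if measured <= 0:
        raise ValueError("need a positive number of measured instructions")
    return (uops - base_uops) / measured


# Made-up counter values, only to show the call shape.
ratio = uops_per_instruction(uops=2_000_000_000, instructions=1_000_000_000)
```

The more often the instruction runs, the less the fixed overhead of the surrounding loop distorts the ratio, which is why the benchmark runs billions of iterations.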



-Matt

-----Original Message-----
From: Zhengrong Wang via gem5-dev <[email protected]> 
Sent: Tuesday, June 2, 2020 2:53 PM
To: gem5 Developer List <[email protected]>
Cc: Zhengrong Wang <[email protected]>
Subject: [gem5-dev] Re: Add AVX512 Support?


Thanks for the answer. Looking forward to the new feature!

*王 钲 荣*

Zhengrong Wang
Computer Science Department
University of California, Los Angeles
California, USA
90024

Work Email: [email protected]
Mobile :+1 310-447-4568




On Tue, Jun 2, 2020 at 1:23 AM, abarredo via gem5-dev <[email protected]> wrote:

> Hi Zhengrong,
>
> I remember you! I have received a few emails regarding our simulator 
> since I presented the paper. My idea was to reply to all of them once 
> I had submitted some patches to the official repo.
>
> Regarding your questions:
>
> 1) It depends on the instruction's addressing mode.  For example:
>
> # VADDPD
> ## ZMM
> def macroop VADDPD512_ZMM_ZMM_ZMM {
>      vaddfp vectorReg, vectorReg2, vectorRegm, size=8, vsize=512
> };
>
> def macroop VADDPD512_ZMM_ZMM_M {
>      vldfpevex vectorAux1, seg, sib, "DISPLACEMENT", dataSize=8, vsize=512, vdata=512
>      vaddfp vectorReg, vectorReg2, vectorAux1, size=8, vsize=512
> };
>
> def macroop VADDPD512_ZMM_ZMM_P {
>      rdip t7
>      vldfpevex vectorAux1, seg, riprel, "DISPLACEMENT", dataSize=8, vsize=512, vdata=512
>      vaddfp vectorReg, vectorReg2, vectorAux1, size=8, vsize=512
> };
>
> These macro-instructions cover three different addressing modes. 
> The first one takes only register operands and does not require a 
> memory access, so just one micro-instruction is needed. The other two 
> first load the memory operand into a temporary vector register.
> This is just a brief example: the instruction has many encodings, 
> depending on the operand size (128, 256 and 512 bits) and on the 
> masking. My implementation covers all of these possibilities.
>
> 2) Register dependences are handled as in any other ISA. The AVX512 ISA 
> has 32 512-bit vector registers (if I remember correctly), and they are 
> accessed through the register indices that the instruction specifies.
> Register dependences are tracked in the rename stage of the O3 CPU 
> model.
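A rough sketch of the two points above, under stated assumptions: the micro-op lists are transcribed from the three VADDPD512 macro-ops in this message, reduced to (name, destination, sources) tuples. The address registers of the load are omitted, and treating `t7` as a source of the RIP-relative load is an assumption. The `raw_dependences` helper is hypothetical and only mimics the "last writer" bookkeeping idea; it is not the O3 rename stage, which maps architectural registers to physical ones.

```python
# Micro-op sequences from the example macro-ops: (name, destination, sources).
MACROOPS = {
    "VADDPD512_ZMM_ZMM_ZMM": [
        ("vaddfp", "vectorReg", ("vectorReg2", "vectorRegm")),
    ],
    "VADDPD512_ZMM_ZMM_M": [
        ("vldfpevex", "vectorAux1", ()),  # address registers omitted
        ("vaddfp", "vectorReg", ("vectorReg2", "vectorAux1")),
    ],
    "VADDPD512_ZMM_ZMM_P": [
        ("rdip", "t7", ()),
        ("vldfpevex", "vectorAux1", ("t7",)),  # assumption: riprel reads t7
        ("vaddfp", "vectorReg", ("vectorReg2", "vectorAux1")),
    ],
}


def raw_dependences(uops):
    """Return (consumer_index, producer_index) pairs inside one macro-op."""
    last_writer = {}  # register name -> index of the micro-op that last wrote it
    deps = []
    for i, (_name, dest, srcs) in enumerate(uops):
        for src in srcs:
            if src in last_writer:
                deps.append((i, last_writer[src]))
        last_writer[dest] = i
    return deps


for name, uops in MACROOPS.items():
    print(name, len(uops), "micro-op(s), dependences:", raw_dependences(uops))
```

In the memory forms, the add depends on the load through `vectorAux1`; the register-only form has no dependence inside the macro-op.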
>
> As we mentioned to Jason, I'm currently working on the last 
> publication of my PhD. I plan to finish it this summer and start 
> on the gem5 changes in September.
>
> Regards,
> Adrián
>
> On 1/6/20 20:17, Zhengrong Wang via gem5-dev wrote:
> > Hi Adrián,
> >
> > Yes, we met at HPCA and I attended your presentation. I think 
> > your implementation is probably more complete and robust. If you 
> > plan to contribute it, that would be great! I do have a few questions:
> >
> > 1. How are the instructions broken into microops? For example, is a 
> > "vaddps" decoded into a single microop?
> > 2. If there is a single 512-bit register, how do you handle register 
> > dependence? Are there new register read/write APIs for these 
> > instructions?
> >
> > Thanks!
> >
> > On Mon, Jun 1, 2020 at 10:20 AM, abarredo via gem5-dev <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> I'm Adrián. I have extended the x86 ISA with the newest SIMD 
> >> extensions (AVX, AVX2 and AVX512).
> >> As you know, the SSE implementation is inefficient (a 128-bit 
> >> register operation is modeled as two 64-bit scalar operations).
> >> If we plan to add support for the AVX and AVX512 ISAs, the first thing 
> >> to do is to implement a proper vector register file for the x86 
> >> ISA. When I did this, SVE had not been released yet, so I wrote mine 
> >> from scratch. Here, we could follow one of these options:
> >>
> >> 1) Reuse the SVE implementation of the vector register file.
> >> 2) Create a new one, as I did.
> >>
> >> The first option gives compatibility with Arm's SVE instructions, 
> >> but it also means re-implementing all my micro-instructions, which 
> >> would take a long time.
> >> However, my current implementation of the x86 SIMD micro-instructions 
> >> is not clean, so I will have to spend time making it more efficient 
> >> anyway.
> >>
> >> In my own vector register file, I allocate 512 bits for every 
> >> register. Each SSE, AVX and AVX512 instruction then operates on as 
> >> many of those bits as its vector size requires. This implementation 
> >> has been tested on several applications (from the ParVec benchmark 
> >> suite, among others) and closely follows Intel's description in their 
> >> official manual. This simulator was used in our paper, published at 
> >> HPCA 2020.
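A minimal sketch of the register-file idea described above, assuming 32 registers of 512 bits each and double-precision lanes. The class and method names are hypothetical, and the sketch leaves the bits above the vector size untouched, which is a simplification rather than a statement about the actual implementation or the ISA.

```python
import struct

VLEN_BITS = 512  # storage allocated for every register, as described above
NUM_REGS = 32    # assumption for this sketch


class VecRegFile:
    """Every register owns 512 bits; each operation touches only `vsize` bits."""

    def __init__(self):
        self.regs = [bytearray(VLEN_BITS // 8) for _ in range(NUM_REGS)]

    def write_f64(self, idx, values):
        # Store the doubles in the low lanes of register `idx`.
        struct.pack_into(f"<{len(values)}d", self.regs[idx], 0, *values)

    def read_f64(self, idx, vsize):
        # Read the low `vsize` bits of register `idx` as 64-bit doubles.
        return list(struct.unpack_from(f"<{vsize // 64}d", self.regs[idx], 0))

    def add_f64(self, dst, src1, src2, vsize):
        # Lane-wise add over the low `vsize` bits only.
        a = self.read_f64(src1, vsize)
        b = self.read_f64(src2, vsize)
        self.write_f64(dst, [x + y for x, y in zip(a, b)])


rf = VecRegFile()
rf.write_f64(1, [1.0] * 8)
rf.write_f64(2, [2.0] * 8)
rf.add_f64(0, 1, 2, vsize=128)  # an SSE-sized add uses 2 of the 8 lanes
```

The same storage serves 128-, 256- and 512-bit operations; only the `vsize` argument changes.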
> >>
> >> Adrián
> >>
> >> On 1/6/20 18:32, Miquel Moreto wrote:
> >>> Hi Zhengrong and Jason,
> >>>
> >>> Let me CC Adrian Barredo, the PhD student who implemented AVX 
> >>> instructions in our gem5 simulation infrastructure. Since he did 
> >>> all the hard work, I believe it is better that he answers your 
> >>> questions. :-)
> >>>
> >>> Best regards,
> >>>
> >>> --- Miquel
> >>>
> >>> On 1/6/20 17:14, Jason Lowe-Power wrote:
> >>>> Hey Zhengrong,
> >>>>
> >>>> Thanks for getting started on this! I've also cc'd Miquel at BSC 
> >>>> who has implemented many of the x86 vector instructions. Miquel, 
> >>>> it would be great to get your input here!
> >>>>
> >>>> As far as getting input from AMD folks... I think this is going 
> >>>> to be a tough thing for them to weigh in on due to IP issues. 
> >>>> This is getting a bit too close to their products :). They can 
> >>>> correct me if I'm wrong!
> >>>>
> >>>> To answer your questions:
> >>>>
> >>>>      - Design of the vector register file. My implementation directly
> >>>>      follows the SSE instructions to minimize the work. Is there any
> >>>>      better way to do this?
> >>>>
> >>>>
> >>>> I agree with Gabe. This is the best approach for now.
> >>>>
> >>>>      - If I am going to merge my code, what is a good submission plan?
> >>>>      I am thinking about first committing the skeleton code with a
> >>>>      simple 'vaddps' instruction, and then for other instructions.
> >>>>
> >>>>
> >>>> That sounds good to me. If you think the whole set of changes 
> >>>> should be reviewed together or there's no way to split things 
> >>>> apart and still be understandable, we can create a feature branch 
> >>>> for you. That said, since this is mostly just adding one 
> >>>> instruction and doesn't touch too much outside of the ISA 
> >>>> implementation, just breaking it up that way will probably work.
> >>>>
> >>>>
> >>>>      - Testing: This is probably the most important one. Currently, I
> >>>>      manually test my code by simulating small programs. What is the
> >>>>      best way to write tests for new instructions? Should I try unit
> >>>>      testing for binary testing?
> >>>>
> >>>>
> >>>> If you could submit your programs to the gem5-resources repo, we 
> >>>> can build the binaries and then distribute them for anyone to use 
> >>>> for testing.
> >>>> I think that works well.
> >>>>
> >>>> Cheers,
> >>>> Jason
> >>>>
> >>>> On Sun, May 31, 2020 at 10:14 PM Gabe Black via gem5-dev <[email protected]> wrote:
> >>>>
> >>>>
> >>>>      https://docs.google.com/document/d/1O_u_Xq14TgreYThuZcbM3kuXFCrKvaFHA2O9poCeHSk/edit#heading=h.r067bn3rmydo
> >>>>      On Sun, May 31, 2020 at 9:31 PM Zhengrong Wang via gem5-dev <[email protected]> wrote:
> >>>>
> >>>>      > Hi Gabe,
> >>>>      >
> >>>>      > Thanks for your reply. For the vector register file, I agree it is
> >>>>      > probably a better idea to stick with the current approach; at least it
> >>>>      > does not require changing the SSE instructions. I could not find your
> >>>>      > plan to redesign the register handling mechanism. If you could provide
> >>>>      > a link, I would be interested in taking a look to better understand
> >>>>      > the philosophy behind the design.
> >>>>      >
> >>>>      > Let's hear from AMD first, as they have more insight into the microops.
> >>>>      > If everything turns out well, I can start to refactor the code into
> >>>>      > smaller commits and add tests for it.
> >>>>      >
> >>>>      > On Sun, May 31, 2020 at 7:44 PM, Gabe Black via gem5-dev <[email protected]> wrote:
> >>>>      >
> >>>>      > > Hi Sean. I'm not aware of anyone working on AVX-512, but it would be
> >>>>      > > nice if the AMD folks could chime in and confirm that. The x86
> >>>>      > > microcode was originally based off of the microcode for the K6 as
> >>>>      > > described in a patent. The floating point parts of that patent were
> >>>>      > > very vague and hand wavy, so I more or less made up the initial part.
> >>>>      > > It would be nice for the AMD folks to chime in here too, as far as
> >>>>      > > what's realistic for the design of the microops.
> >>>>      > >
> >>>>      > > As far as testing, we don't have a great scheme for testing individual
> >>>>      > > instructions right now, but that would be really valuable to have in
> >>>>      > > the long run. I've thought a bit about how that might work, but I
> >>>>      > > don't have a plan at the moment. The best thing to do right now is
> >>>>      > > probably to have small programs that execute the instructions in
> >>>>      > > question and print their inputs/outputs and/or check that the outputs
> >>>>      > > are correct. I think our testing framework has a way to check that
> >>>>      > > program output matches a golden reference, and that could be used to
> >>>>      > > delegate correctness checking to the framework. Bobby can probably
> >>>>      > > give more details here.
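A minimal sketch of the golden-reference idea, assuming the test program's printed output has already been captured as a string. `matches_golden` and its whitespace normalisation are hypothetical and written only for illustration; this is not the API of gem5's testing framework.

```python
def matches_golden(actual, golden):
    """Compare captured program output with a golden reference, line by line.

    Trailing whitespace and trailing blank lines are ignored; that
    normalisation rule is an assumption made for this sketch.
    """
    def normalise(text):
        lines = [line.rstrip() for line in text.splitlines()]
        while lines and not lines[-1]:
            lines.pop()
        return lines

    return normalise(actual) == normalise(golden)


# Made-up output of a small program that prints the lanes of a vector add.
golden = "3.0 3.0 3.0 3.0\n3.0 3.0 3.0 3.0\n"
captured = "3.0 3.0 3.0 3.0  \n3.0 3.0 3.0 3.0\n\n"
ok = matches_golden(captured, golden)
```

Printing both inputs and outputs, as suggested above, makes a mismatch against the golden file easier to diagnose than a bare pass/fail exit code.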
> >>>>      > >
> >>>>      > > As far as the registers, my preference for now is to do what you did
> >>>>      > > and treat each 64 bit chunk as its own register. There are real
> >>>>      > > drawbacks to this approach, but the existing solution to them, a
> >>>>      > > vector register file, has other, in my opinion more serious,
> >>>>      > > drawbacks. A while ago I put together a manifesto about how I'd want
> >>>>      > > to redo the whole register handling mechanism in gem5, but
> >>>>      > > unfortunately I haven't had time to actually implement very much of
> >>>>      > > it. By treating larger registers as groups of smaller registers,
> >>>>      > > you'd be consistent with the rest of the x86 code as it stands right
> >>>>      > > now. That, and the fact that I think that's the lesser of two evils,
> >>>>      > > makes that my preferred way to go.
> >>>>      > >
> >>>>      > > As far as submitting code, there are instructions on the gem5 website
> >>>>      > > for creating and submitting reviews. We use gerrit, and so in addition
> >>>>      > > to the instructions we provide, you should be able to find pretty
> >>>>      > > good/complete instructions out on the internet to explain the
> >>>>      > > mechanism of sending out a review. For this or any other change, you'd
> >>>>      > > want to break up your work into logical chunks where everything works
> >>>>      > > before and after any given change, and then send them out (perhaps all
> >>>>      > > together in a series) for review. Exactly how to break things up is up
> >>>>      > > to you, but my opinion is that each change should be logically
> >>>>      > > complete but also about one thing. That makes it easier for a reviewer
> >>>>      > > to wrap their head around what you're doing and how it works without
> >>>>      > > having to untangle multiple things going on at once, or having to
> >>>>      > > merge multiple reviews together in their head to see the whole change
> >>>>      > > they're reviewing. If there are lots of related small changes (many
> >>>>      > > individual instructions for instance) it might make sense to do one or
> >>>>      > > two by themselves first, and then once the kinks are worked out to do
> >>>>      > > a larger change with the rest, applying the pattern from the earlier
> >>>>      > > reviews.
> >>>>      > >
> >>>>      > > Gabe
> >>>>      > >
> >>>>      > > On Sun, May 31, 2020 at 4:18 PM Sean Wong via gem5-dev <[email protected]> wrote:
> >>>>      > >
> >>>>      > > > Hello,
> >>>>      > > >
> >>>>      > > > This is my first time posting here, so apologies if I made any
> >>>>      > > > mistakes.
> >>>>      > > >
> >>>>      > > > The last time I checked the develop branch, gem5 did not yet support
> >>>>      > > > AVX512, and searching the mailing list I do not see any plan for it.
> >>>>      > > > Is there any ongoing development to support it? If not, I am happy
> >>>>      > > > to contribute my code. During my research, I have developed partial
> >>>>      > > > support for AVX512 (and AVX-256 as a by-product), which I hope will
> >>>>      > > > be useful for others.
> >>>>      > > >
> >>>>      > > > My implementation so far is a straightforward extension to the
> >>>>      > > > existing SSE instructions. To summarize it:
> >>>>      > > >
> >>>>      > > > - Like the SSE implementation, the 512-bit register is broken into
> >>>>      > > >   eight 64-bit sub-registers. This may not be a good design. Any
> >>>>      > > >   suggestions are welcome.
> >>>>      > > > - Unlike the SSE implementation, most instructions are decoded into
> >>>>      > > >   a single microop. For example, a 512-bit 'vaddps' is decoded into
> >>>>      > > >   one 'vaddf' microop instead of eight.
> >>>>      > > > - Currently, it supports common arithmetic instructions (add, mul,
> >>>>      > > >   etc.) and basic data movement (load, store, mov, extract, insert,
> >>>>      > > >   etc.).
> >>>>      > > > - No support for masking.
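A small sketch of the register layout described in the first bullet, treating a 512-bit register image as a Python integer that is split into eight 64-bit chunks and joined back. The helper names are hypothetical and the lowest-chunk-first ordering is an assumption; the sketch shows only the data layout, not how gem5 indexes its sub-registers.

```python
CHUNK_BITS = 64
CHUNK_MASK = (1 << CHUNK_BITS) - 1


def split_register(value, width_bits=512):
    """Split a register image into 64-bit chunks, lowest chunk first."""
    return [(value >> shift) & CHUNK_MASK
            for shift in range(0, width_bits, CHUNK_BITS)]


def join_register(chunks):
    """Rebuild the full register image from its 64-bit chunks."""
    value = 0
    for i, chunk in enumerate(chunks):
        value |= chunk << (i * CHUNK_BITS)
    return value


image = (1 << 511) | 5           # arbitrary 512-bit test pattern
chunks = split_register(image)   # eight sub-register values
```

With this layout a single microop that covers the whole register has to name all eight chunks as operands, whereas one microop per chunk names one each.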
> >>>>      > > >
> >>>>      > > > If you guys are interested, I am willing to clean up my code and
> >>>>      > > > submit it for review. I may need some guidance on:
> >>>>      > > >
> >>>>      > > > - Design of the vector register file. My implementation directly
> >>>>      > > >   follows the SSE instructions to minimize the work. Is there any
> >>>>      > > >   better way to do this?
> >>>>      > > > - If I am going to merge my code, what is a good submission plan? I
> >>>>      > > >   am thinking about first committing the skeleton code with a simple
> >>>>      > > >   'vaddps' instruction, and then for other instructions.
> >>>>      > > > - Testing: This is probably the most important one. Currently, I
> >>>>      > > >   manually test my code by simulating small programs. What is the
> >>>>      > > >   best way to write tests for new instructions? Should I try unit
> >>>>      > > >   testing for binary testing?
> >>>>      > > >
> >>>>      > > > Thank you for reading this long post. Any feedback is welcome.
> >>>>      > > >
> >>>>      > > > _______________________________________________
> >>>>      > > > gem5-dev mailing list -- [email protected]
> >>>>      > > > To unsubscribe send an email to [email protected]
> >>>>      > > > %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s
> >>>>
> >>>
> >>
>
_______________________________________________
gem5-dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]
%(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s
