Hi Adrián,

Yes, we met at HPCA, and I attended your presentation. I think your
implementation is probably more complete and robust than mine. If you plan
to contribute it, that would be great! I do have a few questions:

1. How are the instructions broken into microops? For example, is a
"vaddps" decoded into a single microop?
2. If each vector register is a single 512-bit register, how do you handle
register dependences? Are there new register read/write APIs for these
instructions?
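
A minimal sketch of the dependence issue behind question 2, under stated
assumptions. It is not gem5 code: `ChunkMask`, `chunkMask`, `dependsSubReg`
and `dependsWholeReg` are hypothetical names, and the model only compares
tracking a 512-bit register as eight 64-bit sub-registers against tracking
it as one whole register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not the gem5 API: bit i is set when an access touches
// the i-th 64-bit chunk of a 512-bit register.
using ChunkMask = std::uint8_t;

// Chunks covered by an access to bits [offsetBits, offsetBits + widthBits).
ChunkMask chunkMask(unsigned offsetBits, unsigned widthBits)
{
    assert(offsetBits % 64 == 0 && widthBits % 64 == 0);
    assert(widthBits > 0 && offsetBits + widthBits <= 512);
    const unsigned first = offsetBits / 64;
    const unsigned count = widthBits / 64;
    return static_cast<ChunkMask>(((1u << count) - 1u) << first);
}

// Sub-register tracking (SSE style): a read depends on an earlier write
// only if the two accesses share at least one 64-bit chunk.
bool dependsSubReg(ChunkMask written, ChunkMask read)
{
    return (written & read) != 0;
}

// Whole-register tracking: any read of the register depends on any earlier
// write to it, no matter which bits each access touches.
bool dependsWholeReg(ChunkMask written, ChunkMask read)
{
    return written != 0 && read != 0;
}
```

For a write to bits 256..511 followed by a read of bits 0..127,
`dependsSubReg` reports no dependence while `dependsWholeReg` reports one.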

Thanks!

*王 钲 荣*

Zhengrong Wang
Computer Science Department
University of California, Los Angeles
California, USA
90024

Work Email: [email protected]
Mobile :+1 310-447-4568




abarredo via gem5-dev <[email protected]> wrote on Mon, Jun 1, 2020 at 10:20 AM:

> Hi,
>
> I'm Adrián. I have extended the x86 ISA with the newest SIMD
> extensions (AVX, AVX2 and AVX512).
> As you know, the SSE implementation is inefficient (a 128-bit register
> operation is modeled as two 64-bit scalar operations).
> If we plan to add support for the AVX and AVX512 ISAs, the first thing to
> do is to implement a proper vector register file for the x86 ISA. When I
> did this, SVE had not been released, so I wrote it from scratch. Here, we
> could follow one of these options:
>
> 1) Reuse the SVE implementation of the vector register file.
> 2) Create a new one, as I did.
>
> The first option means compatibility with Arm's SVE
> instructions, but it also means re-implementing all my micro-instructions,
> which would take a long time.
> However, my current implementation of the x86 SIMD micro-instructions is
> not clean, so I'll have to spend time making it more efficient anyway.
>
> In my own vector register file, I allocate 512 bits of storage for
> every register. Each SSE, AVX and AVX512 instruction then operates
> on as many of those bits as its vector size requires. This implementation
> has been tested on several applications (from the ParVec benchmark suite,
> among others) and closely follows Intel's description in their official
> manual. This simulator was used in our paper, published at HPCA 2020.
>
> Adrián
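
The 512-bit vector register file described above could be sketched roughly
as follows. This is an illustration under assumptions, not Adrián's code and
not the gem5 API: `VecReg512` and `addPackedSingle` are hypothetical names.
The `zeroUpper` flag reflects the architectural rule that VEX/EVEX-encoded
instructions clear the destination bits above the vector length, while
legacy SSE encodings leave them unchanged.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 512-bit register: storage is always 16 floats, whatever the
// width of the instruction operating on it.
struct VecReg512 {
    std::array<float, 16> f32{};
};

// Packed single-precision add over the low `widthBits` bits of the register.
// zeroUpper = true mimics VEX/EVEX encodings (upper bits cleared);
// zeroUpper = false mimics legacy SSE (upper bits preserved).
void addPackedSingle(VecReg512 &dst, const VecReg512 &a, const VecReg512 &b,
                     unsigned widthBits, bool zeroUpper)
{
    assert(widthBits == 128 || widthBits == 256 || widthBits == 512);
    const std::size_t lanes = widthBits / 32;
    for (std::size_t i = 0; i < lanes; ++i)
        dst.f32[i] = a.f32[i] + b.f32[i];
    if (zeroUpper) {
        for (std::size_t i = lanes; i < dst.f32.size(); ++i)
            dst.f32[i] = 0.0f;
    }
}
```

With this layout a single routine serves the 128-, 256- and 512-bit forms;
only the lane count and the handling of the upper bits change.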
>
> On 1/6/20 18:32, Miquel Moreto wrote:
> >
> > Hi Zhengrong and Jason,
> >
> > Let me CC Adrian Barredo, the PhD student that implemented AVX
> > instructions in our gem5 simulation infrastructure. Since he did all
> > the hard work, I believe it is better that he answers your questions. :-)
> >
> > Best regards,
> >
> > --- Miquel
> >
> > On 1/6/20 17:14, Jason Lowe-Power wrote:
> >> Hey Zhengrong,
> >>
> >> Thanks for getting started on this! I've also cc'd Miquel at BSC who
> >> has implemented many of the x86 vector instructions. Miquel, it would
> >> be great to get your input here!
> >>
> >> As far as getting input from AMD folks... I think this is going to be
> >> a tough thing for them to weigh in on due to IP issues. This is
> >> getting a bit too close to their products :). They can correct me if
> >> I'm wrong!
> >>
> >> To answer your questions:
> >>
> >>     - Design of the vector register file. My implementation directly
> >>     follows the SSE instructions to minimize the work. Is there any
> >>     better way to do this?
> >>
> >>
> >> I agree with Gabe. This is the best approach for now.
> >>
> >>     - If I am going to merge my code, what is a good submission plan?
> >>     I am thinking about first committing the skeleton code with a
> >>     simple 'vaddps' instruction, and then the other instructions.
> >>
> >>
> >> That sounds good to me. If you think the whole set of changes should
> >> be reviewed together or there's no way to split things apart and
> >> still be understandable, we can create a feature branch for you. That
> >> said, since this is mostly just adding one instruction and doesn't
> >> touch too much outside of the ISA implementation, just breaking it up
> >> that way will probably work.
> >>
> >>
> >>     - Testing: This is probably the most important one. Currently, I
> >>     manually test my code by simulating small programs. What is the
> >>     best way to write tests for new instructions? Should I try unit
> >>     testing or binary testing?
> >>
> >>
> >> If you could submit your programs to the gem5-resources repo, we can
> >> build the binaries and then distribute them for anyone to use for
> >> testing. I think that works well.
> >>
> >> Cheers,
> >> Jason
> >>
> >> On Sun, May 31, 2020 at 10:14 PM Gabe Black via gem5-dev
> >> <[email protected]> wrote:
> >>
> >>     https://docs.google.com/document/d/1O_u_Xq14TgreYThuZcbM3kuXFCrKvaFHA2O9poCeHSk/edit#heading=h.r067bn3rmydo
> >>
> >>     On Sun, May 31, 2020 at 9:31 PM Zhengrong Wang via gem5-dev
> >>     <[email protected]> wrote:
> >>
> >>     > Hi Gabe,
> >>     >
> >>     > Thanks for your reply. For the vector register file, I agree it is
> >>     > probably a better idea to stick with the current approach; at
> >>     > least it does not require changing the SSE instructions. I could
> >>     > not find your plan to redesign the register handling mechanism. If
> >>     > you could provide a link, I would be interested in taking a look
> >>     > to better understand the philosophy behind the design.
> >>     >
> >>     > Let's hear from AMD first, as they have more insight into the
> >>     > microops. If everything turns out well, I can start to refactor
> >>     > the code into smaller commits and add tests for them.
> >>     >
> >>     > *王 钲 荣*
> >>     >
> >>     > Zhengrong Wang
> >>     > Computer Science Department
> >>     > University of California, Los Angeles
> >>     > California, USA
> >>     > 90024
> >>     >
> >>     > Work Email: [email protected]
> >>     > Mobile :+1 310-447-4568
> >>     >
> >>     >
> >>     >
> >>     >
> >>     > Gabe Black via gem5-dev <[email protected]> wrote on Sun, May 31,
> >>     > 2020 at 7:44 PM:
> >>     >
> >>     > > Hi Sean. I'm not aware of anyone working on AVX-512, but it would
> >>     > > be nice if the AMD folks could chime in and confirm that. The x86
> >>     > > microcode was originally based on the microcode for the K6 as
> >>     > > described in a patent. The floating point parts of that patent
> >>     > > were very vague and hand-wavy, so I more or less made up the
> >>     > > initial part. It would be nice for the AMD folks to chime in here
> >>     > > too, as far as what's realistic for the design of the microops.
> >>     > >
> >>     > > As far as testing, we don't have a great scheme for testing
> >>     > > individual instructions right now, but that would be really
> >>     > > valuable to have in the long run. I've thought a bit about how
> >>     > > that might work, but I don't have a plan at the moment. The best
> >>     > > thing to do right now is probably to have small programs that
> >>     > > execute the instructions in question and print their
> >>     > > inputs/outputs and/or check that the outputs are correct. I think
> >>     > > our testing framework has a way to check that program output
> >>     > > matches a golden reference, and that could be used to delegate
> >>     > > correctness checking to the framework. Bobby can probably give
> >>     > > more details here.
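
A small test of the kind described above might look like the sketch below.
It is only an illustration under assumptions: `addUnderTest` and
`runVaddpsTest` are hypothetical names, nothing here is part of the gem5
testing framework, and the scalar fallback exists only so the sketch builds
on hosts without AVX. When compiled with AVX enabled (for example with
`-mavx`), the `_mm256_add_ps` intrinsic typically compiles to a 256-bit
'vaddps'.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#if defined(__AVX__)
#include <immintrin.h>
#endif

constexpr std::size_t kLanes = 8;  // 256 bits of packed single precision

// Executes the instruction under test; falls back to a scalar loop when
// the compiler is not targeting AVX.
void addUnderTest(const float *a, const float *b, float *out)
{
#if defined(__AVX__)
    _mm256_storeu_ps(out, _mm256_add_ps(_mm256_loadu_ps(a),
                                        _mm256_loadu_ps(b)));
#else
    for (std::size_t i = 0; i < kLanes; ++i)
        out[i] = a[i] + b[i];
#endif
}

// Runs one test case. `log` receives the inputs and outputs as text (the
// part that could be compared against a golden reference); the return
// value is the program's own correctness check.
bool runVaddpsTest(std::string &log)
{
    float a[kLanes], b[kLanes], out[kLanes];
    for (std::size_t i = 0; i < kLanes; ++i) {
        a[i] = static_cast<float>(i);
        b[i] = 0.5f * static_cast<float>(i);
    }
    addUnderTest(a, b, out);

    std::ostringstream os;
    bool ok = true;
    for (std::size_t i = 0; i < kLanes; ++i) {
        os << a[i] << " + " << b[i] << " = " << out[i] << "\n";
        ok = ok && (out[i] == a[i] + b[i]);
    }
    log = os.str();
    return ok;
}
```

A `main` would call `runVaddpsTest`, print the log to standard output, and
return a nonzero exit code on failure, so that either the self-check or a
comparison of the output catches a wrong result.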
> >>     > >
> >>     > > As far as the registers, my preference for now is to do what you
> >>     > > did and treat each 64-bit chunk as its own register. There are
> >>     > > real drawbacks to this approach, but the existing solution to
> >>     > > them, a vector register file, has other, in my opinion more
> >>     > > serious, drawbacks. A while ago I put together a manifesto about
> >>     > > how I'd want to redo the whole register handling mechanism in
> >>     > > gem5, but unfortunately I haven't had time to actually implement
> >>     > > very much of it. By treating larger registers as groups of
> >>     > > smaller registers, you'd be consistent with the rest of the x86
> >>     > > code as it stands right now. That, and the fact that I think
> >>     > > that's the lesser of two evils, makes that my preferred way to
> >>     > > go.
> >>     > >
> >>     > > As far as submitting code, there are instructions on the gem5
> >>     > > website for creating and submitting reviews. We use Gerrit, so in
> >>     > > addition to the instructions we provide, you should be able to
> >>     > > find pretty good/complete instructions out on the internet that
> >>     > > explain the mechanics of sending out a review. For this or any
> >>     > > other change, you'd want to break up your work into logical
> >>     > > chunks where everything works before and after any given change,
> >>     > > and then send them out (perhaps all together in a series) for
> >>     > > review. Exactly how to break things up is up to you, but my
> >>     > > opinion is that each change should be logically complete but
> >>     > > also about one thing. That makes it easier for a reviewer to wrap
> >>     > > their head around what you're doing and how it works without
> >>     > > having to untangle multiple things going on at once, or having to
> >>     > > merge multiple reviews together in their head to see the whole
> >>     > > change they're reviewing. If there are lots of related small
> >>     > > changes (many individual instructions, for instance), it might
> >>     > > make sense to do one or two by themselves first, and then once
> >>     > > the kinks are worked out to do a larger change with the rest,
> >>     > > applying the pattern from the earlier reviews.
> >>     > >
> >>     > > Gabe
> >>     > >
> >>     > > On Sun, May 31, 2020 at 4:18 PM Sean Wong via gem5-dev
> >>     > > <[email protected]> wrote:
> >>     > >
> >>     > > > Hello,
> >>     > > >
> >>     > > > This is my first time posting here, so apologies if I made any
> >>     > > > mistakes.
> >>     > > >
> >>     > > > The last time I checked the develop branch, gem5 did not yet
> >>     > > > support AVX512, and searching the mailing list I do not see
> >>     > > > any plan for it. Is there any ongoing development to support
> >>     > > > it? If not, I am happy to contribute my code. During my
> >>     > > > research, I developed partial support for AVX512 (and AVX-256
> >>     > > > as a by-product), which I hope will be useful to others.
> >>     > > >
> >>     > > > My implementation so far is a straightforward extension of the
> >>     > > > existing SSE instructions. To summarize it:
> >>     > > >
> >>     > > > - Like the SSE implementation, each 512-bit register is broken
> >>     > > > into eight 64-bit sub-registers. This may not be a good
> >>     > > > design. Any suggestions are welcome.
> >>     > > > - Unlike the SSE implementation, most instructions are decoded
> >>     > > > into a single microop. For example, a 512-bit 'vaddps' is
> >>     > > > decoded into one 'vaddf' microop instead of eight.
> >>     > > > - Currently, it supports common arithmetic instructions (add,
> >>     > > > mul, etc.) and basic data movement (load, store, mov, extract,
> >>     > > > insert, etc.).
> >>     > > > - No support for masking.
> >>     > > >
> >>     > > > If you guys are interested, I am willing to clean up my code
> >>     > > > and submit it for review. I may need some guidance on:
> >>     > > >
> >>     > > > - Design of the vector register file. My implementation
> >>     > > > directly follows the SSE instructions to minimize the work. Is
> >>     > > > there any better way to do this?
> >>     > > > - If I am going to merge my code, what is a good submission
> >>     > > > plan? I am thinking about first committing the skeleton code
> >>     > > > with a simple 'vaddps' instruction, and then the other
> >>     > > > instructions.
> >>     > > > - Testing: This is probably the most important one. Currently,
> >>     > > > I manually test my code by simulating small programs. What is
> >>     > > > the best way to write tests for new instructions? Should I try
> >>     > > > unit testing or binary testing?
> >>     > > >
> >>     > > > Thank you for reading this long post. Any feedback is welcome.
> >>     > > >
> >>     > > > *王 钲 荣*
> >>     > > >
> >>     > > > Zhengrong Wang
> >>     > > > Computer Science Department
> >>     > > > University of California, Los Angeles
> >>     > > > California, USA
> >>     > > > 90024
> >>     > > >
> >>     > > > Work Email: [email protected]
> >>     > > > Mobile :+1 310-447-4568
> >>     > > > _______________________________________________
> >>     > > > gem5-dev mailing list -- [email protected]
> >>     > > > To unsubscribe send an email to [email protected]
> >>
> >
> >
