Hi Sean. I'm not aware of anyone working on AVX-512, but it would be nice
if the AMD folks could chime in and confirm that. The x86 microcode was
originally based off of the microcode for the K6 as described in a patent.
The floating point parts of that patent were very vague and hand wavy, so I
more or less made up the initial part. It would be nice for the AMD folks
to chime in here too, as far as what's realistic for the design of the
microops.

As far as testing, we don't have a great scheme for testing individual
instructions right now, but that would be really valuable to have in the
long run. I've thought a bit about how that might work, but I don't have a
plan at the moment. The best thing to do right now is probably to have
small programs that execute the instructions in question and print their
inputs/outputs and/or check that the outputs are correct. I think our
testing framework has a way to check that program output matches a golden
reference, and that could be used to delegate correctness checking to the
framework. Bobby can probably give more details here.
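
As a rough sketch of the golden-reference idea above: the check amounts to
comparing what the test program printed against a stored known-good copy
and reporting any lines that differ. This is not the gem5 testing
framework's API. The helper name `check_against_golden` and the sample
output lines are made up for illustration.

```python
import difflib


def check_against_golden(actual, golden):
    """Return unified-diff lines; an empty list means the outputs match.

    Hypothetical helper for illustration only, not part of gem5.
    """
    # Strip trailing whitespace so line-ending differences do not count.
    actual_lines = [line.rstrip() for line in actual.splitlines()]
    golden_lines = [line.rstrip() for line in golden.splitlines()]
    return list(
        difflib.unified_diff(
            golden_lines,
            actual_lines,
            fromfile="golden",
            tofile="actual",
            lineterm="",
        )
    )


# Made-up output of a small test program that prints inputs and outputs.
golden_output = "vaddps in0=1.0 in1=2.0 out=3.0\n"
good_run = "vaddps in0=1.0 in1=2.0 out=3.0\r\n"
bad_run = "vaddps in0=1.0 in1=2.0 out=0.0\n"

good_diff = check_against_golden(good_run, golden_output)
bad_diff = check_against_golden(bad_run, golden_output)

print("good run matches:", not good_diff)
print("bad run matches:", not bad_diff)
for line in bad_diff:
    print(line)
```

Returning the diff rather than a bare pass/fail makes it easier to see
which instruction's output went wrong when a test fails.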

As far as the registers, my preference for now is to do what you did and
treat each 64-bit chunk as its own register. There are real drawbacks to
this approach, but the existing solution to them, a vector register file,
has other, in my opinion more serious, drawbacks. A while ago I put
together a manifesto about how I'd want to redo the whole register handling
mechanism in gem5, but unfortunately I haven't had time to actually
implement very much of it. By treating larger registers as groups of
smaller registers, you'd be consistent with the rest of the x86 code as it
stands right now. That, and the fact that I think that's the lesser of two
evils, makes that my preferred way to go.
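
To make the "group of smaller registers" idea concrete, here is a small
standalone model, not gem5 code. It assumes a 512-bit value is stored as
eight 64-bit integers with two single-precision lanes per chunk in
little-endian order. The function names are made up, and overflow and NaN
handling are ignored.

```python
import struct

CHUNKS = 8           # 512 bits / 64 bits per chunk
LANES_PER_CHUNK = 2  # two 32-bit floats fit in one 64-bit chunk


def pack_floats(values):
    """Pack 16 floats into eight 64-bit integers (hypothetical layout)."""
    assert len(values) == CHUNKS * LANES_PER_CHUNK
    chunks = []
    for i in range(CHUNKS):
        raw = struct.pack("<2f", values[2 * i], values[2 * i + 1])
        chunks.append(struct.unpack("<Q", raw)[0])
    return chunks


def unpack_floats(chunks):
    """Inverse of pack_floats: eight 64-bit integers back to 16 floats."""
    values = []
    for chunk in chunks:
        values.extend(struct.unpack("<2f", struct.pack("<Q", chunk)))
    return values


def add_ps_chunk(a, b):
    """Packed single-precision add on one 64-bit chunk (two lanes)."""
    a0, a1 = struct.unpack("<2f", struct.pack("<Q", a))
    b0, b1 = struct.unpack("<2f", struct.pack("<Q", b))
    # Packing as "<2f" rounds each sum back to single precision.
    return struct.unpack("<Q", struct.pack("<2f", a0 + b0, a1 + b1))[0]


def vaddps_512(a_chunks, b_chunks):
    """Apply the per-chunk add to all eight chunks of both operands."""
    return [add_ps_chunk(a, b) for a, b in zip(a_chunks, b_chunks)]


a_reg = pack_floats([float(i) for i in range(16)])
b_reg = pack_floats([0.5] * 16)
result = unpack_floats(vaddps_512(a_reg, b_reg))
print(result)
```

Whether the eight chunk operations are issued as eight microops or handled
inside a single microop is a separate choice. The storage layout is the
same either way.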

As far as submitting code, there are instructions on the gem5 website for
creating and submitting reviews. We use Gerrit, and so in addition to the
instructions we provide, you should be able to find pretty good/complete
instructions out on the internet to explain the mechanism of sending out a
review. For this or any other change, you'd want to break up your work into
logical chunks where everything works before and after any given change,
and then send them out (perhaps all together in a series) for review.
Exactly how to break things up is up to you, but my opinion is that each
change should be logically complete but also about one thing. That makes it
easier for a reviewer to wrap their head around what you're doing and how
it works without having to untangle multiple things going on at once, or
having to merge multiple reviews together in their head to see the whole
change they're reviewing. If there are lots of related small changes (many
individual instructions for instance) it might make sense to do one or two
by themselves first, and then once the kinks are worked out to do a larger
change with the rest, applying the pattern from the earlier reviews.

Gabe

On Sun, May 31, 2020 at 4:18 PM Sean Wong via gem5-dev <gem5-dev@gem5.org>
wrote:

> Hello,
>
> This is my first time posting here, so apologies if I made any mistakes.
>
> The last time I checked the develop branch, gem5 did not yet support
> AVX-512, and searching the mailing list I do not see any plan for it. Is
> there any ongoing development to support that? If not, I am happy to
> contribute my code. During my research, I have developed partial support
> for AVX512 (and AVX-256 as a by-product), which I hope would be useful for
> others.
>
> My implementation so far is a straightforward extension to the existing SSE
> instructions. To summarize it:
>
> - Like the SSE implementation, each 512-bit register is broken into eight
> 64-bit sub-registers. This may not be a good design. Any suggestions are
> welcome.
> - Unlike the SSE implementation, most instructions are decoded into a
> single microop. For example, a 512-bit 'vaddps' is decoded into one 'vaddf'
> microop instead of eight.
> - Currently, it supports common arithmetic instructions (add, mul, etc.)
> and basic data movement (load, store, mov, extract, insert, etc.).
> - No support for masking.
>
> If you guys are interested, I am willing to clean up my code and submit it
> for review. I may need some guidance on:
>
> - Design of the vector register file. My implementation directly follows
> the SSE instructions to minimize the work. Is there any better way to do
> this?
> - If I am going to merge my code, what is a good submission plan? I am
> thinking about first committing the skeleton code with a simple 'vaddps'
> instruction, and then the other instructions.
> - Testing: This is probably the most important one. Currently, I manually
> test my code by simulating small programs. What is the best way to write
> tests for new instructions? Should I try unit testing or binary testing?
>
> Thank you for reading this long post. Any feedback is welcome.
>
> *王 钲 荣*
>
> Zhengrong Wang
> Computer Science Department
> University of California, Los Angeles
> California, USA
> 90024
>
> Work Email: seanyukig...@gmail.com
> Mobile: +1 310-447-4568
> _______________________________________________
> gem5-dev mailing list -- gem5-dev@gem5.org
> To unsubscribe send an email to gem5-dev-le...@gem5.org