Hi Zhengrong,

I remember you! I have received a few emails regarding our simulator since I presented the paper. My plan was to reply to all
of them once I had submitted some patches to the official repo.

Regarding your questions:

1) It depends on the instruction's addressing mode.  For example:

# VADDPD
## ZMM
def macroop VADDPD512_ZMM_ZMM_ZMM {
    vaddfp vectorReg, vectorReg2, vectorRegm, size=8, vsize=512
};

def macroop VADDPD512_ZMM_ZMM_M {
    vldfpevex vectorAux1, seg, sib, "DISPLACEMENT", dataSize=8, vsize=512, vdata=512
    vaddfp vectorReg, vectorReg2, vectorAux1, size=8, vsize=512
};

def macroop VADDPD512_ZMM_ZMM_P {
    rdip t7
    vldfpevex vectorAux1, seg, riprel, "DISPLACEMENT", dataSize=8, vsize=512, vdata=512
    vaddfp vectorReg, vectorReg2, vectorAux1, size=8, vsize=512
};

Those macroops represent three different addressing modes. The first one does not require a memory access, so a single micro-instruction is enough. The second loads the memory operand into an auxiliary vector register before the add, and the third (RIP-relative) also reads the instruction pointer first. This is just a brief example: the instruction has many encodings, depending on the operand size (128, 256 and 512 bits) and on the masking. In my implementation, I consider
all these possibilities.
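
To make the expansion concrete, here is a minimal Python sketch of how the three macroops above map to micro-op sequences. It is only an illustration: `MicroOp` and `expand_vaddpd512` are hypothetical names, not part of gem5 or of my implementation, and the operand lists are shortened.

```python
# Illustrative model only: MicroOp and expand_vaddpd512 are hypothetical
# names, not gem5 APIs. Operand lists are abbreviated.
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    mnemonic: str
    operands: tuple


def expand_vaddpd512(mode: str) -> list:
    """Return the micro-op sequence for one addressing mode.

    mode is "reg" (ZMM, ZMM, ZMM), "mem" (ZMM, ZMM, M) or
    "riprel" (ZMM, ZMM, P), mirroring the three macroops above.
    """
    add_reg = MicroOp("vaddfp", ("vectorReg", "vectorReg2", "vectorRegm"))
    add_aux = MicroOp("vaddfp", ("vectorReg", "vectorReg2", "vectorAux1"))
    if mode == "reg":
        # Register-only form: no memory access, one micro-op.
        return [add_reg]
    if mode == "mem":
        # Memory form: load into an auxiliary register, then add.
        load = MicroOp("vldfpevex", ("vectorAux1", "seg", "sib"))
        return [load, add_aux]
    if mode == "riprel":
        # RIP-relative form: read the instruction pointer, load, then add.
        rdip = MicroOp("rdip", ("t7",))
        load = MicroOp("vldfpevex", ("vectorAux1", "seg", "riprel"))
        return [rdip, load, add_aux]
    raise ValueError(f"unknown addressing mode: {mode}")


for mode in ("reg", "mem", "riprel"):
    print(mode, [u.mnemonic for u in expand_vaddpd512(mode)])
```

The register form yields one micro-op, the memory form two, and the RIP-relative form three.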

2) The register dependence is handled as in any other ISA. The AVX512 ISA contains 32 512-bit vector registers (if I remember well), and they are accessed through the register indices specified in the instruction. The dependences are tracked in the rename stage of the O3 CPU model.
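
As a rough illustration of what tracking in the rename stage means, the sketch below renames whole vector registers so that a consumer picks up the physical register of the latest producer. It is a simplified model and not the gem5 O3 code: the `RenameMap` class and its methods are hypothetical, and it never frees physical registers.

```python
# Hypothetical sketch of rename-based dependence tracking.
# This is NOT the gem5 O3 implementation; all names are made up.
NUM_ARCH_VREGS = 32  # ZMM0..ZMM31


class RenameMap:
    def __init__(self, num_phys: int):
        assert num_phys > NUM_ARCH_VREGS
        # Initially architectural register i lives in physical register i.
        self.table = list(range(NUM_ARCH_VREGS))
        self.free = list(range(NUM_ARCH_VREGS, num_phys))

    def rename(self, dest: int, srcs: tuple):
        """Rename one instruction; returns (phys_dest, phys_srcs)."""
        # Sources read the current mapping, so they name the physical
        # register written by the most recent producer.
        phys_srcs = tuple(self.table[s] for s in srcs)
        # The destination gets a fresh physical register (never recycled
        # in this sketch).
        phys_dest = self.free.pop(0)
        self.table[dest] = phys_dest
        return phys_dest, phys_srcs


rmap = RenameMap(num_phys=64)
d0, _ = rmap.rename(dest=1, srcs=(2, 3))  # zmm1 = zmm2 + zmm3
_, s1 = rmap.rename(dest=4, srcs=(1, 5))  # zmm4 = zmm1 + zmm5
print(d0 in s1)  # the second instruction depends on the first
```

Because each 512-bit register is a single renamed entity here, one lookup per source operand is enough to find the dependence.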

As we have mentioned to Jason, I'm currently working on the last publication of my PhD. My plan is to finish it this summer and start on the gem5 changes in September.

Regards,
Adrián

On 1/6/20 20:17, Zhengrong Wang via gem5-dev wrote:
Hi Adrián,

Yeah we have met at HPCA and I went to your presentation. I think your
implementation is probably more complete and robust. If you plan to
contribute it, it would be great! I do have a few questions:

1. How are the instructions broken into microops? e.g. a "vaddps" is
decoded into a single microop?
2. If there is a single 512-bit register, how do you handle register
dependence? Are there new register read/write APIs for these instructions?

Thanks!

*王 钲 荣*

Zhengrong Wang
Computer Science Department
University of California, Los Angeles
California, USA
90024

Work Email: [email protected]
Mobile :+1 310-447-4568




On Mon, Jun 1, 2020 at 10:20 AM abarredo via gem5-dev <[email protected]> wrote:

Hi,

I'm Adrián, and I have extended the x86 ISA with the newest SIMD
extensions (AVX, AVX2 and AVX512).
As you know, the SSE implementation is inefficient (a 128-bit register
operation is modeled as two 64-bit scalar operations).
If we plan to add support for the AVX and AVX512 ISAs, the first thing to do
is to implement a proper vector register file for the x86 ISA. At the
time I did it, SVE had not been released, so I did it from scratch. Here,
we could follow one of these options:

1) Reuse the SVE implementation of the vector register file.
2) Create a new one, as I did.

Choosing the first option means compatibility with Arm's SVE
instructions, but it also means re-implementing all my micro-instructions,
which would take a long time.
However, my current implementation of the x86 SIMD micro-instructions is not
clean, so I'll have to spend time making it more efficient anyway.

In my own vector register file, I allocate 512 bits for every
register. Then, every SSE, AVX and AVX512 instruction operates
according to the instruction's vector size. This implementation has been
tested on several applications (from the ParVec benchmark suite, among
others) and closely follows Intel's description in their official manual.
This simulator was used in our paper, published at HPCA 2020.
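
A minimal Python sketch of this idea follows: every register gets 512 bits of storage, and an operation touches only the lanes covered by its vector size. The `VectorRegFile` class and `vaddpd` helper are hypothetical names for illustration, not my actual code, and the sketch does not model masking or how each encoding treats the bits above the vector size.

```python
# Hypothetical sketch; not the actual register file implementation.
import struct

VLEN_BYTES = 64  # 512 bits reserved for every register


class VectorRegFile:
    def __init__(self, num_regs: int = 32):
        self.regs = [bytearray(VLEN_BYTES) for _ in range(num_regs)]

    def write_f64(self, idx: int, values: list) -> None:
        # Pack the doubles little-endian into the low lanes of the register.
        data = struct.pack(f"<{len(values)}d", *values)
        self.regs[idx][:len(data)] = data

    def read_f64(self, idx: int, vsize_bits: int) -> list:
        # Read only the 64-bit lanes covered by the vector size.
        n = vsize_bits // 64
        return list(struct.unpack_from(f"<{n}d", self.regs[idx]))


def vaddpd(rf: VectorRegFile, dest: int, src1: int, src2: int,
           vsize_bits: int) -> None:
    """Packed double add over vsize_bits (128, 256 or 512)."""
    a = rf.read_f64(src1, vsize_bits)
    b = rf.read_f64(src2, vsize_bits)
    rf.write_f64(dest, [x + y for x, y in zip(a, b)])


rf = VectorRegFile()
rf.write_f64(1, [1.0] * 8)
rf.write_f64(2, [2.0] * 8)
vaddpd(rf, dest=0, src1=1, src2=2, vsize_bits=256)
print(rf.read_f64(0, 512))  # low four lanes are 3.0, upper four stay 0.0
```

The same storage serves all three extensions; only `vsize_bits` changes between a 128-bit SSE-style, a 256-bit AVX and a 512-bit AVX512 operation.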

Adrián

On 1/6/20 18:32, Miquel Moreto wrote:
Hi Zhengrong and Jason,

Let me CC Adrian Barredo, the PhD student that implemented AVX
instructions in our gem5 simulation infrastructure. Since he did all
the hard work, I believe it is better that he answers your questions. :-)

Best regards,

--- Miquel

On 1/6/20 17:14, Jason Lowe-Power wrote:
Hey Zhengrong,

Thanks for getting started on this! I've also cc'd Miquel at BSC who
has implemented many of the x86 vector instructions. Miquel, it would
be great to get your input here!

As far as getting input from AMD folks... I think this is going to be
a tough thing for them to weigh in on due to IP issues. This is
getting a bit too close to their products :). They can correct me if
I'm wrong!

To answer your questions:

     - Design of the vector register file. My implementation directly
     follows the SSE instructions to minimize the work. Is there any
     better way to do this?


I agree with Gabe. This is the best approach for now.

     - If I am going to merge my code, what is a good submission plan?
     I am thinking about first committing the skeleton code with a
     simple 'vaddps' instruction, and then for other instructions.


That sounds good to me. If you think the whole set of changes should
be reviewed together or there's no way to split things apart and
still be understandable, we can create a feature branch for you. That
said, since this is mostly just adding one instruction and doesn't
touch too much outside of the ISA implementation, just breaking it up
that way will probably work.


     - Testing: This is probably the most important one. Currently, I
     manually test my code by simulating small programs. What is the
     best way to write tests for new instructions? Should I try unit
     testing for binary testing?


If you could submit your programs to the gem5-resources repo, we can
build the binaries and then distribute them for anyone to use for
testing.
I think that works well.

Cheers,
Jason

On Sun, May 31, 2020 at 10:14 PM Gabe Black via gem5-dev
<[email protected] <mailto:[email protected]>> wrote:


https://docs.google.com/document/d/1O_u_Xq14TgreYThuZcbM3kuXFCrKvaFHA2O9poCeHSk/edit#heading=h.r067bn3rmydo
     On Sun, May 31, 2020 at 9:31 PM Zhengrong Wang via gem5-dev <
     [email protected] <mailto:[email protected]>> wrote:

     > Hi Gabe,
     >
     > Thanks for your reply. For the vector register file, I agree it is
     > probably a better idea to stick with the current approach; at least
     > it does not require changing the SSE instructions. I could not find
     > your plan to redesign the register handling mechanism. If you could
     > provide a link, I would be interested in taking a look to better
     > understand the philosophy behind the design.
     >
     > Let's hear from AMD first, as they have more insight into the
     > microops. If everything turns out well, I can start to refactor the
     > code into smaller commits and add tests for that.
     >
     > *王 钲 荣*
     >
     > Zhengrong Wang
     > Computer Science Department
     > University of California, Los Angeles
     > California, USA
     > 90024
     >
     > Work Email: [email protected] <mailto:[email protected]
     > Mobile :+1 310-447-4568 <(310)%20447-4568>
     >
     >
     >
     >
     > On Sun, May 31, 2020 at 7:44 PM Gabe Black via gem5-dev <[email protected]> wrote:
     >
     > > Hi Sean. I'm not aware of anyone working on AVX-512, but it would be
     > > nice if the AMD folks could chime in and confirm that. The x86
     > > microcode was originally based off of the microcode for the K6 as
     > > described in a patent. The floating point parts of that patent were
     > > very vague and hand wavy, so I more or less made up the initial part.
     > > It would be nice for the AMD folks to chime in here too, as far as
     > > what's realistic for the design of the microops.
     > >
     > > As far as testing, we don't have a great scheme for testing
     > > individual instructions right now, but that would be really valuable
     > > to have in the long run. I've thought a bit about how that might
     > > work, but I don't have a plan at the moment. The best thing to do
     > > right now is probably to have small programs that execute the
     > > instructions in question and print their inputs/outputs and/or check
     > > that the outputs are correct. I think our testing framework has a way
     > > to check that program output matches a golden reference, and that
     > > could be used to delegate correctness checking to the framework.
     > > Bobby can probably give more details here.
     > >
     > > As far as the registers, my preference for now is to do what you did
     > > and treat each 64 bit chunk as its own register. There are real
     > > drawbacks to this approach, but the existing solution to them, a
     > > vector register file, has other, in my opinion more serious,
     > > drawbacks. A while ago I put together a manifesto about how I'd want
     > > to redo the whole register handling mechanism in gem5, but
     > > unfortunately I haven't had time to actually implement very much of
     > > it. By treating larger registers as groups of smaller registers,
     > > you'd be consistent with the rest of the x86 code as it stands right
     > > now. That, and the fact that I think that's the lesser of two evils,
     > > makes that my preferred way to go.
     > >
     > > As far as submitting code, there are instructions on the gem5 website
     > > for creating and submitting reviews. We use gerrit, and so in
     > > addition to the instructions we provide, you should be able to find
     > > pretty good/complete instructions out on the internet to explain the
     > > mechanism of sending out a review. For this or any other change,
     > > you'd want to break up your work into logical chunks where everything
     > > works before and after any given change, and then send them out
     > > (perhaps all together in a series) for review. Exactly how to break
     > > things up is up to you, but my opinion is that each change should be
     > > logically complete but also about one thing. That makes it easier for
     > > a reviewer to wrap their head around what you're doing and how it
     > > works without having to untangle multiple things going on at once, or
     > > having to merge multiple reviews together in their head to see the
     > > whole change they're reviewing. If there are lots of related small
     > > changes (many individual instructions for instance) it might make
     > > sense to do one or two by themselves first, and then once the kinks
     > > are worked out to do a larger change with the rest, applying the
     > > pattern from the earlier reviews.
     > >
     > > Gabe
     > >
     > > On Sun, May 31, 2020 at 4:18 PM Sean Wong via gem5-dev <[email protected]> wrote:
     > >
     > > > Hello,
     > > >
     > > > This is my first time posting here, so apologies if I made any
     > > > mistakes.
     > > >
     > > > The last time I checked the develop branch, gem5 did not yet
     > > > support AVX512, and searching the mailing list I do not see any
     > > > plan for it. Is there any ongoing development to support it? If
     > > > not, I am happy to contribute my code. During my research, I have
     > > > developed partial support for AVX512 (and AVX-256 as a by-product),
     > > > which I hope would be useful for others.
     > > >
     > > > My implementation so far is a straightforward extension to the
     > > > existing SSE instructions. To summarize it:
     > > >
     > > > - Like the SSE implementation, the 512-bit register is broken into
     > > > eight 64-bit sub-registers. This may not be a good design. Any
     > > > suggestions are welcome.
     > > > - Unlike the SSE implementation, most of the instructions are
     > > > decoded into a single microop. For example, a 512-bit 'vaddps' is
     > > > decoded into one 'vaddf' microop instead of eight.
     > > > - Currently, it supports common arithmetic instructions (add, mul,
     > > > etc.) and basic data movement (load, store, mov, extract, insert,
     > > > etc.).
     > > > - No support for masking.
     > > >
     > > > If you guys are interested, I am willing to clean up my code and
     > > > submit it for review. I may need some guidance on:
     > > >
     > > > - Design of the vector register file. My implementation directly
     > > > follows the SSE instructions to minimize the work. Is there any
     > > > better way to do this?
     > > > - If I am going to merge my code, what is a good submission plan?
     > > > I am thinking about first committing the skeleton code with a
     > > > simple 'vaddps' instruction, and then for other instructions.
     > > > - Testing: This is probably the most important one. Currently, I
     > > > manually test my code by simulating small programs. What is the
     > > > best way to write tests for new instructions? Should I try unit
     > > > testing for binary testing?
     > > >
     > > > Thank you for reading this long post. Any feedback is welcome.
     > > >
     > > > *王 钲 荣*
     > > >
     > > > Zhengrong Wang
     > > > Computer Science Department
     > > > University of California, Los Angeles
     > > > California, USA
     > > > 90024
     > > >
     > > > Work Email: [email protected]
     <mailto:[email protected]>
     > > > Mobile :+1 310-447-4568 <(310)%20447-4568> <(310)%20447-4568>
     > > > _______________________________________________
     > > > gem5-dev mailing list -- [email protected]
     <mailto:[email protected]>
     > > > To unsubscribe send an email to [email protected]
     <mailto:[email protected]>
     > > > %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s


WARNING / LEGAL TEXT: This message is intended only for the use of the
individual or entity to which it is addressed and may contain
information which is privileged, confidential, proprietary, or exempt
from disclosure under applicable law. If you are not the intended
recipient or the person responsible for delivering the message to the
intended recipient, you are strictly prohibited from disclosing,
distributing, copying, or in any way using this message. If you have
received this communication in error, please notify the sender and
destroy and delete any copies you may have received.

http://www.bsc.es/disclaimer
