Hi Zhengrong,
I remember you! I have received a few emails regarding our simulator
since I presented the paper. My idea was to reply to all
of them once I had submitted some patches to the official repo.
Regarding your questions:
1) It depends on the instruction's addressing mode. For example:
# VADDPD
## ZMM
def macroop VADDPD512_ZMM_ZMM_ZMM {
    vaddfp vectorReg, vectorReg2, vectorRegm, size=8, vsize=512
};
def macroop VADDPD512_ZMM_ZMM_M {
    vldfpevex vectorAux1, seg, sib, "DISPLACEMENT", dataSize=8, vsize=512, vdata=512
    vaddfp vectorReg, vectorReg2, vectorAux1, size=8, vsize=512
};
def macroop VADDPD512_ZMM_ZMM_P {
    rdip t7
    vldfpevex vectorAux1, seg, riprel, "DISPLACEMENT", dataSize=8, vsize=512, vdata=512
    vaddfp vectorReg, vectorReg2, vectorAux1, size=8, vsize=512
};
Those macro instructions represent three different addressing modes. The
first one does not require a memory access, so just one
micro-instruction is needed. The other two first load the memory operand
into a temporary vector register (vectorAux1) and then add it; the
RIP-relative form also reads the instruction pointer first.
This is just a brief example; the instruction has many encodings,
depending on the operand size (128, 256 and 512 bits) and on the
masking. My implementation covers all of these possibilities.
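The expansion above can be sketched in Python rather than in the gem5 microcode DSL. This is only an illustration: the Microop class and the expand_vaddpd512 function are hypothetical names, not part of gem5, and only the mnemonics and operand names are taken from the macroop definitions.

```python
# Hypothetical model of macroop-to-microop expansion; not gem5's real API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Microop:
    mnemonic: str
    operands: tuple


def expand_vaddpd512(mode):
    """Return the microop sequence for one addressing mode.

    mode is "reg" (ZMM_ZMM_ZMM), "mem" (ZMM_ZMM_M) or "riprel" (ZMM_ZMM_P).
    """
    add_from_reg = Microop("vaddfp", ("vectorReg", "vectorReg2", "vectorRegm"))
    add_from_aux = Microop("vaddfp", ("vectorReg", "vectorReg2", "vectorAux1"))
    if mode == "reg":
        # Register operand: no memory access, a single microop.
        return [add_from_reg]
    if mode == "mem":
        # Memory operand: load into a temporary register, then add.
        return [Microop("vldfpevex", ("vectorAux1", "seg", "sib")), add_from_aux]
    if mode == "riprel":
        # RIP-relative operand: read the instruction pointer, load, then add.
        return [
            Microop("rdip", ("t7",)),
            Microop("vldfpevex", ("vectorAux1", "seg", "riprel")),
            add_from_aux,
        ]
    raise ValueError(f"unknown addressing mode: {mode}")


def microop_counts():
    """Number of microops per addressing mode."""
    return {m: len(expand_vaddpd512(m)) for m in ("reg", "mem", "riprel")}
```

Calling microop_counts() gives one, two and three microops for the register, memory and RIP-relative forms respectively, matching the three definitions.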
2) Register dependences are handled as in any other ISA. The AVX512 ISA
contains 32 512-bit vector registers (if I remember well), which are
accessed through the register indices encoded in the instruction.
Dependences are tracked in the rename stage of the O3 CPU model.
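As a rough picture of what tracking dependences at rename means when a whole vector register is one renamable unit, here is a toy sketch. It is not gem5's O3 code: RenameMap and its fields are hypothetical names, and the free-list policy is an assumption made for the example.

```python
# Toy register-renaming model; names are hypothetical, not gem5's O3 classes.
NUM_ARCH_VREGS = 32  # ZMM0..ZMM31


class RenameMap:
    def __init__(self, num_phys):
        if num_phys <= NUM_ARCH_VREGS:
            raise ValueError("need more physical than architectural registers")
        # Initially architectural register i maps to physical register i.
        self.table = list(range(NUM_ARCH_VREGS))
        self.free = list(range(NUM_ARCH_VREGS, num_phys))
        # Physical register -> id of the in-flight instruction that writes it.
        self.producer = {}

    def rename(self, inst_id, dest, srcs):
        """Rename one instruction.

        Returns (new physical destination, ids of producers it must wait for).
        """
        phys_srcs = [self.table[s] for s in srcs]
        deps = sorted({self.producer[p] for p in phys_srcs if p in self.producer})
        if not self.free:
            raise RuntimeError("no free physical registers")
        new_dest = self.free.pop(0)
        self.table[dest] = new_dest
        self.producer[new_dest] = inst_id
        return new_dest, deps
```

With rm = RenameMap(64), renaming "zmm1 = zmm2 + zmm3" and then "zmm4 = zmm1 + zmm5" makes the second instruction depend on the first through one physical register, regardless of the vector length; a later write to zmm1 gets a fresh physical register and no dependence.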
As we have mentioned to Jason, I'm currently working on the last
publication of my PhD. My plan is to finish it this summer and start
on the gem5 changes in September.
Regards,
Adrián
On 1/6/20 20:17, Zhengrong Wang via gem5-dev wrote:
Hi Adrián,
Yes, we met at HPCA and I went to your presentation. I think your
implementation is probably more complete and robust. If you plan to
contribute it, that would be great! I do have a few questions:
1. How are the instructions broken into microops? e.g. a "vaddps" is
decoded into a single microop?
2. If there is a single 512-bit register, how do you handle register
dependence? Are there new register read/write APIs for these instructions?
Thanks!
*王 钲 荣*
Zhengrong Wang
Computer Science Department
University of California, Los Angeles
California, USA
90024
Work Email: [email protected]
Mobile :+1 310-447-4568
abarredo via gem5-dev <[email protected]> wrote on Mon, Jun 1, 2020 at 10:20 AM:
Hi,
I'm Adrián. I have extended the x86 ISA with the newest SIMD
extensions (AVX, AVX2 and AVX512).
As you know, the SSE implementation is inefficient (a 128-bit register
operation is modeled as two 64-bit scalar operations).
If we plan to add support for the AVX and AVX512 ISAs, the first thing to do
is to implement a proper vector register file for the x86 ISA. When I did
this, SVE had not been released, so I wrote it from scratch. Here, we could
follow one of these options:
1) Reuse the SVE implementation of the vector register file.
2) Create a new one, as I did.
Choosing the first option would give compatibility with Arm's SVE
instructions, but it would also mean re-implementing all my
micro-instructions, which would take a long time.
However, my current implementation of the x86 SIMD micro-instructions is not
clean, so I will have to spend time making it more efficient anyway.
In my own vector register file, I allocate 512 bits for every register.
Then every SSE, AVX and AVX512 instruction operates on the portion
determined by the instruction's vector size. This implementation has been
tested on several applications (from the ParVec benchmark suite, among
others) and closely follows Intel's description in their official manual.
This simulator was used in our paper, published at HPCA 2020.
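To make the idea of one 512-bit allocation per register concrete, here is a small Python sketch. It is not the actual implementation: VectorRegFile and vaddpd are hypothetical names, and zeroing the destination's upper lanes is an assumption that matches VEX/EVEX-encoded instructions (legacy SSE encodings preserve the upper bits instead).

```python
import struct

VREG_BYTES = 64  # 512 bits of storage for every register


class VectorRegFile:
    """Hypothetical register file: each register always owns 512 bits."""

    def __init__(self, num_regs=32):
        self.regs = [bytearray(VREG_BYTES) for _ in range(num_regs)]

    def write_f64(self, idx, values):
        """Write doubles to the low lanes and zero the remaining lanes."""
        if len(values) * 8 > VREG_BYTES:
            raise ValueError("too many lanes for a 512-bit register")
        data = struct.pack(f"<{len(values)}d", *values)
        self.regs[idx][:] = data.ljust(VREG_BYTES, b"\x00")

    def read_f64(self, idx, vsize_bits):
        """Read the low vsize_bits of a register as 64-bit doubles."""
        lanes = vsize_bits // 64
        return list(struct.unpack_from(f"<{lanes}d", self.regs[idx]))


def vaddpd(rf, dest, src1, src2, vsize_bits):
    """Packed double add over the lanes selected by the vector size."""
    a = rf.read_f64(src1, vsize_bits)
    b = rf.read_f64(src2, vsize_bits)
    rf.write_f64(dest, [x + y for x, y in zip(a, b)])
```

The same vaddpd function serves the 128-, 256- and 512-bit encodings; only vsize_bits changes, which is the property the single 512-bit allocation is meant to provide.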
Adrián
On 1/6/20 18:32, Miquel Moreto wrote:
Hi Zhengrong and Jason,
Let me CC Adrian Barredo, the PhD student that implemented AVX
instructions in our gem5 simulation infrastructure. Since he did all
the hard work, I believe it is better that he answers your questions. :-)
Best regards,
--- Miquel
On 1/6/20 17:14, Jason Lowe-Power wrote:
Hey Zhengrong,
Thanks for getting started on this! I've also cc'd Miquel at BSC who
has implemented many of the x86 vector instructions. Miquel, it would
be great to get your input here!
As far as getting input from AMD folks... I think this is going to be
a tough thing for them to weigh in on due to IP issues. This is
getting a bit too close to their products :). They can correct me if
I'm wrong!
To answer your questions:
> - Design of the vector register file. My implementation directly follows
> the SSE instructions to minimize the work. Is there any better way to do
> this?
I agree with Gabe. This is the best approach for now.
> - If I am going to merge my code, what is a good submission plan? I am
> thinking about first committing the skeleton code with a simple 'vaddps'
> instruction, and then for other instructions.
That sounds good to me. If you think the whole set of changes should
be reviewed together or there's no way to split things apart and
still be understandable, we can create a feature branch for you. That
said, since this is mostly just adding one instruction and doesn't
touch too much outside of the ISA implementation, just breaking it up
that way will probably work.
> - Testing: This is probably the most important one. Currently, I manually
> test my code by simulating small programs. What is the best way to write
> tests for new instructions? Should I try unit testing for binary testing?
If you could submit your programs to the gem5-resources repo, we can
build the binaries and then distribute them for anyone to use for
testing.
I think that works well.
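One way to picture the golden-reference check suggested in this thread is a line-by-line comparison of a test program's printed output against a stored reference. The sketch below is an assumption for illustration only: compare_to_golden and expected_vaddps_line are hypothetical helpers, not part of gem5's testing framework, and the output format is made up.

```python
from itertools import zip_longest


def expected_vaddps_line(a, b):
    """Build the reference output line for an element-wise add of a and b."""
    total = [x + y for x, y in zip(a, b)]
    return "vaddps " + " ".join(f"{v:.1f}" for v in total)


def compare_to_golden(actual, golden):
    """Return (line number, expected, got) for every line that differs.

    A missing line on either side is reported with None in its place.
    """
    mismatches = []
    pairs = zip_longest(golden.splitlines(), actual.splitlines())
    for line_no, (expected, got) in enumerate(pairs, start=1):
        if expected != got:
            mismatches.append((line_no, expected, got))
    return mismatches
```

An empty list from compare_to_golden means the simulated program printed exactly what the reference expects; anything else pinpoints the first instructions to look at.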
Cheers,
Jason
On Sun, May 31, 2020 at 10:14 PM Gabe Black via gem5-dev
<[email protected]> wrote:
https://docs.google.com/document/d/1O_u_Xq14TgreYThuZcbM3kuXFCrKvaFHA2O9poCeHSk/edit#heading=h.r067bn3rmydo
On Sun, May 31, 2020 at 9:31 PM Zhengrong Wang via gem5-dev
<[email protected]> wrote:
> Hi Gabe,
>
> Thanks for your reply. For the vector register file, I agree it is probably
> a better idea to stick with the current approach; at least it does not
> require changing the SSE instructions. I could not find your plan to
> redesign the register handling mechanism. If you could provide a link, I
> would be interested to take a look to get a better understanding of the
> philosophy behind the design.
>
> Let's hear from AMD first, as they have more insight into the microops. If
> everything turns out well, I can start to refactor the code into smaller
> commits and add tests for that.
>
> Gabe Black via gem5-dev <[email protected]> wrote on Sun, May 31, 2020 at 7:44 PM:
>
> > Hi Sean. I'm not aware of anyone working on AVX-512, but it would be nice
> > if the AMD folks could chime in and confirm that. The x86 microcode was
> > originally based off of the microcode for the K6 as described in a patent.
> > The floating point parts of that patent were very vague and hand wavy, so I
> > more or less made up the initial part. It would be nice for the AMD folks
> > to chime in here too, as far as what's realistic for the design of the
> > microops.
> >
> > As far as testing, we don't have a great scheme for testing individual
> > instructions right now, but that would be really valuable to have in the
> > long run. I've thought a bit about how that might work, but I don't have a
> > plan at the moment. The best thing to do right now is probably to have
> > small programs that execute the instructions in question and print their
> > inputs/outputs and/or check that the outputs are correct. I think our
> > testing framework has a way to check that program output matches a golden
> > reference, and that could be used to delegate correctness checking to the
> > framework. Bobby can probably give more details here.
> >
> > As far as the registers, my preference for now is to do what you did and
> > treat each 64 bit chunk as its own register. There are real drawbacks to
> > this approach, but the existing solution to them, a vector register file,
> > has other, in my opinion more serious, drawbacks. A while ago I put
> > together a manifesto about how I'd want to redo the whole register handling
> > mechanism in gem5, but unfortunately I haven't had time to actually
> > implement very much of it. By treating larger registers as groups of
> > smaller registers, you'd be consistent with the rest of the x86 code as it
> > stands right now. That, and the fact that I think that's the lesser of two
> > evils, makes that my preferred way to go.
> >
> > As far as submitting code, there are instructions on the gem5 website for
> > creating and submitting reviews. We use gerrit, and so in addition to the
> > instructions we provide, you should be able to find pretty good/complete
> > instructions out on the internet to explain the mechanism of sending out a
> > review. For this or any other change, you'd want to break up your work into
> > logical chunks where everything works before and after any given change,
> > and then send them out (perhaps all together in a series) for review.
> > Exactly how to break things up is up to you, but my opinion is that each
> > change should be logically complete but also about one thing. That makes it
> > easier for a reviewer to wrap their head around what you're doing and how
> > it works without having to untangle multiple things going on at once, or
> > having to merge multiple reviews together in their head to see the whole
> > change they're reviewing. If there are lots of related small changes (many
> > individual instructions for instance) it might make sense to do one or two
> > by themselves first, and then once the kinks are worked out to do a larger
> > change with the rest, applying the pattern from the earlier reviews.
> >
> > Gabe
> >
> > On Sun, May 31, 2020 at 4:18 PM Sean Wong via gem5-dev
> > <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > This is my first time posting here, so apologies if I made any mistakes.
> > >
> > > The last time I checked the develop branch, gem5 did not yet support
> > > AVX512, and searching the mailing list I do not see any plan for it. Is
> > > there any ongoing development to support it? If not, I am happy to
> > > contribute my code. During my research, I have developed partial support
> > > for AVX512 (and AVX-256 as a by-product), which I hope would be useful for
> > > others.
> > >
> > > My implementation so far is a straightforward extension to the existing
> > > SSE instructions. To summarize it:
> > >
> > > - Like the SSE implementation, the 512-bit register is broken into 8
> > > 64-bit sub-registers. This may not be a good design. Any suggestions are
> > > welcome.
> > > - Unlike the SSE implementation, most of the instructions are decoded into
> > > a single microop. For example, a 512-bit 'vaddps' is decoded into one
> > > 'vaddf' microop instead of eight.
> > > - Currently, it supports common arithmetic instructions (add, mul, etc.)
> > > and basic data movement (load, store, mov, extract, insert, etc.).
> > > - No support for masking.
> > >
> > > If you guys are interested, I am willing to clean my code and submit for
> > > review. I may need some guidance on:
> > >
> > > - Design of the vector register file. My implementation directly follows
> > > the SSE instructions to minimize the work. Is there any better way to do
> > > this?
> > > - If I am going to merge my code, what is a good submission plan? I am
> > > thinking about first committing the skeleton code with a simple 'vaddps'
> > > instruction, and then for other instructions.
> > > - Testing: This is probably the most important one. Currently, I manually
> > > test my code by simulating small programs. What is the best way to write
> > > tests for new instructions? Should I try unit testing for binary testing?
> > >
> > > Thank you for reading this long post. Any feedback is welcome.
> > >
> > > _______________________________________________
> > > > gem5-dev mailing list -- [email protected]
> > > > To unsubscribe send an email to [email protected]