Hi,

I'm Adrián. I have extended the x86 ISA in gem5 with the newer SIMD extensions (AVX, AVX2 and AVX512). As you know, the SSE implementation is inefficient: each 128-bit register operation is modeled as two 64-bit scalar operations. If we plan to add support for the AVX and AVX512 ISAs, the first thing to do is to implement a proper vector register file for the x86 ISA. When I did this, SVE had not been released yet, so I wrote mine from scratch. From here, we could follow one of these options:

1) Reuse the SVE implementation of the vector register file.
2) Create a new one, as I did.

The first option gives compatibility with Arm's SVE instructions, but it also means re-implementing all my micro-instructions, which would take a long time. However, my current implementation of the x86 SIMD micro-instructions is not clean, so I will have to spend time making it more efficient anyway.

In my own vector register file, I allocate 512 bits for every register. Each SSE, AVX and AVX512 instruction then operates on as much of the register as its vector size requires. This implementation has been tested on several applications (from the ParVec benchmark suite, among others) and closely follows Intel's description in their official manual. The simulator was used in our paper published at HPCA 2020.
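
To make the idea concrete, here is a minimal sketch in Python (not the actual gem5 code) of a register file that allocates 512 bits per register and lets each instruction touch only the lanes its vector size covers. The names VecRegFile, read_f32, write_f32 and vaddps_model, as well as the choice of 32 registers, are assumptions made up for illustration. The zero_upper flag models the architectural rule that VEX/EVEX-encoded instructions zero the bits above the vector length, while legacy SSE encodings leave them unchanged.

```python
import struct

REG_BITS = 512   # every register is allocated at full AVX512 width
NUM_REGS = 32    # assumption for this sketch: zmm0..zmm31


class VecRegFile:
    """Hypothetical model of a flat 512-bit-per-register vector file."""

    def __init__(self):
        self.regs = [bytearray(REG_BITS // 8) for _ in range(NUM_REGS)]

    def read_f32(self, idx, vec_bits):
        # Read only the low vec_bits of the register as 32-bit floats.
        n = vec_bits // 32
        return list(struct.unpack_from(f"<{n}f", self.regs[idx], 0))

    def write_f32(self, idx, lanes, vec_bits, zero_upper=True):
        # Write the low vec_bits; optionally clear everything above them.
        n = vec_bits // 32
        assert len(lanes) == n
        struct.pack_into(f"<{n}f", self.regs[idx], 0, *lanes)
        if zero_upper:
            upper = vec_bits // 8
            self.regs[idx][upper:] = bytes(REG_BITS // 8 - upper)


def vaddps_model(rf, dst, src1, src2, vec_bits, zero_upper=True):
    # One operation over all lanes selected by the instruction's vector size.
    a = rf.read_f32(src1, vec_bits)
    b = rf.read_f32(src2, vec_bits)
    rf.write_f32(dst, [x + y for x, y in zip(a, b)], vec_bits, zero_upper)
```

For example, calling vaddps_model(rf, 0, 1, 2, 128) adds only the four low lanes and clears the rest of register 0, whereas passing zero_upper=False mimics a legacy SSE addps that preserves the upper bits.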

Adrián

On 1/6/20 18:32, Miquel Moreto wrote:

Hi Zhengrong and Jason,

Let me CC Adrian Barredo, the PhD student who implemented the AVX instructions in our gem5 simulation infrastructure. Since he did all the hard work, I believe it is better that he answers your questions. :-)

Best regards,

--- Miquel

On 1/6/20 17:14, Jason Lowe-Power wrote:
Hey Zhengrong,

Thanks for getting started on this! I've also cc'd Miquel at BSC who has implemented many of the x86 vector instructions. Miquel, it would be great to get your input here!

As far as getting input from AMD folks... I think this is going to be a tough thing for them to weigh in on due to IP issues. This is getting a bit too close to their products :). They can correct me if I'm wrong!

To answer your questions:

    - Design of the vector register file. My implementation directly follows
    the SSE instructions to minimize the work. Is there any better way to do
    this?


I agree with Gabe. This is the best approach for now.

    - If I am going to merge my code, what is a good submission plan? I am
    thinking about first committing the skeleton code with a simple 'vaddps'
    instruction, and then for other instructions.


That sounds good to me. If you think the whole set of changes should be reviewed together or there's no way to split things apart and still be understandable, we can create a feature branch for you. That said, since this is mostly just adding one instruction and doesn't touch too much outside of the ISA implementation, just breaking it up that way will probably work.


    - Testing: This is probably the most important one. Currently, I manually
    test my code by simulating small programs. What is the best way to write
    tests for new instructions? Should I try unit testing for binary testing?

If you could submit your programs to the gem5-resources repo, we can build the binaries and then distribute them for anyone to use for testing.
I think that works well.
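
As a rough illustration of the small-program testing discussed in this thread, the sketch below shows one way a test's printed output could be compared against a golden reference. It is plain Python and does not use any gem5 testing API. The helper names format_lanes and diff_against_golden, and the one-line-per-register output format, are assumptions made up for this example.

```python
def format_lanes(name, lanes):
    # Hypothetical output format: one line per register, fixed precision.
    return name + ": " + " ".join(f"{v:.6f}" for v in lanes)


def diff_against_golden(actual, golden):
    """Return a list of (line_number, expected, got) for every mismatch."""
    actual_lines = actual.strip().splitlines()
    golden_lines = golden.strip().splitlines()
    mismatches = []
    for i in range(max(len(actual_lines), len(golden_lines))):
        got = actual_lines[i] if i < len(actual_lines) else "<missing>"
        want = golden_lines[i] if i < len(golden_lines) else "<missing>"
        if got != want:
            mismatches.append((i + 1, want, got))
    return mismatches


# Golden reference for a 4-lane 'vaddps' of [1, 2, 3, 4] + [10, 20, 30, 40].
golden = format_lanes("dst", [11.0, 22.0, 33.0, 44.0])
```

A test passes when diff_against_golden(simulated_output, golden) returns an empty list; otherwise the returned tuples point at the lines that differ.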

Cheers,
Jason

On Sun, May 31, 2020 at 10:14 PM Gabe Black via gem5-dev <[email protected]> wrote:

    
https://docs.google.com/document/d/1O_u_Xq14TgreYThuZcbM3kuXFCrKvaFHA2O9poCeHSk/edit#heading=h.r067bn3rmydo

    On Sun, May 31, 2020 at 9:31 PM Zhengrong Wang via gem5-dev <[email protected]> wrote:

    > Hi Gabe,
    >
    > Thanks for your reply. For the vector register file, I agree it is
    > probably a better idea to stick with the current approach; at least it
    > does not require changing the SSE instructions. I could not find your
    > plan to redesign the register handling mechanism. If you could provide a
    > link, I would be interested to take a look to get a better understanding
    > of the philosophy behind the design.
    >
    > Let's hear from AMD first, as they have more insight into the microops.
    > If everything turns out well, I can start to refactor the code into
    > smaller commits and add tests for that.
    >
    > *王 钲 荣*
    >
    > Zhengrong Wang
    > Computer Science Department
    > University of California, Los Angeles
    > California, USA
    > 90024
    >
    > Work Email: [email protected]
    > Mobile: +1 310-447-4568
    >
    > Gabe Black via gem5-dev <[email protected]> wrote on Sun, May 31, 2020 at 7:44 PM:
    >
    > > Hi Sean. I'm not aware of anyone working on AVX-512, but it would be
    > > nice if the AMD folks could chime in and confirm that. The x86 microcode
    > > was originally based off of the microcode for the K6 as described in a
    > > patent. The floating point parts of that patent were very vague and
    > > hand wavy, so I more or less made up the initial part. It would be nice
    > > for the AMD folks to chime in here too, as far as what's realistic for
    > > the design of the microops.
    > >
    > > As far as testing, we don't have a great scheme for testing individual
    > > instructions right now, but that would be really valuable to have in
    > > the long run. I've thought a bit about how that might work, but I don't
    > > have a plan at the moment. The best thing to do right now is probably
    > > to have small programs that execute the instructions in question and
    > > print their inputs/outputs and/or check that the outputs are correct. I
    > > think our testing framework has a way to check that program output
    > > matches a golden reference, and that could be used to delegate
    > > correctness checking to the framework. Bobby can probably give more
    > > details here.
    > >
    > > As far as the registers, my preference for now is to do what you did
    > > and treat each 64 bit chunk as its own register. There are real
    > > drawbacks to this approach, but the existing solution to them, a vector
    > > register file, has other, in my opinion more serious, drawbacks. A
    > > while ago I put together a manifesto about how I'd want to redo the
    > > whole register handling mechanism in gem5, but unfortunately I haven't
    > > had time to actually implement very much of it. By treating larger
    > > registers as groups of smaller registers, you'd be consistent with the
    > > rest of the x86 code as it stands right now. That, and the fact that I
    > > think that's the lesser of two evils, makes that my preferred way to go.
    > >
    > > As far as submitting code, there are instructions on the gem5 website
    > > for creating and submitting reviews. We use gerrit, and so in addition
    > > to the instructions we provide, you should be able to find pretty
    > > good/complete instructions out on the internet to explain the mechanism
    > > of sending out a review. For this or any other change, you'd want to
    > > break up your work into logical chunks where everything works before
    > > and after any given change, and then send them out (perhaps all
    > > together in a series) for review. Exactly how to break things up is up
    > > to you, but my opinion is that each change should be logically complete
    > > but also about one thing. That makes it easier for a reviewer to wrap
    > > their head around what you're doing and how it works without having to
    > > untangle multiple things going on at once, or having to merge multiple
    > > reviews together in their head to see the whole change they're
    > > reviewing. If there are lots of related small changes (many individual
    > > instructions for instance) it might make sense to do one or two by
    > > themselves first, and then once the kinks are worked out to do a larger
    > > change with the rest, applying the pattern from the earlier reviews.
    > >
    > > Gabe
    > >
    > > On Sun, May 31, 2020 at 4:18 PM Sean Wong via gem5-dev <[email protected]> wrote:
    > >
    > > > Hello,
    > > >
    > > > This is my first time posting here, so apologies if I make any
    > > > mistakes.
    > > >
    > > > The last time I checked the develop branch, gem5 did not yet support
    > > > AVX512, and searching the mailing list I do not see any plan for it.
    > > > Is there any ongoing development to support it? If not, I am happy to
    > > > contribute my code. During my research, I have developed partial
    > > > support for AVX512 (and AVX-256 as a by-product), which I hope would
    > > > be useful for others.
    > > >
    > > > My implementation so far is a straightforward extension of the
    > > > existing SSE instructions. To summarize it:
    > > >
    > > > - Like the SSE implementation, each 512-bit register is broken into
    > > > eight 64-bit sub-registers. This may not be a good design. Any
    > > > suggestions are welcome.
    > > > - Unlike the SSE implementation, most instructions are decoded into a
    > > > single microop. For example, a 512-bit 'vaddps' is decoded into one
    > > > 'vaddf' microop instead of eight.
    > > > - Currently, it supports common arithmetic instructions (add, mul,
    > > > etc.) and basic data movement (load, store, mov, extract, insert,
    > > > etc.).
    > > > - No support for masking.
    > > >
    > > > If you guys are interested, I am willing to clean up my code and
    > > > submit it for review. I may need some guidance on:
    > > >
    > > > - Design of the vector register file. My implementation directly
    > > > follows the SSE instructions to minimize the work. Is there any better
    > > > way to do this?
    > > > - If I am going to merge my code, what is a good submission plan? I am
    > > > thinking about first committing the skeleton code with a simple
    > > > 'vaddps' instruction, and then for other instructions.
    > > > - Testing: This is probably the most important one. Currently, I
    > > > manually test my code by simulating small programs. What is the best
    > > > way to write tests for new instructions? Should I try unit testing for
    > > > binary testing?
    > > >
    > > > Thank you for reading this long post. Any feedback is welcome.
    > > >
    > > > *王 钲 荣*
    > > >
    > > > Zhengrong Wang
    > > > Computer Science Department
    > > > University of California, Los Angeles
    > > > California, USA
    > > > 90024
    > > >
    > > > Work Email: [email protected]
    > > > Mobile: +1 310-447-4568
    > > > _______________________________________________
    > > > gem5-dev mailing list -- [email protected]
    <mailto:[email protected]>
    > > > To unsubscribe send an email to [email protected]
    <mailto:[email protected]>
    > > > %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s


