Re: [gem5-dev] Review Request 2743: x86: reimplement integer division

Steve Reinhardt Mon, 20 Apr 2015 07:08:25 -0700

Thanks for speaking up Gabe... I agree on both counts. I should have said
"probably not realistic any more". Also, a single-cycle divide is arguably
at least as unrealistic in the other direction.

Looking at table 17 in section B.6 on p. 349 of the AMD SW optimization
guide (http://support.amd.com/TechDocs/47414_15h_sw_opt_guide.pdf), integer
divide latencies are data-dependent, and a 64-bit divide can take anywhere
from 9 to 72 cycles.  If I'm understanding Gabe's old algorithm correctly,
it looks like it takes a fixed number of cycles, though assuming the branch
overhead can be overlapped, that number is probably pretty close to the
upper bound of the actual value, at least for recent AMD processors.  (I
haven't looked for equivalent official Intel docs, though if
https://gmplib.org/~tege/x86-timing.pdf is correct, the latency can be up
to 95 cycles on Haswell.)

Is that right, Gabe?  Or is there a data dependency in that microcode loop
that's not obvious?

The most flexible thing to do from a timing perspective would be to code
the division in C and then program the latency separately. However, since
the computation really is microcoded (see p. 248), that would not give
realistic results if you care about the modeling of microcode fetch etc.
(which would impact power models if nothing else).
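
As a rough illustration of the "code the division in C" option, here is a
minimal sketch of the unsigned 128-by-64-bit case using the gcc/llvm 128-bit
type mentioned further down the thread. The names DivResult and divide128by64
are hypothetical and not taken from the patch under review, and the sketch
assumes a compiler that provides unsigned __int128. It only computes the
result; latency is not modeled here.

```cpp
#include <cstdint>

// Hypothetical result type: quotient goes to rax, remainder to rdx.
struct DivResult
{
    uint64_t quotient;
    uint64_t remainder;
};

// Hypothetical helper, not gem5 code. Divides the 128-bit value rdx:rax
// by a 64-bit divisor. Returns false in the cases where x86 DIV raises
// a divide error: divisor == 0, or a quotient that does not fit in 64 bits.
inline bool
divide128by64(uint64_t rdx, uint64_t rax, uint64_t divisor, DivResult &out)
{
    // The quotient fits in 64 bits exactly when the high half of the
    // dividend is smaller than the divisor. This test also rejects a
    // zero divisor, since rdx >= 0 always holds.
    if (rdx >= divisor)
        return false;

    // unsigned __int128 is a GCC/Clang extension, not standard C++.
    unsigned __int128 dividend =
        (static_cast<unsigned __int128>(rdx) << 64) | rax;

    out.quotient = static_cast<uint64_t>(dividend / divisor);
    out.remainder = static_cast<uint64_t>(dividend % divisor);
    return true;
}
```

With something like this, the cycle count would be set separately from the
computation, which is exactly the part that loses the microcode fetch
behavior described above.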

Steve

On Mon, Apr 20, 2015 at 2:56 AM, Gabe Black <[email protected]> wrote:

> The original was implemented based on the K6 microops. It might not be
> realistic any more (although I don't think single cycle division is
> either?), but it wasn't entirely made up.
>
> Gabe
>
> On Sun, Apr 19, 2015 at 12:33 PM, Steve Reinhardt <[email protected]>
> wrote:
>
> > On Sun, Apr 19, 2015 at 9:25 AM, Nilay Vaish <[email protected]> wrote:
> >
> > > On Sun, 19 Apr 2015, Steve Reinhardt wrote:
> > >
> > >
> > >> -----------------------------------------------------------
> > >> This is an automatically generated e-mail. To reply, visit:
> > >> http://reviews.gem5.org/r/2743/#review6052
> > >> -----------------------------------------------------------
> > >>
> > >>
> > >> I like the restructuring... I agree the micro-op loop is probably not
> > >> realistic.  Is there a reason to code a loop in C though, as opposed to
> > >> just using '/' and '%'?
> > >>
> > >>
> > >
> > > The dividend is represented as rdx:rax, which means up to 128 bits of
> > > data.  So we would not be able to carry out the division by just using
> > > '/' and '%' when only using 64-bit integers.  GCC and LLVM both support
> > > 128-bit integers on x86-64 platforms.  We may want to use those, but I
> > > don't know if that would cause any compatibility problems.
> > >
> > > --
> > > Nilay
> >
> >
> >
> > Ah, thanks... I didn't look closely enough to see that it was a 128-bit
> > operation.  I'd be fine with using gcc/llvm 128-bit support if others are.
> > If not, there are ways to build a 128-bit operation out of the 64-bit
> > operations that would still be simpler than the bitwise loop.  For example,
> > I found this:
> >
> > http://codereview.stackexchange.com/questions/67962/mostly-portable-128-by-64-bit-division
> >
> > and if I read the StackExchange terms correctly, we could just use that
> > code with an appropriate attribution and a link in a comment back to the
> > question (look under Subscriber Content):
> > http://stackexchange.com/legal/terms-of-service
> >
> > Steve
> > _______________________________________________
> > gem5-dev mailing list
> > [email protected]
> > http://m5sim.org/mailman/listinfo/gem5-dev
> >
