It's been a while, but I do remember there being a data dependence. The code that you'll want to look at is here (look for Div2): src/arch/x86/isa/microops/regop.isa
I think the dependence is that it scans for 1s in one of the inputs, but it's really been too long to remember.

Gabe
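For illustration, here is a minimal, hypothetical sketch of a data-dependent shift-and-subtract divide loop in plain C++. It is not the microcode in regop.isa, and the name divStepLoop is invented for this sketch; it only shows why the step count of such a loop depends on where the leading 1 bit of the dividend sits.

#include <cstdint>
#include <utility>

// Sketch only: a restoring shift-and-subtract divider whose step count
// depends on the position of the dividend's leading 1 bit. This is NOT
// the microcode in src/arch/x86/isa/microops/regop.isa.
// (Assumes divisor != 0; real hardware would raise #DE.)
std::pair<uint64_t, uint64_t>
divStepLoop(uint64_t dividend, uint64_t divisor, int &steps)
{
    uint64_t quotient = 0;
    uint64_t remainder = 0;
    steps = 0;

    // Skip the dividend's leading zeros: a bit-serial divider can start
    // at the most significant 1, so smaller dividends finish sooner.
    int bit = 63;
    while (bit >= 0 && !((dividend >> bit) & 1))
        --bit;

    // One shift-subtract step per remaining dividend bit.
    for (; bit >= 0; --bit, ++steps) {
        remainder = (remainder << 1) | ((dividend >> bit) & 1);
        quotient <<= 1;
        if (remainder >= divisor) {
            remainder -= divisor;
            quotient |= 1;
        }
    }
    return {quotient, remainder};
}

Dividing a small value then takes far fewer steps than dividing one with its top bit set, which is the kind of data dependence described above.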
On Mon, Apr 20, 2015 at 8:22 AM, Jason Power <[email protected]> wrote:

> Yes, I believe that the divide micro-ops currently use the divide unit latency, which I think is the cause of the large discrepancy between the x86 and ARM performance.
>
> Jason
>
> On Mon, Apr 20, 2015 at 10:16 AM Steve Reinhardt <[email protected]> wrote:
>
> > I see. The confusion all makes sense now.
> >
> > Do the x86 divide micro-ops currently use the divide unit latencies? If not, what latencies do they use?
> >
> > My gut reaction is that we should have a "divide step" functional unit that the x86 micro-ops should use, independent of the full divider that the other ISAs use. That way we eliminate (or at least reduce) the confusion but can keep the more realistic x86 implementation. It's not clear how different that is from the status quo, though... certainly you'll still have the confusion that changing the "divide" unit parameters won't impact x86 performance.
> >
> > Steve
> >
> > On Mon, Apr 20, 2015 at 7:39 AM, Nilay Vaish <[email protected]> wrote:
> >
> > > Given the discussion we had so far, it seems that we should stick with Gabe's implementation, but for x86 we should change the integer division latency to a single cycle. The default latency is 20 cycles, which is not right for x86.
> > >
> > > --
> > > Nilay
> > >
> > > On Mon, 20 Apr 2015, Steve Reinhardt wrote:
> > >
> > > > Thanks for speaking up, Gabe... I agree on both counts. I should have said "probably not realistic any more". Also, a single-cycle divide is arguably at least as unrealistic in the other direction.
> > > >
> > > > Looking at table 17 in section B.6 on p. 349 of the AMD SW optimization guide (http://support.amd.com/TechDocs/47414_15h_sw_opt_guide.pdf), integer divide latencies are data-dependent, and a 64-bit divide can take anywhere from 9 to 72 cycles. If I'm understanding Gabe's old algorithm correctly, it looks like it takes a fixed number of cycles, though assuming the branch overhead can be overlapped, that number is probably pretty close to the upper bound of the actual value, at least for recent AMD processors. (I haven't looked for equivalent official Intel docs, though if https://gmplib.org/~tege/x86-timing.pdf is correct, the latency can be up to 95 cycles on Haswell.)
> > > >
> > > > Is that right, Gabe? Or is there a data dependency in that microcode loop that's not obvious?
> > > >
> > > > The most flexible thing to do from a timing perspective would be to code the division in C and then program the latency separately. However, since the computation really is microcoded (see p. 248), that would not give realistic results if you care about the modeling of microcode fetch etc. (which would impact power models if nothing else).
> > > >
> > > > Steve
> > > >
> > > > On Mon, Apr 20, 2015 at 2:56 AM, Gabe Black <[email protected]> wrote:
> > > >
> > > > > The original was implemented based on the K6 microops. It might not be realistic any more (although I don't think single-cycle division is either?), but it wasn't entirely made up.
> > > > >
> > > > > Gabe
> > > > >
> > > > > On Sun, Apr 19, 2015 at 12:33 PM, Steve Reinhardt <[email protected]> wrote:
> > > > >
> > > > > > On Sun, Apr 19, 2015 at 9:25 AM, Nilay Vaish <[email protected]> wrote:
> > > > > >
> > > > > > > On Sun, 19 Apr 2015, Steve Reinhardt wrote:
> > > > > > >
> > > > > > > > This is an automatically generated e-mail. To reply, visit: http://reviews.gem5.org/r/2743/#review6052
> > > > > > > >
> > > > > > > > I like the restructuring... I agree the micro-op loop is probably not realistic. Is there a reason to code a loop in C though, as opposed to just using '/' and '%'?
> > > > > > >
> > > > > > > The dividend is represented as rdx:rax, which means up to 128 bits of data. So we would not be able to carry out division by just using '/' and '%' when only using 64-bit integers. GCC and LLVM both support 128-bit integers on x86-64 platforms. We may want to use those, but I don't know if that would cause any compatibility problems.
> > > > > > >
> > > > > > > --
> > > > > > > Nilay
> > > > > >
> > > > > > Ah, thanks... I didn't look closely enough to see that it was a 128-bit operation. I'd be fine with using gcc/llvm 128-bit support if others are. If not, there are ways to build a 128-bit operation out of the 64-bit operations that would still be simpler than the bitwise loop. For example, I found this:
> > > > > >
> > > > > > http://codereview.stackexchange.com/questions/67962/mostly-portable-128-by-64-bit-division
> > > > > >
> > > > > > and if I read the StackExchange terms correctly, we could just use that code with an appropriate attribution and a link in a comment back to the question (look under Subscriber Content): http://stackexchange.com/legal/terms-of-service
> > > > > >
> > > > > > Steve
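As a concrete illustration of the __int128 route discussed at the bottom of the thread, here is a minimal sketch that assembles the dividend from rdx:rax and divides it by a 64-bit divisor. The function name, signature, and error handling are assumptions made for this sketch, not code from the patch under review.

#include <cstdint>

// Sketch only: divide the 128-bit dividend held in rdx:rax by a 64-bit
// divisor using the unsigned __int128 type that GCC and Clang provide on
// 64-bit targets. Returns false where real hardware would raise #DE.
bool
divide128by64(uint64_t rdx, uint64_t rax, uint64_t divisor,
              uint64_t &quotient, uint64_t &remainder)
{
    if (divisor == 0)
        return false;               // divide by zero -> #DE

    unsigned __int128 dividend = ((unsigned __int128)rdx << 64) | rax;

    unsigned __int128 q = dividend / divisor;
    if (q >> 64)
        return false;               // quotient doesn't fit in 64 bits -> #DE

    quotient = (uint64_t)q;
    remainder = (uint64_t)(dividend % divisor);
    return true;
}

Whether to use __int128 or a portable 128-by-64 routine built from 64-bit operations (as in the Stack Exchange link above) is exactly the compatibility question raised in the thread; this sketch assumes a GCC/Clang x86-64 build.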
