It's been a while, but I do remember there being a data dependence. The code that you'll want to look at is here (look for Div2): src/arch/x86/isa/microops/regop.isa
I think the dependence is that it scans for 1s in one of the inputs, but it's really been too long to remember.

Gabe
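For illustration, here is a minimal, hypothetical sketch of a data-dependent shift-and-subtract divide loop in plain C++. It is not the microcode in regop.isa, and the name divStepLoop is invented for this sketch; it only shows why the step count of such a loop depends on where the leading 1 bit of the dividend sits.

#include <cstdint>
#include <utility>

// Sketch only: a restoring shift-and-subtract divider whose step count
// depends on the position of the dividend's leading 1 bit. This is NOT
// the microcode in src/arch/x86/isa/microops/regop.isa.
// (Assumes divisor != 0; real hardware would raise #DE.)
std::pair<uint64_t, uint64_t>
divStepLoop(uint64_t dividend, uint64_t divisor, int &steps)
{
    uint64_t quotient = 0;
    uint64_t remainder = 0;
    steps = 0;

    // Skip the dividend's leading zeros: a bit-serial divider can start
    // at the most significant 1, so smaller dividends finish sooner.
    int bit = 63;
    while (bit >= 0 && !((dividend >> bit) & 1))
        --bit;

    // One shift-subtract step per remaining dividend bit.
    for (; bit >= 0; --bit, ++steps) {
        remainder = (remainder << 1) | ((dividend >> bit) & 1);
        quotient <<= 1;
        if (remainder >= divisor) {
            remainder -= divisor;
            quotient |= 1;
        }
    }
    return {quotient, remainder};
}

Dividing a small value then takes far fewer steps than dividing one with its top bit set, which is the kind of data dependence described above.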
On Mon, Apr 20, 2015 at 8:22 AM, Jason Power <[email protected]> wrote:

> Yes, I believe that the divide micro-ops currently use the divide unit latency, which I think is the cause of the large discrepancy between the x86 and ARM performance.
>
> Jason
>
> On Mon, Apr 20, 2015 at 10:16 AM Steve Reinhardt <[email protected]> wrote:
>
> > I see. The confusion all makes sense now.
> >
> > Do the x86 divide micro-ops currently use the divide unit latencies? If not, what latencies do they use?
> >
> > My gut reaction is that we should have a "divide step" functional unit that the x86 micro-ops should use, independent of the full divider that the other ISAs use. That way we eliminate (or at least reduce) the confusion but can keep the more realistic x86 implementation. It's not clear how different that is from the status quo, though... certainly you'll still have the confusion that changing the "divide" unit parameters won't impact x86 performance.
> >
> > Steve
> >
> > On Mon, Apr 20, 2015 at 7:39 AM, Nilay Vaish <[email protected]> wrote:
> >
> > > Given the discussion we had so far, it seems that we should stick with Gabe's implementation, but for x86 we should change the integer division latency to a single cycle. The default latency is 20 cycles, which is not right for x86.
> > >
> > > --
> > > Nilay
> > >
> > > On Mon, 20 Apr 2015, Steve Reinhardt wrote:
> > >
> > > > Thanks for speaking up, Gabe... I agree on both counts. I should have said "probably not realistic any more". Also, a single-cycle divide is arguably at least as unrealistic in the other direction.
> > > >
> > > > Looking at table 17 in section B.6 on p. 349 of the AMD SW optimization guide (http://support.amd.com/TechDocs/47414_15h_sw_opt_guide.pdf), integer divide latencies are data-dependent, and a 64-bit divide can take anywhere from 9 to 72 cycles. If I'm understanding Gabe's old algorithm correctly, it looks like it takes a fixed number of cycles, though assuming the branch overhead can be overlapped, that number is probably pretty close to the upper bound of the actual value, at least for recent AMD processors. (I haven't looked for equivalent official Intel docs, though if https://gmplib.org/~tege/x86-timing.pdf is correct, the latency can be up to 95 cycles on Haswell.)
> > > >
> > > > Is that right, Gabe? Or is there a data dependency in that microcode loop that's not obvious?
> > > >
> > > > The most flexible thing to do from a timing perspective would be to code the division in C and then program the latency separately. However, since the computation really is microcoded (see p. 248), that would not give realistic results if you care about the modeling of microcode fetch etc. (which would impact power models if nothing else).
> > > >
> > > > Steve
> > > >
> > > > On Mon, Apr 20, 2015 at 2:56 AM, Gabe Black <[email protected]> wrote:
> > > >
> > > > > The original was implemented based on the K6 microops. It might not be realistic any more (although I don't think single-cycle division is either?), but it wasn't entirely made up.
> > > > >
> > > > > Gabe
> > > > >
> > > > > On Sun, Apr 19, 2015 at 12:33 PM, Steve Reinhardt <[email protected]> wrote:
> > > > >
> > > > > > On Sun, Apr 19, 2015 at 9:25 AM, Nilay Vaish <[email protected]> wrote:
> > > > > >
> > > > > > > On Sun, 19 Apr 2015, Steve Reinhardt wrote:
> > > > > > >
> > > > > > > > This is an automatically generated e-mail. To reply, visit: http://reviews.gem5.org/r/2743/#review6052
> > > > > > > >
> > > > > > > > I like the restructuring... I agree the micro-op loop is probably not realistic. Is there a reason to code a loop in C though, as opposed to just using '/' and '%'?
> > > > > > >
> > > > > > > The dividend is represented as rdx:rax, which means up to 128 bits of data. So we would not be able to carry out division by just using '/' and '%' when only using 64-bit integers. GCC and LLVM both support 128-bit integers on x86-64 platforms. We may want to use those, but I don't know if that would cause any compatibility problems.
> > > > > > >
> > > > > > > --
> > > > > > > Nilay
> > > > > >
> > > > > > Ah, thanks... I didn't look closely enough to see that it was a 128-bit operation. I'd be fine with using gcc/llvm 128-bit support if others are. If not, there are ways to build a 128-bit operation out of the 64-bit operations that would still be simpler than the bitwise loop. For example, I found this:
> > > > > >
> > > > > > http://codereview.stackexchange.com/questions/67962/mostly-portable-128-by-64-bit-division
> > > > > >
> > > > > > and if I read the StackExchange terms correctly, we could just use that code with an appropriate attribution and a link in a comment back to the question (look under Subscriber Content): http://stackexchange.com/legal/terms-of-service
> > > > > >
> > > > > > Steve
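As a concrete illustration of the __int128 route discussed at the bottom of the thread, here is a minimal sketch that assembles the dividend from rdx:rax and divides it by a 64-bit divisor. The function name, signature, and error handling are assumptions made for this sketch, not code from the patch under review.

#include <cstdint>

// Sketch only: divide the 128-bit dividend held in rdx:rax by a 64-bit
// divisor using the unsigned __int128 type that GCC and Clang provide on
// 64-bit targets. Returns false where real hardware would raise #DE.
bool
divide128by64(uint64_t rdx, uint64_t rax, uint64_t divisor,
              uint64_t &quotient, uint64_t &remainder)
{
    if (divisor == 0)
        return false;               // divide by zero -> #DE

    unsigned __int128 dividend = ((unsigned __int128)rdx << 64) | rax;

    unsigned __int128 q = dividend / divisor;
    if (q >> 64)
        return false;               // quotient doesn't fit in 64 bits -> #DE

    quotient = (uint64_t)q;
    remainder = (uint64_t)(dividend % divisor);
    return true;
}

Whether to use __int128 or a portable 128-by-64 routine built from 64-bit operations (as in the Stack Exchange link above) is exactly the compatibility question raised in the thread; this sketch assumes a GCC/Clang x86-64 build.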
