No problem!  It happens - thanks for being so responsive.

Lisa

On Sat, Feb 20, 2010 at 12:30 PM, Timothy M Jones <[email protected]>wrote:

> Hi Lisa,
>
> I'm sorry to say that this was a problem with my patches.  I'm still unsure
> why all of the regressions passed for me - I looked over my logs and they
> were all fine.  However, when I ran them again just now then I got the same
> problem as you.  Looking into it, I had simply forgotten to propagate a
> fault back to the processor after a read or write.
>
> This patch here should fix it all up again.  I'm running all of the
> regressions again.  I'll push the patch on Monday if they are all ok.
>
> Sorry for the problem again.
>
>
> Tim
>
> On Fri, 19 Feb 2010 16:35:34 -0500, Lisa Hsu <[email protected]> wrote:
>
>  Timothy,
>>
>> Did you check this against ALPHA_FS/test/fast/long/10.linux-boot tests
>> under
>> the O3 configurations?  I believe this changeset may have broken them.  I
>> am
>> getting ready to make a push, and I passed ALPHA_FS linux boot regression
>> tests a few days ago, but since pulling the last 6 changesets, I've begun
>> failing on this test, and this O3 change seems to be the most obvious
>> breaker.  What happens is that it seems like no forward progress is made
>> in
>> the program and it just gets hung and runs forever. This happens on a
>> clean
>> tree after having removed my patches that I want to push.  Can you look
>> into
>> it?
>>
>> Thanks,
>> Lisa
>>
>> On Fri, Feb 12, 2010 at 12:52 PM, Timothy M. Jones <[email protected]
>> >wrote:
>>
>>  changeset 4d4903a3e7c5 in /z/repo/m5
>>> details: http://repo.m5sim.org/m5?cmd=changeset;node=4d4903a3e7c5
>>> description:
>>>       O3PCU: Split loads and stores that cross cache line boundaries.
>>>
>>>       When each load or store is sent to the LSQ, we check whether it
>>> will
>>> cross a
>>>       cache line boundary and, if so, split it in two. This creates two
>>> TLB
>>>       translations and two memory requests. Care has to be taken if the
>>> first
>>>       packet of a split load is sent but the second blocks the cache.
>>> Similarly,
>>>       for a store, if the first packet cannot be sent, we must store the
>>> second
>>>       one somewhere to retry later.
>>>
>>>       This modifies the LSQSenderState class to record both packets in a
>>> split
>>>       load or store.
>>>
>>>       Finally, a new const variable, HasUnalignedMemAcc, is added to each
>>> ISA
>>>       to indicate whether unaligned memory accesses are allowed. This is
>>> used
>>>       throughout the changed code so that compiler can optimise away code
>>> dealing
>>>       with split requests for ISAs that don't need them.
>>>
>>> diffstat:
>>>
>>> 11 files changed, 378 insertions(+), 53 deletions(-)
>>> src/arch/alpha/isa_traits.hh |    3
>>> src/arch/arm/isa_traits.hh   |    3
>>> src/arch/mips/isa_traits.hh  |    3
>>> src/arch/power/isa_traits.hh |    3
>>> src/arch/sparc/isa_traits.hh |    3
>>> src/arch/x86/isa_traits.hh   |    3
>>> src/cpu/base_dyn_inst.hh     |   75 ++++++++++++++++---
>>> src/cpu/o3/cpu.hh            |   15 ++-
>>> src/cpu/o3/lsq.hh            |   22 +++--
>>> src/cpu/o3/lsq_unit.hh       |  138 +++++++++++++++++++++++++++++++----
>>> src/cpu/o3/lsq_unit_impl.hh  |  163
>>> ++++++++++++++++++++++++++++++++++++++----
>>>
>>> diffs (truncated from 831 to 300 lines):
>>>
>>> diff -r a123bd350935 -r 4d4903a3e7c5 src/arch/alpha/isa_traits.hh
>>> --- a/src/arch/alpha/isa_traits.hh      Fri Feb 12 19:53:19 2010 +0000
>>> +++ b/src/arch/alpha/isa_traits.hh      Fri Feb 12 19:53:20 2010 +0000
>>> @@ -131,6 +131,9 @@
>>>  // Alpha UNOP (ldq_u r31,0(r0))
>>>  const ExtMachInst NoopMachInst = 0x2ffe0000;
>>>
>>> +// Memory accesses cannot be unaligned
>>> +const bool HasUnalignedMemAcc = false;
>>> +
>>>  } // namespace AlphaISA
>>>
>>>  #endif // __ARCH_ALPHA_ISA_TRAITS_HH__
>>> diff -r a123bd350935 -r 4d4903a3e7c5 src/arch/arm/isa_traits.hh
>>> --- a/src/arch/arm/isa_traits.hh        Fri Feb 12 19:53:19 2010 +0000
>>> +++ b/src/arch/arm/isa_traits.hh        Fri Feb 12 19:53:20 2010 +0000
>>> @@ -106,6 +106,9 @@
>>>    const int ByteBytes = 1;
>>>
>>>    const uint32_t HighVecs = 0xFFFF0000;
>>> +
>>> +    // Memory accesses cannot be unaligned
>>> +    const bool HasUnalignedMemAcc = false;
>>>  };
>>>
>>>  using namespace ArmISA;
>>> diff -r a123bd350935 -r 4d4903a3e7c5 src/arch/mips/isa_traits.hh
>>> --- a/src/arch/mips/isa_traits.hh       Fri Feb 12 19:53:19 2010 +0000
>>> +++ b/src/arch/mips/isa_traits.hh       Fri Feb 12 19:53:20 2010 +0000
>>> @@ -164,6 +164,9 @@
>>>  const int ANNOTE_NONE = 0;
>>>  const uint32_t ITOUCH_ANNOTE = 0xffffffff;
>>>
>>> +// Memory accesses cannot be unaligned
>>> +const bool HasUnalignedMemAcc = false;
>>> +
>>>  };
>>>
>>>  #endif // __ARCH_MIPS_ISA_TRAITS_HH__
>>> diff -r a123bd350935 -r 4d4903a3e7c5 src/arch/power/isa_traits.hh
>>> --- a/src/arch/power/isa_traits.hh      Fri Feb 12 19:53:19 2010 +0000
>>> +++ b/src/arch/power/isa_traits.hh      Fri Feb 12 19:53:20 2010 +0000
>>> @@ -70,6 +70,9 @@
>>>  // This is ori 0, 0, 0
>>>  const ExtMachInst NoopMachInst = 0x60000000;
>>>
>>> +// Memory accesses can be unaligned
>>> +const bool HasUnalignedMemAcc = true;
>>> +
>>>  } // PowerISA namespace
>>>
>>>  #endif // __ARCH_POWER_ISA_TRAITS_HH__
>>> diff -r a123bd350935 -r 4d4903a3e7c5 src/arch/sparc/isa_traits.hh
>>> --- a/src/arch/sparc/isa_traits.hh      Fri Feb 12 19:53:19 2010 +0000
>>> +++ b/src/arch/sparc/isa_traits.hh      Fri Feb 12 19:53:20 2010 +0000
>>> @@ -98,6 +98,9 @@
>>>    };
>>>
>>>  #endif
>>> +
>>> +// Memory accesses cannot be unaligned
>>> +const bool HasUnalignedMemAcc = false;
>>>  }
>>>
>>>  #endif // __ARCH_SPARC_ISA_TRAITS_HH__
>>> diff -r a123bd350935 -r 4d4903a3e7c5 src/arch/x86/isa_traits.hh
>>> --- a/src/arch/x86/isa_traits.hh        Fri Feb 12 19:53:19 2010 +0000
>>> +++ b/src/arch/x86/isa_traits.hh        Fri Feb 12 19:53:20 2010 +0000
>>> @@ -91,6 +91,9 @@
>>>    StaticInstPtr decodeInst(ExtMachInst);
>>>
>>>    const Addr LoadAddrMask = ULL(-1);
>>> +
>>> +    // Memory accesses can be unaligned
>>> +    const bool HasUnalignedMemAcc = true;
>>>  };
>>>
>>>  #endif // __ARCH_X86_ISATRAITS_HH__
>>> diff -r a123bd350935 -r 4d4903a3e7c5 src/cpu/base_dyn_inst.hh
>>> --- a/src/cpu/base_dyn_inst.hh  Fri Feb 12 19:53:19 2010 +0000
>>> +++ b/src/cpu/base_dyn_inst.hh  Fri Feb 12 19:53:20 2010 +0000
>>> @@ -131,8 +131,13 @@
>>>    template <class T>
>>>    Fault write(T data, Addr addr, unsigned flags, uint64_t *res);
>>>
>>> +    /** Splits a request in two if it crosses a dcache block. */
>>> +    void splitRequest(RequestPtr req, RequestPtr &sreqLow,
>>> +                      RequestPtr &sreqHigh);
>>> +
>>>    /** Initiate a DTB address translation. */
>>> -    void initiateTranslation(RequestPtr req, uint64_t *res,
>>> +    void initiateTranslation(RequestPtr req, RequestPtr sreqLow,
>>> +                             RequestPtr sreqHigh, uint64_t *res,
>>>                             BaseTLB::Mode mode);
>>>
>>>    /** Finish a DTB address translation. */
>>> @@ -870,12 +875,19 @@
>>>    Request *req = new Request(asid, addr, sizeof(T), flags, this->PC,
>>>                               thread->contextId(), threadNumber);
>>>
>>> -    initiateTranslation(req, NULL, BaseTLB::Read);
>>> +    Request *sreqLow = NULL;
>>> +    Request *sreqHigh = NULL;
>>> +
>>> +    // Only split the request if the ISA supports unaligned accesses.
>>> +    if (TheISA::HasUnalignedMemAcc) {
>>> +        splitRequest(req, sreqLow, sreqHigh);
>>> +    }
>>> +    initiateTranslation(req, sreqLow, sreqHigh, NULL, BaseTLB::Read);
>>>
>>>    if (fault == NoFault) {
>>>        effAddr = req->getVaddr();
>>>        effAddrValid = true;
>>> -        cpu->read(req, data, lqIdx);
>>> +        cpu->read(req, sreqLow, sreqHigh, data, lqIdx);
>>>    } else {
>>>
>>>        // Return a fixed value to keep simulation deterministic even
>>> @@ -909,12 +921,19 @@
>>>    Request *req = new Request(asid, addr, sizeof(T), flags, this->PC,
>>>                               thread->contextId(), threadNumber);
>>>
>>> -    initiateTranslation(req, res, BaseTLB::Write);
>>> +    Request *sreqLow = NULL;
>>> +    Request *sreqHigh = NULL;
>>> +
>>> +    // Only split the request if the ISA supports unaligned accesses.
>>> +    if (TheISA::HasUnalignedMemAcc) {
>>> +        splitRequest(req, sreqLow, sreqHigh);
>>> +    }
>>> +    initiateTranslation(req, sreqLow, sreqHigh, res, BaseTLB::Write);
>>>
>>>    if (fault == NoFault) {
>>>        effAddr = req->getVaddr();
>>>        effAddrValid = true;
>>> -        cpu->write(req, data, sqIdx);
>>> +        cpu->write(req, sreqLow, sreqHigh, data, sqIdx);
>>>    }
>>>
>>>    return fault;
>>> @@ -922,14 +941,48 @@
>>>
>>>  template<class Impl>
>>>  inline void
>>> -BaseDynInst<Impl>::initiateTranslation(RequestPtr req, uint64_t *res,
>>> +BaseDynInst<Impl>::splitRequest(RequestPtr req, RequestPtr &sreqLow,
>>> +                                RequestPtr &sreqHigh)
>>> +{
>>> +    // Check to see if the request crosses the next level block
>>> boundary.
>>> +    unsigned block_size = cpu->getDcachePort()->peerBlockSize();
>>> +    Addr addr = req->getVaddr();
>>> +    Addr split_addr = roundDown(addr + req->getSize() - 1, block_size);
>>> +    assert(split_addr <= addr || split_addr - addr < block_size);
>>> +
>>> +    // Spans two blocks.
>>> +    if (split_addr > addr) {
>>> +        req->splitOnVaddr(split_addr, sreqLow, sreqHigh);
>>> +    }
>>> +}
>>> +
>>> +template<class Impl>
>>> +inline void
>>> +BaseDynInst<Impl>::initiateTranslation(RequestPtr req, RequestPtr
>>> sreqLow,
>>> +                                       RequestPtr sreqHigh, uint64_t
>>> *res,
>>>                                       BaseTLB::Mode mode)
>>>  {
>>> -    WholeTranslationState *state =
>>> -        new WholeTranslationState(req, NULL, res, mode);
>>> -    DataTranslation<BaseDynInst<Impl> > *trans =
>>> -        new DataTranslation<BaseDynInst<Impl> >(this, state);
>>> -    cpu->dtb->translateTiming(req, thread->getTC(), trans, mode);
>>> +    if (!TheISA::HasUnalignedMemAcc || sreqLow == NULL) {
>>> +        WholeTranslationState *state =
>>> +            new WholeTranslationState(req, NULL, res, mode);
>>> +
>>> +        // One translation if the request isn't split.
>>> +        DataTranslation<BaseDynInst<Impl> > *trans =
>>> +            new DataTranslation<BaseDynInst<Impl> >(this, state);
>>> +        cpu->dtb->translateTiming(req, thread->getTC(), trans, mode);
>>> +    } else {
>>> +        WholeTranslationState *state =
>>> +            new WholeTranslationState(req, sreqLow, sreqHigh, NULL, res,
>>> mode);
>>> +
>>> +        // Two translations when the request is split.
>>> +        DataTranslation<BaseDynInst<Impl> > *stransLow =
>>> +            new DataTranslation<BaseDynInst<Impl> >(this, state, 0);
>>> +        DataTranslation<BaseDynInst<Impl> > *stransHigh =
>>> +            new DataTranslation<BaseDynInst<Impl> >(this, state, 1);
>>> +
>>> +        cpu->dtb->translateTiming(sreqLow, thread->getTC(), stransLow,
>>> mode);
>>> +        cpu->dtb->translateTiming(sreqHigh, thread->getTC(), stransHigh,
>>> mode);
>>> +    }
>>>  }
>>>
>>>  template<class Impl>
>>> diff -r a123bd350935 -r 4d4903a3e7c5 src/cpu/o3/cpu.hh
>>> --- a/src/cpu/o3/cpu.hh Fri Feb 12 19:53:19 2010 +0000
>>> +++ b/src/cpu/o3/cpu.hh Fri Feb 12 19:53:20 2010 +0000
>>> @@ -703,18 +703,25 @@
>>>
>>>    /** CPU read function, forwards read to LSQ. */
>>>    template <class T>
>>> -    Fault read(RequestPtr &req, T &data, int load_idx)
>>> +    Fault read(RequestPtr &req, RequestPtr &sreqLow, RequestPtr
>>> &sreqHigh,
>>> +               T &data, int load_idx)
>>>    {
>>> -        return this->iew.ldstQueue.read(req, data, load_idx);
>>> +        return this->iew.ldstQueue.read(req, sreqLow, sreqHigh,
>>> +                                        data, load_idx);
>>>    }
>>>
>>>    /** CPU write function, forwards write to LSQ. */
>>>    template <class T>
>>> -    Fault write(RequestPtr &req, T &data, int store_idx)
>>> +    Fault write(RequestPtr &req, RequestPtr &sreqLow, RequestPtr
>>> &sreqHigh,
>>> +                T &data, int store_idx)
>>>    {
>>> -        return this->iew.ldstQueue.write(req, data, store_idx);
>>> +        return this->iew.ldstQueue.write(req, sreqLow, sreqHigh,
>>> +                                         data, store_idx);
>>>    }
>>>
>>> +    /** Get the dcache port (used to find block size for translations).
>>> */
>>> +    Port *getDcachePort() { return this->iew.ldstQueue.getDcachePort();
>>> }
>>> +
>>>    Addr lockAddr;
>>>
>>>    /** Temporary fix for the lock flag, works in the UP case. */
>>> diff -r a123bd350935 -r 4d4903a3e7c5 src/cpu/o3/lsq.hh
>>> --- a/src/cpu/o3/lsq.hh Fri Feb 12 19:53:19 2010 +0000
>>> +++ b/src/cpu/o3/lsq.hh Fri Feb 12 19:53:20 2010 +0000
>>> @@ -270,15 +270,19 @@
>>>    void dumpInsts(ThreadID tid)
>>>    { thread[tid].dumpInsts(); }
>>>
>>> -    /** Executes a read operation, using the load specified at the load
>>> index. */
>>> +    /** Executes a read operation, using the load specified at the load
>>> +     * index.
>>> +     */
>>>    template <class T>
>>> -    Fault read(RequestPtr req, T &data, int load_idx);
>>> +    Fault read(RequestPtr req, RequestPtr sreqLow, RequestPtr sreqHigh,
>>> +               T &data, int load_idx);
>>>
>>>    /** Executes a store operation, using the store specified at the store
>>> -     *   index.
>>> +     * index.
>>>     */
>>>    template <class T>
>>> -    Fault write(RequestPtr req, T &data, int store_idx);
>>> +    Fault write(RequestPtr req, RequestPtr sreqLow, RequestPtr sreqHigh,
>>> +                T &data, int store_idx);
>>>
>>>    /** The CPU pointer. */
>>>    O3CPU *cpu;
>>> @@ -369,21 +373,23 @@
>>>  template <class Impl>
>>>  template <class T>
>>>  Fault
>>> -LSQ<Impl>::read(RequestPtr req, T &data, int load_idx)
>>> +LSQ<Impl>::read(RequestPtr req, RequestPtr sreqLow, RequestPtr sreqHigh,
>>> +                T &data, int load_idx)
>>>  {
>>>    ThreadID tid = req->threadId();
>>>
>>> -    return thread[tid].read(req, data, load_idx);
>>> +    return thread[tid].read(req, sreqLow, sreqHigh, data, load_idx);
>>>  }
>>>
>>>  template <class Impl>
>>>  template <class T>
>>>  Fault
>>> -LSQ<Impl>::write(RequestPtr req, T &data, int store_idx)
>>> +LSQ<Impl>::write(RequestPtr req, RequestPtr sreqLow, RequestPtr
>>> sreqHigh,
>>> +                 T &data, int store_idx)
>>>  {
>>>    ThreadID tid = req->threadId();
>>>
>>> -    return thread[tid].write(req, data, store_idx);
>>> +    return thread[tid].write(req, sreqLow, sreqHigh, data, store_idx);
>>>  }
>>>
>>>  #endif // __CPU_O3_LSQ_HH__
>>> diff -r a123bd350935 -r 4d4903a3e7c5 src/cpu/o3/lsq_unit.hh
>>> --- a/src/cpu/o3/lsq_unit.hh    Fri Feb 12 19:53:19 2010 +0000
>>> +++ b/src/cpu/o3/lsq_unit.hh    Fri Feb 12 19:53:20 2010 +0000
>>> @@ -216,12 +216,18 @@
>>>    /** Writes back the instruction, sending it to IEW. */
>>>    void writeback(DynInstPtr &inst, PacketPtr pkt);
>>>
>>> +    /** Writes back a store that couldn't be completed the previous
>>> cycle.
>>> */
>>> +    void writebackPendingStore();
>>> +
>>>    /** Handles completing the send of a store to memory. */
>>>    void storePostSend(PacketPtr pkt);
>>>
>>>    /** Completes the store at the specified index. */
>>>    void completeStore(int store_idx);
>>>
>>> +    /** Attempts to send a store to the cache. */
>>> _______________________________________________
>>> m5-dev mailing list
>>> [email protected]
>>> http://m5sim.org/mailman/listinfo/m5-dev
>>>
>>>
>>>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
> _______________________________________________
> m5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/m5-dev
>
>
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev

Reply via email to