Good points about potential padding.

The structure could be just used to obtain the width, where as the
assignment of data could be just be in a void* or whatever arbitrary type as
I'm guessing that once the request is made, it will simply read/write as
many bytes as the width says it should.

Matt would you be willing to give one of these a try in the interim?

On Tue, Jan 19, 2010 at 9:48 PM, Gabriel Michael Black <
[email protected]> wrote:

> A structure might not work because of padding (I'm just guessing
> though), but what about uint8_t [3]? That might get wierd if something
> tries to dereference a pointer to that type and do a direct
> assignment, something like:
>
> template <class T>
> bar(T *foo)
> {
>     ...
>     T *bar = (T *)baz;
>     *bar = *foo;
> }
>
> But that might work too. I'm not sure.
>
> Gabe
>
> Quoting Korey Sewell <[email protected]>:
>
> > Hi Matt/Steve,
> > sorry I'm late to the conversation.
> >
> > In short, the reason it's implemented currently as a read-modify-write
> was
> > because what Matt found out to be true: the 3-byte store problem. At the
> > time of implementation, M5 was at the crossroads of another memory system
> > change (remember mergedmem anyone?) and there wasnt any really good,
> > unobtrusive solution to integrate this in.
> >
> > Like Steve mentioned, the ISA/CPU model "dance" was just not working out
> at
> > the time but getting them to work in the AtomicSimpleCPU was at least a
> good
> > functionality start. I had a "readFunctional" there which was able to
> read
> > from memory without adversely affecting the LSQ and other dependent
> > structures. Reading the data out first before storing would allow you to
> get
> > the right data and *hopefully not upset the timing too much depending on
> > what you're doing. Performing a regular read there will definitely not
> work
> > there though as Matt found out.
> >
> > I believe the only addition that we previously made was to add a
> "unaligned"
> > flag or something of that effect to the request structure so that we
> wouldnt
> > fault when the swl/swr pairs were used in MIPS.
> >
> > I dont know why this never came up then (or maybe it did), but a proposed
> > solution might be to create a new struct that is exactly 3 bytes long
> (say
> > an uint16_t and a uint8_t), then when a 3-byte write is necessary, we can
> > use that structure to generate the right width for the request.
> >
> > Thoughts?
> >
> > On Tue, Jan 19, 2010 at 8:31 PM, Steve Reinhardt <[email protected]>
> wrote:
> >
> >> On Tue, Jan 19, 2010 at 5:13 PM, Matt DeVuyst <[email protected]>
> wrote:
> >> > I don't see how 3 byte stores will work.  The write() functions don't
> >> > take a size argument.  They just template the data argument (and can
> >> > figure out the size by with sizeof(T)).  So, for example, if you want
> >> > to write 2 bytes to memory you just call write((uint16_t)data,
> >> > address, flags, NULL).  But I'm not aware of any data type that's 3
> >> > bytes wide.  Is changing all the write() functions to accept a size
> >> > argument the best thing to do here?
> >>
> >> Good point... I hadn't thought that through.  I think the memory
> >> system at the lower level will handle this (i.e., I believe you can
> >> put together a Request object that has size=3 and send that to a
> >> memory object and it will work; it's really just the interface between
> >> the ISA and the CPU model that gets in the way.
> >>
> >> >
> >> > Also, I'm trying to picture how these instructions would be
> >> > implemented in a real MIPS processor.  Would you really be able to
> >> > write 3 bytes to memory or would you have to read a full word first
> >> > and then modify the appropriate bytes and write back?
> >>
> >> In hardware it's pretty easy to do per-byte enables if you need to;
> >> much simpler than doing a read-modify-write sequence.  The only reason
> >> I can imagine that they wouldn't do it that way is if they already had
> >> to do an RMW for some other instructions and so they piggybacked on
> >> that.  I don't know of any MIPS instructions that would require RMW
> >> though (particularly since MIPS uses LL/SC for atomic ops, which is
> >> the main reason you'd want RMW).
> >>
> >> Have we discussed doing 3-byte memory ops on the list before?  It's
> >> clear it would require some new functionality at the read()/write()
> >> ExecContext interface, but that seems like a worthwhile addition to
> >> me.
> >>
> >> Steve
> >>
> >> >
> >> > On Tue, Jan 19, 2010 at 3:08 PM, Steve Reinhardt <[email protected]>
> >> wrote:
> >> >> My guess is that the instruction definition should be rewritten to do
> >> >> a store of the correct number of bytes and not a read-modify-write.
>  I
> >> >> don't think there's a reason the store function shouldn't handle a
> >> >> 3-byte store; somebody correct me if I'm wrong.
> >> >>
> >> >> Steve
> >> >>
> >> >> On Tue, Jan 19, 2010 at 2:38 PM, Matt DeVuyst <[email protected]>
> >> wrote:
> >> >>> Hi,
> >> >>>
> >> >>> I'm getting an an assertion failure when executing MIPS code
> >> >>> in the o3 cpu model.  The same program executes correctly in the
> >> >>> atomic cpu model.  The assertion that is failing is
> >> >>> src/cpu/o3/lsq_unit.hh line 489.  It appears that the MIPS
> >> >>> instructions SWL and SWR don't work in o3.
> >> >>>
> >> >>> Here is the relevant portion of the backtrace:
> >> >>>
> >> >>> #3  0xb7a9e5ce in __assert_fail () from /lib/tls/i686/cmov/libc.so.6
> >> >>>
> >> >>> #4  0x08400914 in LSQUnit<O3CPUImpl>::read<unsigned int>
> >> >>>  (this=0xa695f68, req=0xa7a3a98, da...@0xbffbbb3c, load_idx=8) at
> >> >>>  build-smt03/build/MIPS_SE/cpu/o3/lsq_unit.hh:489
> >> >>>
> >> >>> #5  0x08403745 in LSQ<O3CPUImpl>::read<unsigned int>
> (this=0xa695f40,
> >> >>>  req=0xa7a3a98, da...@0xbffbbb3c, load_idx=8) at
> >> >>>  build-smt03/build/MIPS_SE/cpu/o3/lsq.hh:376
> >> >>>
> >> >>> #6  0x0840378d in FullO3CPU<O3CPUImpl>::read<unsigned int>
> >> >>>  (this=0xa694bc0, r...@0xbffbbad4, da...@0xbffbbb3c, load_idx=8) at
> >> >>>  build-smt03/build/MIPS_SE/cpu/o3/cpu.hh:708
> >> >>>
> >> >>> #7  0x0840399a in BaseDynInst<O3CPUImpl>::read<unsigned int>
> >> >>>  (this=0xa7dd9a0, addr=5068452, da...@0xbffbbb3c, flags=0) at
> >> >>>  build-smt03/build/MIPS_SE/cpu/base_dyn_inst.hh:884
> >> >>>
> >> >>> #8  0x083bb699 in MipsISAInst::Swr::initiateAcc (this=0xa8d0428,
> >> >>>  xc=0xa7dd9a0, traceData=0x0) at
> >> >>>  build-smt03/build/MIPS_SE/arch/mips/o3_cpu_exec.cc:25017
> >> >>>
> >> >>> #9  0x08228a27 in BaseO3DynInst<O3CPUImpl>::initiateAcc
> >> >>>  (this=0xa7dd9a0) at
> >> >>>  build-smt03/build/MIPS_SE/cpu/o3/dyn_inst_impl.hh:111
> >> >>>
> >> >>> #10 0x0837b69f in LSQUnit<O3CPUImpl>::executeStore (this=0xa695f68,
> >> >>> #store_in...@0xbffbc1f0) at
> >> >>> #build-smt03/build/MIPS_SE/cpu/o3/lsq_unit_impl.hh:495
> >> >>>
> >> >>> #11 0x0836747d in LSQ<O3CPUImpl>::executeStore (this=0xa695f40,
> >> >>>  in...@0xbffbc1f0) at
> build-smt03/build/MIPS_SE/cpu/o3/lsq_impl.hh:327
> >> >>>
> >> >>> #12 0x0828c0fa in DefaultIEW<O3CPUImpl>::executeInsts
> (this=0xa6956a0)
> >> >>> #at build-smt03/build/MIPS_SE/cpu/o3/iew_impl.hh:1232
> >> >>>
> >> >>> #13 0x0829436f in DefaultIEW<O3CPUImpl>::tick (this=0xa6956a0) at
> >> >>> #build-smt03/build/MIPS_SE/cpu/o3/iew_impl.hh:1458
> >> >>>
> >> >>> #14 0x081b5ae4 in FullO3CPU<O3CPUImpl>::tick (this=0xa694bc0) at
> >> >>> #build-smt03/build/MIPS_SE/cpu/o3/cpu.cc:514
> >> >>>
> >> >>> #15 0x081b7254 in FullO3CPU<O3CPUImpl>::TickEvent::process
> >> >>> #(this=0xa694d0c) at build-smt03/build/MIPS_SE/cpu/o3/cpu.cc:86
> >> >>>
> >> >>> #16 0x0822feda in EventQueue::serviceOne (this=0x8834008) at
> >> >>> #build-smt03/build/MIPS_SE/sim/eventq.cc:202
> >> >>>
> >> >>> #17 0x085c238c in simulate (num_cycles=9223372036854775807) at
> >> >>>  build-smt03/build/MIPS_SE/sim/simulate.cc:73
> >> >>>
> >> >>> So the MIPS instruction is an SWR instruction that stores anywhere
> >> >>> from 1 to 4 bytes into an aligned memory word.  The format function
> >> >>> that handles this instruction is StoreUnalignedMemory (in
> >> >>> src/arch/mips/isa/formats/mem.isa).  This specifies that the read()
> >> >>> function of the dynamic instruction (BaseDynInst<O3CPUImpl>::read())
> >> >>> be called to get the whole aligned word from memory.  Then the
> >> >>> appropriate bytes can be changed and the whole word written back to
> >> >>> memory.
> >> >>>
> >> >>> The problem seems to be that this store instructions (and other
> >> >>> unaligned instructions like it) read from memory as well as write
> >> >>> memory, but only a single StoreQueue (and not LoadQueue) LSQ entry
> was
> >> >>> created for it.
> >> >>>
> >> >>> Should an entry be made on the LoadQueue as well?  Or should a
> >> >>> separate lightweight memory read function be called instead?  It
> looks
> >> >>> like the latter may have been what was done in the past (if this
> >> >>> instruction ever used to work correctly)...because there's a
> >> >>> commented out line in the format definition of StoreUnalignedMemory
> >> >>> that calls xc->readFunctional().
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Cheers,
> >> >>> Matt
> >> >>> _______________________________________________
> >> >>> m5-dev mailing list
> >> >>> [email protected]
> >> >>> http://m5sim.org/mailman/listinfo/m5-dev
> >> >>>
> >> >> _______________________________________________
> >> >> m5-dev mailing list
> >> >> [email protected]
> >> >> http://m5sim.org/mailman/listinfo/m5-dev
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Cheers,
> >> > Matt
> >> > _______________________________________________
> >> > m5-dev mailing list
> >> > [email protected]
> >> > http://m5sim.org/mailman/listinfo/m5-dev
> >> >
> >> _______________________________________________
> >> m5-dev mailing list
> >> [email protected]
> >> http://m5sim.org/mailman/listinfo/m5-dev
> >>
> >
> >
> >
> > --
> > - Korey
> >
>
>
> _______________________________________________
> m5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/m5-dev
>



-- 
- Korey
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev

Reply via email to