Good points about potential padding. The structure could be just used to obtain the width, where as the assignment of data could be just be in a void* or whatever arbitrary type as I'm guessing that once the request is made, it will simply read/write as many bytes as the width says it should.
Matt would you be willing to give one of these a try in the interim? On Tue, Jan 19, 2010 at 9:48 PM, Gabriel Michael Black < [email protected]> wrote: > A structure might not work because of padding (I'm just guessing > though), but what about uint8_t [3]? That might get wierd if something > tries to dereference a pointer to that type and do a direct > assignment, something like: > > template <class T> > bar(T *foo) > { > ... > T *bar = (T *)baz; > *bar = *foo; > } > > But that might work too. I'm not sure. > > Gabe > > Quoting Korey Sewell <[email protected]>: > > > Hi Matt/Steve, > > sorry I'm late to the conversation. > > > > In short, the reason it's implemented currently as a read-modify-write > was > > because what Matt found out to be true: the 3-byte store problem. At the > > time of implementation, M5 was at the crossroads of another memory system > > change (remember mergedmem anyone?) and there wasnt any really good, > > unobtrusive solution to integrate this in. > > > > Like Steve mentioned, the ISA/CPU model "dance" was just not working out > at > > the time but getting them to work in the AtomicSimpleCPU was at least a > good > > functionality start. I had a "readFunctional" there which was able to > read > > from memory without adversely affecting the LSQ and other dependent > > structures. Reading the data out first before storing would allow you to > get > > the right data and *hopefully not upset the timing too much depending on > > what you're doing. Performing a regular read there will definitely not > work > > there though as Matt found out. > > > > I believe the only addition that we previously made was to add a > "unaligned" > > flag or something of that effect to the request structure so that we > wouldnt > > fault when the swl/swr pairs were used in MIPS. > > > > I dont know why this never came up then (or maybe it did), but a proposed > > solution might be to create a new struct that is exactly 3 bytes long > (say > > an uint16_t and a uint8_t), then when a 3-byte write is necessary, we can > > use that structure to generate the right width for the request. > > > > Thoughts? > > > > On Tue, Jan 19, 2010 at 8:31 PM, Steve Reinhardt <[email protected]> > wrote: > > > >> On Tue, Jan 19, 2010 at 5:13 PM, Matt DeVuyst <[email protected]> > wrote: > >> > I don't see how 3 byte stores will work. The write() functions don't > >> > take a size argument. They just template the data argument (and can > >> > figure out the size by with sizeof(T)). So, for example, if you want > >> > to write 2 bytes to memory you just call write((uint16_t)data, > >> > address, flags, NULL). But I'm not aware of any data type that's 3 > >> > bytes wide. Is changing all the write() functions to accept a size > >> > argument the best thing to do here? > >> > >> Good point... I hadn't thought that through. I think the memory > >> system at the lower level will handle this (i.e., I believe you can > >> put together a Request object that has size=3 and send that to a > >> memory object and it will work; it's really just the interface between > >> the ISA and the CPU model that gets in the way. > >> > >> > > >> > Also, I'm trying to picture how these instructions would be > >> > implemented in a real MIPS processor. Would you really be able to > >> > write 3 bytes to memory or would you have to read a full word first > >> > and then modify the appropriate bytes and write back? > >> > >> In hardware it's pretty easy to do per-byte enables if you need to; > >> much simpler than doing a read-modify-write sequence. The only reason > >> I can imagine that they wouldn't do it that way is if they already had > >> to do an RMW for some other instructions and so they piggybacked on > >> that. I don't know of any MIPS instructions that would require RMW > >> though (particularly since MIPS uses LL/SC for atomic ops, which is > >> the main reason you'd want RMW). > >> > >> Have we discussed doing 3-byte memory ops on the list before? It's > >> clear it would require some new functionality at the read()/write() > >> ExecContext interface, but that seems like a worthwhile addition to > >> me. > >> > >> Steve > >> > >> > > >> > On Tue, Jan 19, 2010 at 3:08 PM, Steve Reinhardt <[email protected]> > >> wrote: > >> >> My guess is that the instruction definition should be rewritten to do > >> >> a store of the correct number of bytes and not a read-modify-write. > I > >> >> don't think there's a reason the store function shouldn't handle a > >> >> 3-byte store; somebody correct me if I'm wrong. > >> >> > >> >> Steve > >> >> > >> >> On Tue, Jan 19, 2010 at 2:38 PM, Matt DeVuyst <[email protected]> > >> wrote: > >> >>> Hi, > >> >>> > >> >>> I'm getting an an assertion failure when executing MIPS code > >> >>> in the o3 cpu model. The same program executes correctly in the > >> >>> atomic cpu model. The assertion that is failing is > >> >>> src/cpu/o3/lsq_unit.hh line 489. It appears that the MIPS > >> >>> instructions SWL and SWR don't work in o3. > >> >>> > >> >>> Here is the relevant portion of the backtrace: > >> >>> > >> >>> #3 0xb7a9e5ce in __assert_fail () from /lib/tls/i686/cmov/libc.so.6 > >> >>> > >> >>> #4 0x08400914 in LSQUnit<O3CPUImpl>::read<unsigned int> > >> >>> (this=0xa695f68, req=0xa7a3a98, da...@0xbffbbb3c, load_idx=8) at > >> >>> build-smt03/build/MIPS_SE/cpu/o3/lsq_unit.hh:489 > >> >>> > >> >>> #5 0x08403745 in LSQ<O3CPUImpl>::read<unsigned int> > (this=0xa695f40, > >> >>> req=0xa7a3a98, da...@0xbffbbb3c, load_idx=8) at > >> >>> build-smt03/build/MIPS_SE/cpu/o3/lsq.hh:376 > >> >>> > >> >>> #6 0x0840378d in FullO3CPU<O3CPUImpl>::read<unsigned int> > >> >>> (this=0xa694bc0, r...@0xbffbbad4, da...@0xbffbbb3c, load_idx=8) at > >> >>> build-smt03/build/MIPS_SE/cpu/o3/cpu.hh:708 > >> >>> > >> >>> #7 0x0840399a in BaseDynInst<O3CPUImpl>::read<unsigned int> > >> >>> (this=0xa7dd9a0, addr=5068452, da...@0xbffbbb3c, flags=0) at > >> >>> build-smt03/build/MIPS_SE/cpu/base_dyn_inst.hh:884 > >> >>> > >> >>> #8 0x083bb699 in MipsISAInst::Swr::initiateAcc (this=0xa8d0428, > >> >>> xc=0xa7dd9a0, traceData=0x0) at > >> >>> build-smt03/build/MIPS_SE/arch/mips/o3_cpu_exec.cc:25017 > >> >>> > >> >>> #9 0x08228a27 in BaseO3DynInst<O3CPUImpl>::initiateAcc > >> >>> (this=0xa7dd9a0) at > >> >>> build-smt03/build/MIPS_SE/cpu/o3/dyn_inst_impl.hh:111 > >> >>> > >> >>> #10 0x0837b69f in LSQUnit<O3CPUImpl>::executeStore (this=0xa695f68, > >> >>> #store_in...@0xbffbc1f0) at > >> >>> #build-smt03/build/MIPS_SE/cpu/o3/lsq_unit_impl.hh:495 > >> >>> > >> >>> #11 0x0836747d in LSQ<O3CPUImpl>::executeStore (this=0xa695f40, > >> >>> in...@0xbffbc1f0) at > build-smt03/build/MIPS_SE/cpu/o3/lsq_impl.hh:327 > >> >>> > >> >>> #12 0x0828c0fa in DefaultIEW<O3CPUImpl>::executeInsts > (this=0xa6956a0) > >> >>> #at build-smt03/build/MIPS_SE/cpu/o3/iew_impl.hh:1232 > >> >>> > >> >>> #13 0x0829436f in DefaultIEW<O3CPUImpl>::tick (this=0xa6956a0) at > >> >>> #build-smt03/build/MIPS_SE/cpu/o3/iew_impl.hh:1458 > >> >>> > >> >>> #14 0x081b5ae4 in FullO3CPU<O3CPUImpl>::tick (this=0xa694bc0) at > >> >>> #build-smt03/build/MIPS_SE/cpu/o3/cpu.cc:514 > >> >>> > >> >>> #15 0x081b7254 in FullO3CPU<O3CPUImpl>::TickEvent::process > >> >>> #(this=0xa694d0c) at build-smt03/build/MIPS_SE/cpu/o3/cpu.cc:86 > >> >>> > >> >>> #16 0x0822feda in EventQueue::serviceOne (this=0x8834008) at > >> >>> #build-smt03/build/MIPS_SE/sim/eventq.cc:202 > >> >>> > >> >>> #17 0x085c238c in simulate (num_cycles=9223372036854775807) at > >> >>> build-smt03/build/MIPS_SE/sim/simulate.cc:73 > >> >>> > >> >>> So the MIPS instruction is an SWR instruction that stores anywhere > >> >>> from 1 to 4 bytes into an aligned memory word. The format function > >> >>> that handles this instruction is StoreUnalignedMemory (in > >> >>> src/arch/mips/isa/formats/mem.isa). This specifies that the read() > >> >>> function of the dynamic instruction (BaseDynInst<O3CPUImpl>::read()) > >> >>> be called to get the whole aligned word from memory. Then the > >> >>> appropriate bytes can be changed and the whole word written back to > >> >>> memory. > >> >>> > >> >>> The problem seems to be that this store instructions (and other > >> >>> unaligned instructions like it) read from memory as well as write > >> >>> memory, but only a single StoreQueue (and not LoadQueue) LSQ entry > was > >> >>> created for it. > >> >>> > >> >>> Should an entry be made on the LoadQueue as well? Or should a > >> >>> separate lightweight memory read function be called instead? It > looks > >> >>> like the latter may have been what was done in the past (if this > >> >>> instruction ever used to work correctly)...because there's a > >> >>> commented out line in the format definition of StoreUnalignedMemory > >> >>> that calls xc->readFunctional(). > >> >>> > >> >>> > >> >>> -- > >> >>> Cheers, > >> >>> Matt > >> >>> _______________________________________________ > >> >>> m5-dev mailing list > >> >>> [email protected] > >> >>> http://m5sim.org/mailman/listinfo/m5-dev > >> >>> > >> >> _______________________________________________ > >> >> m5-dev mailing list > >> >> [email protected] > >> >> http://m5sim.org/mailman/listinfo/m5-dev > >> >> > >> > > >> > > >> > > >> > -- > >> > Cheers, > >> > Matt > >> > _______________________________________________ > >> > m5-dev mailing list > >> > [email protected] > >> > http://m5sim.org/mailman/listinfo/m5-dev > >> > > >> _______________________________________________ > >> m5-dev mailing list > >> [email protected] > >> http://m5sim.org/mailman/listinfo/m5-dev > >> > > > > > > > > -- > > - Korey > > > > > _______________________________________________ > m5-dev mailing list > [email protected] > http://m5sim.org/mailman/listinfo/m5-dev > -- - Korey
_______________________________________________ m5-dev mailing list [email protected] http://m5sim.org/mailman/listinfo/m5-dev
