Hi Matt. Sorry no one has taken a look at this yet, at least that I'm aware of. We haven't forgotten you, but it might take a little while before we can get this taken care of.
Gabe Matt DeVuyst wrote: > I fixed the problem. See the attached patch. > > For the case of writing 3 bytes to memory, first I tried using a > simple 3-byte array, but that didn't work quite right. Then I wrapped > the 3-byte array in a struct and added some operator overloads to it > so that things like assignment to 0 will work as expected. > > Unfortunately some of the old format definitions (like StoreExecute) > in mem.isa were unsuitable for the unaligned stores because they > assumed a fixed size; the type to be written was explicitly cast in > the call to xc->write(). With the unaligned accesses, the number of > bytes to write will vary at runtime and depend on the byte offset > within the memory word. Thus, I created some new format definitions > (like StoreUnalignedExecute) that switch on the size of the write to > make the appropriate call to xc->write(). > > All the write() functions were templated to accept writes of different > sizes, but none of them had declarations to handle 3-byte writes (and, > as you probably know, the C++ templating system requires this if a > templated function is used in a different file that is compiled > separately), so I ended up having to add new declarations to write() > everywhere to handle the 3-byte writes. > > I've tested the changes and they appear to work. But I've only tested > on little-endian host machines (x86 and x86_64) since I don't have a > big-endian machine, and only tested MIPS in its default little-endian > mode since I don't have a cross-compiler to produce MIPS big-endian > binaries. While I tried to code carefully to handle these other > endian configurations, I cannot guarantee that they will work without > some slight modifications. > > Anyway, thanks for all your help and advice. > > On Tue, Jan 19, 2010 at 7:03 PM, Korey Sewell <[email protected]> wrote: > >> Good points about potential padding. >> >> The structure could be just used to obtain the width, where as the >> assignment of data could be just be in a void* or whatever arbitrary type as >> I'm guessing that once the request is made, it will simply read/write as >> many bytes as the width says it should. >> >> Matt would you be willing to give one of these a try in the interim? >> >> On Tue, Jan 19, 2010 at 9:48 PM, Gabriel Michael Black >> <[email protected]> wrote: >> >>> A structure might not work because of padding (I'm just guessing >>> though), but what about uint8_t [3]? That might get wierd if something >>> tries to dereference a pointer to that type and do a direct >>> assignment, something like: >>> >>> template <class T> >>> bar(T *foo) >>> { >>> ... >>> T *bar = (T *)baz; >>> *bar = *foo; >>> } >>> >>> But that might work too. I'm not sure. >>> >>> Gabe >>> >>> Quoting Korey Sewell <[email protected]>: >>> >>> >>>> Hi Matt/Steve, >>>> sorry I'm late to the conversation. >>>> >>>> In short, the reason it's implemented currently as a read-modify-write >>>> was >>>> because what Matt found out to be true: the 3-byte store problem. At the >>>> time of implementation, M5 was at the crossroads of another memory >>>> system >>>> change (remember mergedmem anyone?) and there wasnt any really good, >>>> unobtrusive solution to integrate this in. >>>> >>>> Like Steve mentioned, the ISA/CPU model "dance" was just not working out >>>> at >>>> the time but getting them to work in the AtomicSimpleCPU was at least a >>>> good >>>> functionality start. I had a "readFunctional" there which was able to >>>> read >>>> from memory without adversely affecting the LSQ and other dependent >>>> structures. Reading the data out first before storing would allow you to >>>> get >>>> the right data and *hopefully not upset the timing too much depending on >>>> what you're doing. Performing a regular read there will definitely not >>>> work >>>> there though as Matt found out. >>>> >>>> I believe the only addition that we previously made was to add a >>>> "unaligned" >>>> flag or something of that effect to the request structure so that we >>>> wouldnt >>>> fault when the swl/swr pairs were used in MIPS. >>>> >>>> I dont know why this never came up then (or maybe it did), but a >>>> proposed >>>> solution might be to create a new struct that is exactly 3 bytes long >>>> (say >>>> an uint16_t and a uint8_t), then when a 3-byte write is necessary, we >>>> can >>>> use that structure to generate the right width for the request. >>>> >>>> Thoughts? >>>> >>>> On Tue, Jan 19, 2010 at 8:31 PM, Steve Reinhardt <[email protected]> >>>> wrote: >>>> >>>> >>>>> On Tue, Jan 19, 2010 at 5:13 PM, Matt DeVuyst <[email protected]> >>>>> wrote: >>>>> >>>>>> I don't see how 3 byte stores will work. The write() functions don't >>>>>> take a size argument. They just template the data argument (and can >>>>>> figure out the size by with sizeof(T)). So, for example, if you want >>>>>> to write 2 bytes to memory you just call write((uint16_t)data, >>>>>> address, flags, NULL). But I'm not aware of any data type that's 3 >>>>>> bytes wide. Is changing all the write() functions to accept a size >>>>>> argument the best thing to do here? >>>>>> >>>>> Good point... I hadn't thought that through. I think the memory >>>>> system at the lower level will handle this (i.e., I believe you can >>>>> put together a Request object that has size=3 and send that to a >>>>> memory object and it will work; it's really just the interface between >>>>> the ISA and the CPU model that gets in the way. >>>>> >>>>> >>>>>> Also, I'm trying to picture how these instructions would be >>>>>> implemented in a real MIPS processor. Would you really be able to >>>>>> write 3 bytes to memory or would you have to read a full word first >>>>>> and then modify the appropriate bytes and write back? >>>>>> >>>>> In hardware it's pretty easy to do per-byte enables if you need to; >>>>> much simpler than doing a read-modify-write sequence. The only reason >>>>> I can imagine that they wouldn't do it that way is if they already had >>>>> to do an RMW for some other instructions and so they piggybacked on >>>>> that. I don't know of any MIPS instructions that would require RMW >>>>> though (particularly since MIPS uses LL/SC for atomic ops, which is >>>>> the main reason you'd want RMW). >>>>> >>>>> Have we discussed doing 3-byte memory ops on the list before? It's >>>>> clear it would require some new functionality at the read()/write() >>>>> ExecContext interface, but that seems like a worthwhile addition to >>>>> me. >>>>> >>>>> Steve >>>>> >>>>> >>>>>> On Tue, Jan 19, 2010 at 3:08 PM, Steve Reinhardt <[email protected]> >>>>>> >>>>> wrote: >>>>> >>>>>>> My guess is that the instruction definition should be rewritten to >>>>>>> do >>>>>>> a store of the correct number of bytes and not a read-modify-write. >>>>>>> I >>>>>>> don't think there's a reason the store function shouldn't handle a >>>>>>> 3-byte store; somebody correct me if I'm wrong. >>>>>>> >>>>>>> Steve >>>>>>> >>>>>>> On Tue, Jan 19, 2010 at 2:38 PM, Matt DeVuyst <[email protected]> >>>>>>> >>>>> wrote: >>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I'm getting an an assertion failure when executing MIPS code >>>>>>>> in the o3 cpu model. The same program executes correctly in the >>>>>>>> atomic cpu model. The assertion that is failing is >>>>>>>> src/cpu/o3/lsq_unit.hh line 489. It appears that the MIPS >>>>>>>> instructions SWL and SWR don't work in o3. >>>>>>>> >>>>>>>> Here is the relevant portion of the backtrace: >>>>>>>> >>>>>>>> #3 0xb7a9e5ce in __assert_fail () from >>>>>>>> /lib/tls/i686/cmov/libc.so.6 >>>>>>>> >>>>>>>> #4 0x08400914 in LSQUnit<O3CPUImpl>::read<unsigned int> >>>>>>>> (this=0xa695f68, req=0xa7a3a98, da...@0xbffbbb3c, load_idx=8) at >>>>>>>> build-smt03/build/MIPS_SE/cpu/o3/lsq_unit.hh:489 >>>>>>>> >>>>>>>> #5 0x08403745 in LSQ<O3CPUImpl>::read<unsigned int> >>>>>>>> (this=0xa695f40, >>>>>>>> req=0xa7a3a98, da...@0xbffbbb3c, load_idx=8) at >>>>>>>> build-smt03/build/MIPS_SE/cpu/o3/lsq.hh:376 >>>>>>>> >>>>>>>> #6 0x0840378d in FullO3CPU<O3CPUImpl>::read<unsigned int> >>>>>>>> (this=0xa694bc0, r...@0xbffbbad4, da...@0xbffbbb3c, load_idx=8) at >>>>>>>> build-smt03/build/MIPS_SE/cpu/o3/cpu.hh:708 >>>>>>>> >>>>>>>> #7 0x0840399a in BaseDynInst<O3CPUImpl>::read<unsigned int> >>>>>>>> (this=0xa7dd9a0, addr=5068452, da...@0xbffbbb3c, flags=0) at >>>>>>>> build-smt03/build/MIPS_SE/cpu/base_dyn_inst.hh:884 >>>>>>>> >>>>>>>> #8 0x083bb699 in MipsISAInst::Swr::initiateAcc (this=0xa8d0428, >>>>>>>> xc=0xa7dd9a0, traceData=0x0) at >>>>>>>> build-smt03/build/MIPS_SE/arch/mips/o3_cpu_exec.cc:25017 >>>>>>>> >>>>>>>> #9 0x08228a27 in BaseO3DynInst<O3CPUImpl>::initiateAcc >>>>>>>> (this=0xa7dd9a0) at >>>>>>>> build-smt03/build/MIPS_SE/cpu/o3/dyn_inst_impl.hh:111 >>>>>>>> >>>>>>>> #10 0x0837b69f in LSQUnit<O3CPUImpl>::executeStore (this=0xa695f68, >>>>>>>> #store_in...@0xbffbc1f0) at >>>>>>>> #build-smt03/build/MIPS_SE/cpu/o3/lsq_unit_impl.hh:495 >>>>>>>> >>>>>>>> #11 0x0836747d in LSQ<O3CPUImpl>::executeStore (this=0xa695f40, >>>>>>>> in...@0xbffbc1f0) at >>>>>>>> build-smt03/build/MIPS_SE/cpu/o3/lsq_impl.hh:327 >>>>>>>> >>>>>>>> #12 0x0828c0fa in DefaultIEW<O3CPUImpl>::executeInsts >>>>>>>> (this=0xa6956a0) >>>>>>>> #at build-smt03/build/MIPS_SE/cpu/o3/iew_impl.hh:1232 >>>>>>>> >>>>>>>> #13 0x0829436f in DefaultIEW<O3CPUImpl>::tick (this=0xa6956a0) at >>>>>>>> #build-smt03/build/MIPS_SE/cpu/o3/iew_impl.hh:1458 >>>>>>>> >>>>>>>> #14 0x081b5ae4 in FullO3CPU<O3CPUImpl>::tick (this=0xa694bc0) at >>>>>>>> #build-smt03/build/MIPS_SE/cpu/o3/cpu.cc:514 >>>>>>>> >>>>>>>> #15 0x081b7254 in FullO3CPU<O3CPUImpl>::TickEvent::process >>>>>>>> #(this=0xa694d0c) at build-smt03/build/MIPS_SE/cpu/o3/cpu.cc:86 >>>>>>>> >>>>>>>> #16 0x0822feda in EventQueue::serviceOne (this=0x8834008) at >>>>>>>> #build-smt03/build/MIPS_SE/sim/eventq.cc:202 >>>>>>>> >>>>>>>> #17 0x085c238c in simulate (num_cycles=9223372036854775807) at >>>>>>>> build-smt03/build/MIPS_SE/sim/simulate.cc:73 >>>>>>>> >>>>>>>> So the MIPS instruction is an SWR instruction that stores anywhere >>>>>>>> from 1 to 4 bytes into an aligned memory word. The format function >>>>>>>> that handles this instruction is StoreUnalignedMemory (in >>>>>>>> src/arch/mips/isa/formats/mem.isa). This specifies that the read() >>>>>>>> function of the dynamic instruction >>>>>>>> (BaseDynInst<O3CPUImpl>::read()) >>>>>>>> be called to get the whole aligned word from memory. Then the >>>>>>>> appropriate bytes can be changed and the whole word written back to >>>>>>>> memory. >>>>>>>> >>>>>>>> The problem seems to be that this store instructions (and other >>>>>>>> unaligned instructions like it) read from memory as well as write >>>>>>>> memory, but only a single StoreQueue (and not LoadQueue) LSQ entry >>>>>>>> was >>>>>>>> created for it. >>>>>>>> >>>>>>>> Should an entry be made on the LoadQueue as well? Or should a >>>>>>>> separate lightweight memory read function be called instead? It >>>>>>>> looks >>>>>>>> like the latter may have been what was done in the past (if this >>>>>>>> instruction ever used to work correctly)...because there's a >>>>>>>> commented out line in the format definition of StoreUnalignedMemory >>>>>>>> that calls xc->readFunctional(). >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Cheers, >>>>>>>> Matt >>>>>>>> _______________________________________________ >>>>>>>> m5-dev mailing list >>>>>>>> [email protected] >>>>>>>> http://m5sim.org/mailman/listinfo/m5-dev >>>>>>>> >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> m5-dev mailing list >>>>>>> [email protected] >>>>>>> http://m5sim.org/mailman/listinfo/m5-dev >>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> Cheers, >>>>>> Matt >>>>>> _______________________________________________ >>>>>> m5-dev mailing list >>>>>> [email protected] >>>>>> http://m5sim.org/mailman/listinfo/m5-dev >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> m5-dev mailing list >>>>> [email protected] >>>>> http://m5sim.org/mailman/listinfo/m5-dev >>>>> >>>>> >>>> >>>> -- >>>> - Korey >>>> >>>> >>> _______________________________________________ >>> m5-dev mailing list >>> [email protected] >>> http://m5sim.org/mailman/listinfo/m5-dev >>> >> >> -- >> - Korey >> >> _______________________________________________ >> m5-dev mailing list >> [email protected] >> http://m5sim.org/mailman/listinfo/m5-dev >> >> >> > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > m5-dev mailing list > [email protected] > http://m5sim.org/mailman/listinfo/m5-dev _______________________________________________ m5-dev mailing list [email protected] http://m5sim.org/mailman/listinfo/m5-dev
