Thanks, Malek. Very interesting. Yes, this 5-line changeset seems rather benign, but it actually has huge ramifications. With this change, the RubyPort passes the correct block size to the CPU/device models. Without it, I believe the block size defaults to 0 or 1; I can't remember which. While that seems rather inconsequential, I noticed when I made this change that the memtester behaved quite differently. In particular, it keeps issuing requests until sendTiming returns false, instead of issuing just one request per CPU at a time. That is why another patch in this series added the retry mechanism to the RubyPort. I'm still not sure exactly what the problem is with Ruby+DMA, but I suspect that the DMA devices are behaving differently now that the RubyPort passes the correct block size.
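The change in memtester behavior described above follows from the timing-port contract: a `false` return from sendTiming means the sender must hold the request until it is told to retry, so a port that can refuse requests also needs a retry path. Below is a minimal, self-contained sketch of that handshake. It is not RubyPort or M5 code: `ToyPort`, `Requester`, `completeOne`, `recvRetry`, and the fixed-capacity queue are hypothetical names chosen for illustration, and only the accept/refuse/retry pattern is meant to carry over.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for a memory-side port with a bounded request queue.
// It is NOT the RubyPort; it only models the "return false, then retry" contract.
class ToyPort {
public:
    explicit ToyPort(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // true: request accepted. false: the sender must hold it until a retry.
    bool sendTiming(int addr) {
        if (queue_.size() >= capacity_) {
            needRetry_ = true;  // remember that a sender is waiting
            return false;
        }
        queue_.push_back(addr);
        return true;
    }

    // Completes one queued request; if a sender was refused earlier,
    // tell it that space is available again.
    void completeOne() {
        if (queue_.empty()) return;
        queue_.pop_front();
        if (needRetry_ && retryCallback_) {
            needRetry_ = false;
            retryCallback_();  // may set needRetry_ again if refused
        }
    }

    void setRetryCallback(std::function<void()> cb) { retryCallback_ = std::move(cb); }
    std::size_t outstanding() const { return queue_.size(); }

private:
    std::size_t capacity_;
    std::deque<int> queue_;
    bool needRetry_ = false;
    std::function<void()> retryCallback_;
};

// Hypothetical requester that keeps issuing until the port refuses a request,
// rather than issuing one request at a time.
class Requester {
public:
    Requester(ToyPort& port, int total) : port_(port), total_(total) {
        port_.setRetryCallback([this] { recvRetry(); });
    }
    // The callback captures `this`, so copying would leave a dangling pointer.
    Requester(const Requester&) = delete;
    Requester& operator=(const Requester&) = delete;

    void start() { issueUntilBlocked(); }
    int issued() const { return next_; }

private:
    void recvRetry() { issueUntilBlocked(); }

    void issueUntilBlocked() {
        while (next_ < total_) {
            if (!port_.sendTiming(next_)) return;  // blocked: wait for a retry
            ++next_;
        }
    }

    ToyPort& port_;
    int total_;
    int next_ = 0;
};
```

If `completeOne()` never invoked the callback, the requester would stop at its first refused request and never resume. That is the kind of stall a missing retry path can produce once a requester issues back-to-back instead of one request at a time.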
I was able to spend a few hours on this over the weekend. I am now able to reproduce the error, and I have a few protocol bug fixes queued up. However, I don't think those fixes actually solve the main issue. I don't think I'll be able to get to it today, but I'll try to find some time tomorrow to investigate further.

Brad

> -----Original Message-----
> From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org]
> On Behalf Of Korey Sewell
> Sent: Monday, March 14, 2011 2:10 AM
> To: M5 Developer List
> Subject: Re: [m5-dev] Ruby FS - DMA Controller problem?
>
> Which lines are you commenting out to get it to work? It's a bit unclear in the
> diff you point to (maybe because you said it's a full set of changes, not just
> one).
>
> (btw: the work I've been doing is comparing the "old m5" memory trace to
> the "gem5" memory trace to try to chase down the bug. I wouldn't be
> surprised if we are converging on the same bug, though.)
>
> On Mon, Mar 14, 2011 at 3:51 AM, Malek Musleh <malek.mus...@gmail.com> wrote:
> > Hi Brad,
> >
> > I found the problem that was causing this error. Specifically, it is
> > this changeset:
> >
> > changeset: 7909:eee578ed2130
> > user: Joel Hestness <hestn...@cs.utexas.edu>
> > date: Sun Feb 06 22:14:18 2011 -0800
> > summary: Ruby: Fix to return cache block size to CPU for split
> > data transfers
> >
> > Link: http://reviews.m5sim.org/r/393/diff/#index_header
> >
> > Previously, I mentioned it was a couple of changesets prior to this
> > one, but the changes between them are related, so it wasn't as obvious
> > what was happening.
> >
> > In fact, this corresponds to the assert() for the block size you had
> > put in to deal with x86 unaligned accesses, but then later removed
> > because of LL/SC in Alpha.
> >
> > It's not clear to me why this is causing a problem, or rather why this
> > doesn't return the default 64-byte block size from the Ruby system,
> > but commenting out those lines of code allowed it to work.
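Why the block size returned to the CPU matters can be seen from how it drives split data transfers. The following is a minimal sketch under a stated assumption, not M5 code: it assumes an access is split wherever it crosses a multiple of the block size, and `splitAccess`, `Piece`, and the local `Addr` alias are hypothetical names chosen for illustration. With the expected 64-byte block, an aligned 8-byte access stays whole; with a degenerate block size of 1, the same access becomes eight one-byte pieces.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Addr = std::uint64_t;  // local alias for this sketch

// One piece of a (possibly split) memory access.
struct Piece {
    Addr addr;
    unsigned size;
};

// Hypothetical helper: split [addr, addr + size) at every block boundary.
// blockSize must be a non-zero power of two.
std::vector<Piece> splitAccess(Addr addr, unsigned size, unsigned blockSize) {
    assert(blockSize > 0 && (blockSize & (blockSize - 1)) == 0);
    std::vector<Piece> pieces;
    const Addr end = addr + size;
    while (addr < end) {
        // First address of the block following the one containing addr.
        const Addr nextBoundary = (addr & ~Addr(blockSize - 1)) + blockSize;
        const Addr pieceEnd = nextBoundary < end ? nextBoundary : end;
        pieces.push_back({addr, static_cast<unsigned>(pieceEnd - addr)});
        addr = pieceEnd;
    }
    return pieces;
}
```

The helper loops over every boundary so the block-size-1 case is visible; a block size of 0 is rejected by the assert rather than modeled, since this sketch makes no claim about what the real code did with that value.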
> >
> > Maybe Korey could confirm?
> >
> > Malek
> >
> > On Wed, Mar 9, 2011 at 8:24 PM, Beckmann, Brad <brad.beckm...@amd.com> wrote:
> >> I still have not been able to reproduce the problem, but I haven't tried in a
> >> few weeks. So does this happen when booting up the system, independent of
> >> what benchmark you are running? If so, could you send me your command line?
> >> I'm sure the disk image and kernel binaries between us are different, so I
> >> don't necessarily think I'll be able to reproduce your problem, but at least
> >> I'll be able to isolate it.
> >>
> >> Brad
> >>
> >>> -----Original Message-----
> >>> From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On
> >>> Behalf Of Malek Musleh
> >>> Sent: Wednesday, March 09, 2011 4:41 PM
> >>> To: M5 Developer List
> >>> Subject: Re: [m5-dev] Ruby FS - DMA Controller problem?
> >>>
> >>> Hi Korey,
> >>>
> >>> I ran into a similar problem with a different benchmark/boot-up attempt.
> >>> There is another thread on m5-dev with 'Ruby FS failing with recent
> >>> changesets' as the subject. I was able to track down the changeset
> >>> it was coming from, but I did not look further into the
> >>> changeset to see why it was causing the failure.
> >>>
> >>> Brad said he would take a look at it, but I am not sure if he was
> >>> able to reproduce the problem.
> >>>
> >>> Malek
> >>>
> >>> On Wed, Mar 9, 2011 at 7:08 PM, Korey Sewell <ksew...@umich.edu> wrote:
> >>> > Hi all,
> >>> > I'm trying to run Ruby in FS mode for the FFT benchmark.
> >>> >
> >>> > However, I've been unable to fully boot the kernel; it fails with
> >>> > a panic in the IDE disk controller:
> >>> > panic: Inconsistent DMA transfer state: dmaState = 2 devState = 1
> >>> > @ cycle 62640732569001
> >>> > [doDmaTransfer:build/ALPHA_FS_MOESI_CMP_directory/dev/ide_disk.cc,
> >>> > line 323]
> >>> >
> >>> > Has anybody run into a similar error, or does anyone have any
> >>> > suggestions for debugging the problem?
> >>> > I can run the same code using the M5 memory system and FFT finishes
> >>> > properly, so it's definitely a Ruby-specific thing. It seems that to
> >>> > track this down, I could diff instruction traces (M5 vs. Ruby) or maybe
> >>> > even diff the trace output from the IdeDisk trace flags, but those routes
> >>> > seem a bit heavy-handed considering the amount of trace output generated.
> >>> >
> >>> > The command line this was run with is:
> >>> > build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py
> >>> > -b fft_64t_base -n 1
> >>> >
> >>> > The output in system.terminal is:
> >>> > hda: M5 IDE Disk, ATA DISK drive
> >>> > hdb: M5 IDE Disk, ATA DISK drive
> >>> > hda: UDMA/33 mode selected
> >>> > hdb: UDMA/33 mode selected
> >>> > hdc: M5 IDE Disk, ATA DISK drive
> >>> > hdc: UDMA/33 mode selected
> >>> > ide0 at 0x8410-0x8417,0x8422 on irq 31
> >>> > ide1 at 0x8418-0x841f,0x8426 on irq 31
> >>> > ide_generic: please use "probe_mask=0x3f" module parameter for
> >>> > probing all legacy ISA IDE ports
> >>> > ide2 at 0x1f0-0x1f7,0x3f6 on irq 14
> >>> > ide3 at 0x170-0x177,0x376 on irq 15
> >>> > hda: max request size: 128KiB
> >>> > hda: 2866752 sectors (1467 MB), CHS=2844/16/63
> >>> > hda:<4>hda: dma_timer_expiry: dma status == 0x65
> >>> > hda: DMA interrupt recovery
> >>> > hda: lost interrupt
> >>> > unknown partition table
> >>> > hdb: max request size: 128KiB
> >>> > hdb: 1008000 sectors (516 MB), CHS=1000/16/63
> >>> > hdb:<4>hdb: dma_timer_expiry: dma status == 0x65
> >>> > hdb: DMA interrupt recovery
> >>> > hdb: lost interrupt
> >>> >
> >>> > Thanks again, any help or thoughts would be well appreciated.
> >>> >
> >>> > --
> >>> > - Korey
> >>> > _______________________________________________
> >>> > m5-dev mailing list
> >>> > m5-dev@m5sim.org
> >>> > http://m5sim.org/mailman/listinfo/m5-dev
>
> --
> - Korey
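Korey's comparison of the "old m5" and "gem5" memory traces can be done without holding either trace in memory: streaming both and stopping at the first mismatch is often enough to locate where the two memory systems start to diverge. The sketch below is a hypothetical helper (`firstDivergence` and `Divergence` are illustrative names, not an M5 tool). It assumes both traces are plain text with one event per line that can be compared directly; in practice, fields such as tick numbers would likely differ between memory systems and need to be stripped first.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Where two traces first disagree.
struct Divergence {
    std::size_t line;   // 1-based line number of the first mismatch
    std::string left;   // line from the first trace ("" if it ended early)
    std::string right;  // line from the second trace ("" if it ended early)
};

// Hypothetical helper: stream two traces line by line and report the first
// mismatch, or std::nullopt if they are identical. Only one line from each
// trace is held in memory at a time.
std::optional<Divergence> firstDivergence(std::istream& a, std::istream& b) {
    std::string la, lb;
    std::size_t lineNo = 0;
    while (true) {
        const bool gotA = static_cast<bool>(std::getline(a, la));
        const bool gotB = static_cast<bool>(std::getline(b, lb));
        ++lineNo;
        if (!gotA && !gotB) return std::nullopt;  // both ended together
        if (!gotA) la.clear();                    // first trace ended early
        if (!gotB) lb.clear();                    // second trace ended early
        if (gotA != gotB || la != lb) return Divergence{lineNo, la, lb};
    }
}

// Convenience overload for in-memory traces (e.g. in unit tests).
std::optional<Divergence> firstDivergence(const std::string& a, const std::string& b) {
    std::istringstream sa(a), sb(b);
    return firstDivergence(sa, sb);
}
```

For real trace files, the same function can be called on two `std::ifstream` objects; the reported line number then narrows down which part of the trace output is worth inspecting by hand.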