Thanks, Malek.  Very interesting.

Yes, this 5-line changeset seems rather benign, but it actually has huge 
ramifications.  With this change, the RubyPort passes the correct block size to 
the cpu/device models.  Without it, I believe the block size defaults to 0 or 
1 (I can't remember which).  While that seems rather inconsequential, I noticed 
when I made this change that the memtester behaved quite differently.  In 
particular, it keeps issuing requests until sendTiming returns false, instead 
of issuing just one request per CPU at a time.  Therefore, another patch in this 
series added the retry mechanism to the RubyPort.  I'm still not sure exactly what 
the problem is with Ruby+DMA, but I suspect that the DMA devices are behaving 
differently now that the RubyPort passes the correct block size.

I was able to spend a few hours on this over the weekend.  I am now able to 
reproduce the error and I have a few protocol bug fixes queued up.  However, I 
don't think those fixes actually solved the main issue.  I don't think I'll be 
able to get to it today, but I'll try to find some time tomorrow to investigate 
further.  

Brad


> -----Original Message-----
> From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org]
> On Behalf Of Korey Sewell
> Sent: Monday, March 14, 2011 2:10 AM
> To: M5 Developer List
> Subject: Re: [m5-dev] Ruby FS - DMA Controller problem?
> 
> Which lines are you commenting out to get it to work? It's a bit unclear in the
> diff you point to (maybe because you said it's a full set of changes, not just
> one).
> 
> (btw: The work I've been doing is comparing the "old m5" memory trace to
> the "gem5" memory trace to try to chase down the bug. I wouldn't be
> surprised if we are converging to the same bug though.)
> 
> On Mon, Mar 14, 2011 at 3:51 AM, Malek Musleh
> <malek.mus...@gmail.com> wrote:
> > Hi Brad,
> >
> > I found the problem that was causing this error. Specifically, it is
> > this changeset:
> >
> > changeset:   7909:eee578ed2130
> > user:        Joel Hestness <hestn...@cs.utexas.edu>
> > date:        Sun Feb 06 22:14:18 2011 -0800
> > summary:     Ruby: Fix to return cache block size to CPU for split
> > data transfers
> >
> > Link: http://reviews.m5sim.org/r/393/diff/#index_header
> >
> > Previously, I mentioned it was a couple of changesets prior to this
> > one, but the changes between them are related, so it wasn't as obvious
> > what was happening.
> >
> > In fact, this corresponds to the assert() for the block size you had
> > put in to deal with x86 unaligned accesses, but then later removed
> > because of LL/SC in Alpha.
> >
> > It's not clear to me why this is causing a problem, or rather why this
> > doesn't return the default 64-byte block size from the Ruby system,
> > but commenting out those lines of code allowed it to work.
> >
> > Maybe Korey could confirm?
> >
> > Malek
> >
> > On Wed, Mar 9, 2011 at 8:24 PM, Beckmann, Brad <brad.beckm...@amd.com> wrote:
> >> I still have not been able to reproduce the problem, but I haven't tried in a
> >> few weeks.  So does this happen when booting up the system, independent of
> >> what benchmark you are running?  If so, could you send me your command line?
> >> I'm sure the disk image and kernel binaries between us are different, so I
> >> don't necessarily think I'll be able to reproduce your problem, but at least
> >> I'll be able to isolate it.
> >>
> >> Brad
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On
> >>> Behalf Of Malek Musleh
> >>> Sent: Wednesday, March 09, 2011 4:41 PM
> >>> To: M5 Developer List
> >>> Subject: Re: [m5-dev] Ruby FS - DMA Controller problem?
> >>>
> >>> Hi Korey,
> >>>
> >>> I ran into a similar problem with a different benchmark/boot up attempt.
> >>> There is another thread on m5-dev with 'Ruby FS failing with recent
> >>> changesets' as the subject. I was able to track down the changeset
> >>> which it was coming from, but I did not look further into the
> >>> changeset as to why it was causing it.
> >>>
> >>> Brad said he would take a look at it, but I am not sure if he was
> >>> able to reproduce the problem.
> >>>
> >>> Malek
> >>>
> >>> On Wed, Mar 9, 2011 at 7:08 PM, Korey Sewell <ksew...@umich.edu> wrote:
> >>> > Hi all,
> >>> > I'm trying to run Ruby in FS mode for the FFT benchmark.
> >>> >
> >>> > However, I've been unable to fully boot the kernel; it fails with
> >>> > a panic in the IDE disk controller:
> >>> > panic: Inconsistent DMA transfer state: dmaState = 2 devState = 1
> >>> > @ cycle 62640732569001
> >>> > [doDmaTransfer:build/ALPHA_FS_MOESI_CMP_directory/dev/ide_disk.cc, line 323]
> >>> >
> >>> > Has anybody run into a similar error, or does anyone have any
> >>> > suggestions for debugging the problem? I can run the same code
> >>> > using the M5 memory system and FFT finishes properly, so it's
> >>> > definitely a Ruby-specific problem. To track this down, I
> >>> > could diff instruction traces (M5 vs. Ruby) or maybe even diff
> >>> > the trace output from the IdeDisk trace flags, but those routes seem
> >>> > a bit heavy-handed considering the amount of trace output generated.
> >>> >
> >>> > The command line this was run with is:
> >>> > build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py -b fft_64t_base -n 1
> >>> >
> >>> > The output in system.terminal is:
> >>> > hda: M5 IDE Disk, ATA DISK drive
> >>> > hdb: M5 IDE Disk, ATA DISK drive
> >>> > hda: UDMA/33 mode selected
> >>> > hdb: UDMA/33 mode selected
> >>> > hdc: M5 IDE Disk, ATA DISK drive
> >>> > hdc: UDMA/33 mode selected
> >>> > ide0 at 0x8410-0x8417,0x8422 on irq 31
> >>> > ide1 at 0x8418-0x841f,0x8426 on irq 31
> >>> > ide_generic: please use "probe_mask=0x3f" module parameter for
> >>> > probing all legacy ISA IDE ports
> >>> > ide2 at 0x1f0-0x1f7,0x3f6 on irq 14
> >>> > ide3 at 0x170-0x177,0x376 on irq 15
> >>> > hda: max request size: 128KiB
> >>> > hda: 2866752 sectors (1467 MB), CHS=2844/16/63
> >>> >  hda:<4>hda: dma_timer_expiry: dma status == 0x65
> >>> > hda: DMA interrupt recovery
> >>> > hda: lost interrupt
> >>> >  unknown partition table
> >>> > hdb: max request size: 128KiB
> >>> > hdb: 1008000 sectors (516 MB), CHS=1000/16/63
> >>> >  hdb:<4>hdb: dma_timer_expiry: dma status == 0x65
> >>> > hdb: DMA interrupt recovery
> >>> > hdb: lost interrupt
> >>> >
> >>> > Thanks again, any help or thoughts would be well appreciated.
> >>> >
> >>> > --
> >>> > - Korey
> >>> > _______________________________________________
> >>> > m5-dev mailing list
> >>> > m5-dev@m5sim.org
> >>> > http://m5sim.org/mailman/listinfo/m5-dev
> >>> >
> >>
> >>
> >>
> >
> 
> 
> 
> --
> - Korey

