> -----Original Message-----
> From: Tom Rini [mailto:trini at kernel.crashing.org]
> Sent: 07 November 2005 16:52
> To: Marcelo Tosatti
> Cc: Joakim Tjernlund; Pantelis Antoniou; Dan Malek;
> linuxppc-embedded at ozlabs.org; gtolstolytkin at ru.mvista.com
> Subject: Re: [PATCH 2.6.14] mm: 8xx MM fix for
>
> On Mon, Nov 07, 2005 at 08:16:18AM -0200, Marcelo Tosatti wrote:
> > Joakim!
> >
> > On Mon, Nov 07, 2005 at 03:32:52PM +0100, Joakim Tjernlund wrote:
> > > Hi Marcelo
> > >
> > > [SNIP]
> > > > The root of the problem is the changes against the 8xx TLB
> > > > handlers introduced during v2.6. What happens is the TLBMiss
> > > > handlers load the zeroed pte into the TLB, causing the TLBError
> > > > handler to be invoked (that's two TLB faults per pagefault),
> > > > which then jumps to the generic MM code to set up the pte.
> > > >
> > > > The bug is that the zeroed TLB entry is not invalidated (the same
> > > > reason for the "dcbst" misbehaviour), resulting in infinite
> > > > TLBError faults.
> > > >
> > > > Dan, I wonder why we just don't go back to the v2.4 behaviour.
> > >
> > > This is one reason why it is the way it is:
> > > http://ozlabs.org/pipermail/linuxppc-embedded/2005-January/016382.html
> > > The details are a little fuzzy ATM, but I think the reason for the
> > > current impl. was only that it was less intrusive to impl.
> >
> > Ah, I see. I wonder if the bug is processor specific: we don't have
> > such changes in our v2.4 tree and never experienced such a problem.
> >
> > It should be pretty easy to hit it, right? (Instruction pagefaults
> > should fail.)
> >
> > Grigori, Tom, can you enlighten us about the issue at the URL above.
> > How can it be triggered?
>
> So after looking at the code in 2.6.14 and current git, I think the
> above URL isn't relevant, unless there was a change I missed (which
> could totally be possible) that reverted the patch there and fixed
> that issue in a different manner. But since I didn't figure that out
> until I had finished researching it again:
I wasn't clear enough. What I meant was that the above patch made me
think, and the result was that I came up with a simpler fix, the
"two exception" fix that is in current kernels. See
http://linux.bkbits.net:8080/linux-2.6/diffs/arch/ppc/kernel/head_8xx.S@1.19?nav=index.html|src/.|src/arch|src/arch/ppc|src/arch/ppc/kernel|hist/arch/ppc/kernel/head_8xx.S

It appears this fix has some other issues :( How do the other ppc
arches do it? I am guessing that they don't double fault, but bail out
to do_page_fault from the TLB Miss handler, like 8xx used to do.

> Switching hats for a minute, this came from a bug a customer of
> MontaVista found, so I can't give out the testcase :(
>
> To repeat what Joakim said back then:
> "I think I have figured this out. The first TLB misses that happen at
> app startup are Data TLB misses. These will then hit the NULL L1 entry
> and end up in do_page_fault(), which will populate the L1 entry. But
> when you have a very large app that spans more than one L1 entry
> (16 MB I think), it may happen that you get an I-TLB Miss first on one
> of the L1 entries, which will make the I-TLB handler bail out to
> do_page_fault() and the app crashes (SEGV)."

This still stands, I think.

> Looking at the patch again, what I don't see is why I talk about
> fudging I-TLB Miss at 0x400 when it's I-TLB Error we fudge at being
> there, but then get hung up that there can be a slight diff between
> the two ("This is because we check bit 4 of SRR1 in both cases, but in
> the case of an I-TLB Miss, this bit is always set, and it only
> indicates a protection fault on an I-TLB Error.") so instead of 0x1300
> jumping to the handler at 0x400, we treat it like a regular exception
> so we know where we came from, and perhaps missed fixing a case
> somewhere?

Didn't look into this part of your patch, sorry.

 Jocke
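
PS: to make the "double fault" vs. "bail out" difference concrete,
here is a rough C model of the two schemes. The real code is the
assembler in head_8xx.S; everything below (lookup_pte, load_tlb_entry,
the one-entry page table) is invented purely for illustration, and the
"invalidate afterwards" part is only a guess at where the missing
tlbie would have to go.

#include <stdio.h>

/* Toy model only -- not kernel code. */
typedef unsigned long pte_t;

static pte_t page_table[1];                 /* pretend PTE for one page  */

static pte_t lookup_pte(unsigned long ea)   { (void)ea; return page_table[0]; }
static void  load_tlb_entry(pte_t pte)      { printf("  TLB <- %#lx\n", pte); }
static void  do_page_fault(unsigned long ea)
{
        printf("  do_page_fault(%#lx): set up pte\n", ea);
        page_table[0] = 0x1;                /* mark the page present     */
}

/* v2.4-style 8xx (and, I guess, the other ppc arches): the TLB Miss
 * handler notices the missing pte and bails out to do_page_fault()
 * itself -- one fault per page fault. */
static void tlb_miss_bail_out(unsigned long ea)
{
        pte_t pte = lookup_pte(ea);

        if (!pte) {
                do_page_fault(ea);
                return;
        }
        load_tlb_entry(pte);
}

/* Current "two exception" scheme: the TLB Miss handler loads whatever
 * it finds -- possibly a zeroed pte -- into the TLB.  The retried
 * access then takes a TLB Error, and that handler calls
 * do_page_fault(). */
static void tlb_miss_two_exception(unsigned long ea)
{
        load_tlb_entry(lookup_pte(ea));     /* may load a zeroed pte     */
}

static void tlb_error(unsigned long ea)
{
        do_page_fault(ea);
        /* if the zeroed entry loaded by the miss handler is not
         * invalidated here (something like a tlbie on ea), the TLB
         * Error just repeats -- the infinite fault loop Marcelo
         * describes above */
}

int main(void)
{
        printf("bail-out scheme:\n");
        tlb_miss_bail_out(0x1000);

        page_table[0] = 0;                  /* reset for the second run  */

        printf("two-exception scheme:\n");
        tlb_miss_two_exception(0x1000);     /* first fault: TLB Miss     */
        tlb_error(0x1000);                  /* second fault: TLB Error   */
        return 0;
}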