Re: radeon: screen garbled after page allocator change, was: Re: [patch v2 3/3] mm: page_alloc: fair zone allocator policy

2014-06-16 Thread Thomas Schwinge
Hi!

On Mon, 28 Apr 2014 10:09:17 +0200, I wrote:
> On Sun, 27 Apr 2014 15:55:29 -0400, Jerome Glisse  wrote:
> > If my ugly patch works does this quirk also work ?
> 
> Unfortunately neither of them does; see my other email,
> http://news.gmane.org/find-root.php?message_id=%3C87sioxq3rx.fsf%40schwinge.name%3E.

> [...] hacked around as follows: [...]

> If needed, I can try to capture more data, but someone who has knowledge
> of PCI bus architecture and Linux kernel code (so, not me), might
> probably already see what's wrong.

The problem "solved itself": the machine recently died of hardware
failure.  ;-|


Regards,
 Thomas




RE: radeon: screen garbled after page allocator change, was: Re: [patch v2 3/3] mm: page_alloc: fair zone allocator policy

2014-04-28 Thread Deucher, Alexander
> -Original Message-
> From: Deucher, Alexander
> Sent: Monday, April 28, 2014 8:50 AM
> To: Koenig, Christian; Jerome Glisse; Thomas Schwinge
> Cc: Bjorn Helgaas; linux-...@vger.kernel.org; Johannes Weiner; Mel Gorman;
> Rik van Riel; Andrea Arcangeli; Zlatko Calusic; Minchan Kim; linux-
> m...@kvack.org; linux-kernel@vger.kernel.org; Andrew Morton; dri-
> de...@lists.freedesktop.org
> Subject: RE: radeon: screen garbled after page allocator change, was: Re:
> [patch v2 3/3] mm: page_alloc: fair zone allocator policy
> 
> > -Original Message-
> > From: Koenig, Christian
> > Sent: Monday, April 28, 2014 3:30 AM
> > To: Jerome Glisse; Thomas Schwinge
> > Cc: Bjorn Helgaas; linux-...@vger.kernel.org; Johannes Weiner; Mel
> Gorman;
> > Rik van Riel; Andrea Arcangeli; Zlatko Calusic; Minchan Kim; linux-
> > m...@kvack.org; linux-kernel@vger.kernel.org; Andrew Morton; Deucher,
> > Alexander; dri-de...@lists.freedesktop.org
> > Subject: Re: radeon: screen garbled after page allocator change, was: Re:
> > [patch v2 3/3] mm: page_alloc: fair zone allocator policy
> >
> > > + /* We are living in a monstruous world in which you can have the pci
> > > +  * root complex behind an hypertransport link which can not address
> > > +  * anything above 32bit (well hypertransport specification says 40bits
> > > +  * but hardware such as SIS761 only support 32bits).
> > That looks more like a problem with this specific chipset rather than
> > something that needs a general solution like this.
> >
> > Maybe we should rather add the PCI-ID(s) of the thing to some kind of
> > quirks table for now so that the patch isn't so invasive and we can CC
> > stable as well?
> 
> IIRC, there was someone on IRC with a similar problem with a similar SiS
> chipset a while back.  These SiS chipsets seem to be generally problematic.

IIRC, in the IRC case, the fix was to limit the amount of physical memory in
the system.

Alex

RE: radeon: screen garbled after page allocator change, was: Re: [patch v2 3/3] mm: page_alloc: fair zone allocator policy

2014-04-28 Thread Deucher, Alexander
> -Original Message-
> From: Koenig, Christian
> Sent: Monday, April 28, 2014 3:30 AM
> To: Jerome Glisse; Thomas Schwinge
> Cc: Bjorn Helgaas; linux-...@vger.kernel.org; Johannes Weiner; Mel Gorman;
> Rik van Riel; Andrea Arcangeli; Zlatko Calusic; Minchan Kim; linux-
> m...@kvack.org; linux-kernel@vger.kernel.org; Andrew Morton; Deucher,
> Alexander; dri-de...@lists.freedesktop.org
> Subject: Re: radeon: screen garbled after page allocator change, was: Re:
> [patch v2 3/3] mm: page_alloc: fair zone allocator policy
> 
> > +   /* We are living in a monstruous world in which you can have the pci
> > +* root complex behind an hypertransport link which can not address
> > +* anything above 32bit (well hypertransport specification says 40bits
> > +* but hardware such as SIS761 only support 32bits).
> That looks more like a problem with this specific chipset rather than
> something that needs a general solution like this.
> 
> Maybe we should rather add the PCI-ID(s) of the thing to some kind of
> quirks table for now so that the patch isn't so invasive and we can CC
> stable as well?

IIRC, there was someone on IRC with a similar problem with a similar SiS 
chipset a while back.  These SiS chipsets seem to be generally problematic.

Alex

> 
> Just a thought,
> Christian.
> 
> On 27.04.2014 21:55, Jerome Glisse wrote:
> > On Sat, Apr 26, 2014 at 11:31:11PM -0400, Jerome Glisse wrote:
> >> On Thu, Apr 24, 2014 at 09:37:22AM -0400, Johannes Weiner wrote:
> >>> Hi Thomas,
> >>>
> >>> On Wed, Apr 02, 2014 at 04:26:08PM +0200, Thomas Schwinge wrote:
> >>>> Hi!
> >>>>
> >>>> On Fri,  2 Aug 2013 11:37:26 -0400, Johannes Weiner wrote:
> >>>>> Each zone that holds userspace pages of one workload must be aged at a
> >>>>> speed proportional to the zone size.  [...]
> >>>>> Fix this with a very simple round robin allocator.  [...]
> >>>> This patch, adding NR_ALLOC_BATCH, eventually landed in mainline as
> >>>> commit 81c0a2bb515fd4daae8cab64352877480792b515 (2013-09-11).
> >>>>
> >>>> I recently upgraded a Debian testing system from a 3.11 kernel to 3.12,
> >>>> and it started to exhibit "strange" issues, which I then bisected to this
> >>>> patch.  I'm not saying that the patch is faulty, as it seems to be
> >>>> working fine for everyone else, so I rather assume that something in a
> >>>> (vastly?) different corner of the kernel (or my hardware?) is broken.
> >>>> ;-)
> >>>>
> >>>> The issue is that when X.org/lightdm starts up, there are "garbled"
> >>>> sections on the screen, for example, rectangular boxes that are just black
> >>>> or otherwise "distorted", and/or sets of glyphs (corresponding to a set
> >>>> of characters; but not all characters) are displayed as rectangular gray
> >>>> or black boxes, and/or icons in a GNOME session are not displayed
> >>>> properly, and so on.  (Can take a snapshot if that helps?)  Switching to
> >>>> a Linux console, I can use that one fine.  Switching back to X, in the
> >>>> majority of all cases, the screen will be completely black, but with the
> >>>> mouse cursor still rendered properly (done in hardware, I assume).
> >>>>
> >>>> Reverting commit 81c0a2bb515fd4daae8cab64352877480792b515, for example on
> >>>> top of v3.12, and everything is back to normal.  The problem also
> >>>> persists with a v3.14 kernel that I just built.
> >>>>
> >>>> I will try to figure out what's going on, but will gladly take any
> >>>> pointers, or suggestions about how to tackle such a problem.
> >>>>
> >>>> The hardware is a Fujitsu Siemens Esprimo E5600, mainboard D2264-A1, CPU
> >>>> AMD Sempron 3000+.  There is an on-board graphics thingy, but I'm not
> >>>> using that; instead I put in a Sapphire Radeon HD 4350 card.
> >>> I went over this code change repeatedly but I could not see anything
> >>> directly that would explain it.  However, this patch DOES change the
> >>> way allocations are placed (while still respecting zone specifiers
> >>> like __GFP_DMA etc.) and so it's possible that they unearthed a
> >>> corruption, or a wrongly set dma mask in the drivers.
> >>>
> >>> Ccing the radeon driver guys.  Full quote follows.
> >>>

Re: radeon: screen garbled after page allocator change, was: Re: [patch v2 3/3] mm: page_alloc: fair zone allocator policy

2014-04-28 Thread Thomas Schwinge
Hi!

On Mon, 28 Apr 2014 10:03:46 +0200, I wrote:
> On Fri, 25 Apr 2014 19:03:22 -0400, Jerome Glisse  wrote:
> > On Fri, Apr 25, 2014 at 05:50:57PM -0400, Jerome Glisse wrote:
> > > On Fri, Apr 25, 2014 at 05:47:48PM -0400, Jerome Glisse wrote:
> > > > On Thu, Apr 24, 2014 at 09:37:22AM -0400, Johannes Weiner wrote:
> > > > > On Wed, Apr 02, 2014 at 04:26:08PM +0200, Thomas Schwinge wrote:
> > > > > > On Fri,  2 Aug 2013 11:37:26 -0400, Johannes Weiner wrote:
> > > > > > > Each zone that holds userspace pages of one workload must be aged at a
> > > > > > > speed proportional to the zone size.  [...]
> > > > > >
> > > > > > > Fix this with a very simple round robin allocator.  [...]
> > > > > >
> > > > > > This patch, adding NR_ALLOC_BATCH, eventually landed in mainline as
> > > > > > commit 81c0a2bb515fd4daae8cab64352877480792b515 (2013-09-11).
> > > > > >
> > > > > > I recently upgraded a Debian testing system from a 3.11 kernel to 3.12,
> > > > > > and it started to exhibit "strange" issues, which I then bisected to this
> > > > > > patch.  I'm not saying that the patch is faulty, as it seems to be
> > > > > > working fine for everyone else, so I rather assume that something in a
> > > > > > (vastly?) different corner of the kernel (or my hardware?) is broken.
> > > > > > ;-)
> > > > > >
> > > > > > The issue is that when X.org/lightdm starts up, there are "garbled"
> > > > > > sections on the screen, for example, rectangular boxes that are just black
> > > > > > or otherwise "distorted", and/or sets of glyphs (corresponding to a set
> > > > > > of characters; but not all characters) are displayed as rectangular gray
> > > > > > or black boxes, and/or icons in a GNOME session are not displayed
> > > > > > properly, and so on.  (Can take a snapshot if that helps?)  Switching to
> > > > > > a Linux console, I can use that one fine.  Switching back to X, in the
> > > > > > majority of all cases, the screen will be completely black, but with the
> > > > > > mouse cursor still rendered properly (done in hardware, I assume).
> > > > > >
> > > > > > Reverting commit 81c0a2bb515fd4daae8cab64352877480792b515, for example on
> > > > > > top of v3.12, and everything is back to normal.  The problem also
> > > > > > persists with a v3.14 kernel that I just built.

> > > > My guess is that the pcie bridge can only remap dma page with 32bit dma
> > > > mask while the gpu is fine with 40bit dma mask. I always thought that the
> > > > pcie/pci code did take care of such thing for us.
> > > 
> > > Forgot to attach patch to test my theory. Does the attached patch fix
> > > the issue ?
> 
> Unfortunately it does not.  :-/

Ha, the following seems to do it: in addition to dma_bits (your patch),
I'm also overriding need_dma32, presumably for later use in
drivers/gpu/drm/ttm/ttm_bo.c:ttm_bo_add_ttm.  With that hack
applied, I have now rebooted a v3.14 build a few times, and so far things
"look" fine.

diff --git drivers/gpu/drm/radeon/radeon_device.c drivers/gpu/drm/radeon/radeon_device.c
index 044bc98..90baf2f 100644
--- drivers/gpu/drm/radeon/radeon_device.c
+++ drivers/gpu/drm/radeon/radeon_device.c
@@ -1243,6 +1243,8 @@ int radeon_device_init(struct radeon_device *rdev,
 		rdev->need_dma32 = true;
 
 	dma_bits = rdev->need_dma32 ? 32 : 40;
+	dma_bits = 32;
+	rdev->need_dma32 = true;
 	r = pci_set_dma_mask(rdev->pdev, DMA_BIT_MASK(dma_bits));
 	if (r) {
 		rdev->need_dma32 = true;
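
For readers following the mask arithmetic, here is a minimal user-space sketch (not driver code) of what a 32-bit versus a 40-bit DMA mask permits. dma_bit_mask() mirrors the arithmetic of the kernel's DMA_BIT_MASK() macro; dma_addr_fits() is a hypothetical helper introduced here only for illustration and does not exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as the kernel's DMA_BIT_MASK(n): the n low bits set. */
uint64_t dma_bit_mask(unsigned int bits)
{
    return (bits >= 64) ? ~UINT64_C(0) : ((UINT64_C(1) << bits) - 1);
}

/*
 * Hypothetical helper (illustration only): true if every byte of the
 * buffer [addr, addr + len) is addressable under the given mask.
 */
bool dma_addr_fits(uint64_t addr, uint64_t len, uint64_t mask)
{
    if (len == 0)
        return true;
    if (addr > mask)
        return false;
    /* Written this way to avoid overflow in addr + len. */
    return (len - 1) <= (mask - addr);
}
```

Under these assumptions, a page located at the 4 GiB boundary (0x100000000) fits a 40-bit mask but not a 32-bit one, which is the kind of placement a bridge limited to 32-bit addressing could not reach.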


Regards,
 Thomas




Re: radeon: screen garbled after page allocator change, was: Re: [patch v2 3/3] mm: page_alloc: fair zone allocator policy

2014-04-28 Thread Thomas Schwinge
Hi!

On Sun, 27 Apr 2014 15:55:29 -0400, Jerome Glisse  wrote:
> If my ugly patch works does this quirk also work ?

Unfortunately neither of them does; see my other email,
http://news.gmane.org/find-root.php?message_id=%3C87sioxq3rx.fsf%40schwinge.name%3E.


Also, the quirk patch resulted in a NULL pointer dereference in
pci_find_ht_capability+0x4/0x30, which I hacked around as follows:

diff --git drivers/pci/quirks.c drivers/pci/quirks.c
index f025867..33aaad2 100644
--- drivers/pci/quirks.c
+++ drivers/pci/quirks.c
@@ -2452,6 +2452,8 @@ u64 pci_ht_quirk_dma_32bit_only(struct pci_dev *dev, u64 mask)
 		struct pci_dev *bridge = bus->self;
 		int pos;
 
+		if (!bridge)
+			goto skip;
 		pos = pci_find_ht_capability(bridge, HT_CAPTYPE_SLAVE);
 		if (pos) {
 			int ctrl_off;
@@ -2472,6 +2474,7 @@ u64 pci_ht_quirk_dma_32bit_only(struct pci_dev *dev, u64 mask)
 				return 0x;
 			}
 		}
+	skip:
 		bus = bus->parent;
 	} while (bus);
 	return mask;

If needed, I can try to capture more data, but someone who has knowledge
of PCI bus architecture and Linux kernel code (so, not me), might
probably already see what's wrong.
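
The guard in the hack above can be sketched outside the kernel as follows. This is an illustration under stated assumptions, not the quirk patch itself: sketch_bus, sketch_bridge, ht_32bit_only and sketch_limit_dma_mask() are hypothetical stand-ins, and clamping the mask to 32 bits is assumed from the quirk's name rather than taken from the patch. It relies on one property of the kernel's bus tree: a root bus has no upstream bridge, so its self pointer is NULL and must not be dereferenced.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct pci_dev / struct pci_bus. */
struct sketch_bridge {
    int ht_32bit_only;            /* assumed flag: link limited to 32 bits */
};

struct sketch_bus {
    struct sketch_bridge *self;   /* bridge leading to this bus; NULL on a root bus */
    struct sketch_bus *parent;    /* NULL on a root bus */
};

/* Walk from the device's bus up to the root, skipping buses without a bridge. */
uint64_t sketch_limit_dma_mask(const struct sketch_bus *bus, uint64_t mask)
{
    for (; bus != NULL; bus = bus->parent) {
        const struct sketch_bridge *bridge = bus->self;

        if (bridge == NULL)
            continue;             /* root bus: nothing to inspect */
        if (bridge->ht_32bit_only)
            return mask & UINT64_C(0xffffffff);   /* assumed clamp */
    }
    return mask;
}
```

Using `continue` in a `for` loop plays the role of the `goto skip` in the hack: the step to the parent bus still happens, only the bridge inspection is bypassed.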


Regards,
 Thomas




Re: radeon: screen garbled after page allocator change, was: Re: [patch v2 3/3] mm: page_alloc: fair zone allocator policy

2014-04-28 Thread Christian König

> +   /* We are living in a monstruous world in which you can have the pci
> +    * root complex behind an hypertransport link which can not address
> +    * anything above 32bit (well hypertransport specification says 40bits
> +    * but hardware such as SIS761 only support 32bits).

That looks more like a problem with this specific chipset rather than
something that needs a general solution like this.

Maybe we should rather add the PCI-ID(s) of the thing to some kind of
quirks table for now so that the patch isn't so invasive and we can CC
stable as well?

Just a thought,
Christian.
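
A quirks table of the kind suggested here could look roughly like the sketch below. It is a user-space illustration under assumptions, not a proposed kernel patch: sketch_pci_id, sketch_dma32_quirks and sketch_needs_dma32() are hypothetical names, and the single table entry is the SiS 761/M761 host bridge ID [1039:0761] taken from the lspci output quoted in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: PCI vendor and device ID of a host bridge. */
struct sketch_pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* Host bridges assumed to need 32-bit DMA; one entry, from the lspci output. */
static const struct sketch_pci_id sketch_dma32_quirks[] = {
    { 0x1039, 0x0761 },   /* SiS 761/M761 Host */
};

/* True if the given host bridge ID is listed in the quirks table. */
bool sketch_needs_dma32(uint16_t vendor, uint16_t device)
{
    size_t n = sizeof(sketch_dma32_quirks) / sizeof(sketch_dma32_quirks[0]);

    for (size_t i = 0; i < n; i++) {
        if (sketch_dma32_quirks[i].vendor == vendor &&
            sketch_dma32_quirks[i].device == device)
            return true;
    }
    return false;
}
```

A driver-side check along these lines would only force the 32-bit path on listed chipsets and leave every other system untouched, which is what makes the approach less invasive than a general bus walk.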

On 27.04.2014 21:55, Jerome Glisse wrote:
> On Sat, Apr 26, 2014 at 11:31:11PM -0400, Jerome Glisse wrote:
>> On Thu, Apr 24, 2014 at 09:37:22AM -0400, Johannes Weiner wrote:
>>> Hi Thomas,
>>>
>>> On Wed, Apr 02, 2014 at 04:26:08PM +0200, Thomas Schwinge wrote:
>>>> Hi!
>>>>
>>>> On Fri,  2 Aug 2013 11:37:26 -0400, Johannes Weiner wrote:
>>>>> Each zone that holds userspace pages of one workload must be aged at a
>>>>> speed proportional to the zone size.  [...]
>>>>> Fix this with a very simple round robin allocator.  [...]
>>>> This patch, adding NR_ALLOC_BATCH, eventually landed in mainline as
>>>> commit 81c0a2bb515fd4daae8cab64352877480792b515 (2013-09-11).
>>>>
>>>> I recently upgraded a Debian testing system from a 3.11 kernel to 3.12,
>>>> and it started to exhibit "strange" issues, which I then bisected to this
>>>> patch.  I'm not saying that the patch is faulty, as it seems to be
>>>> working fine for everyone else, so I rather assume that something in a
>>>> (vastly?) different corner of the kernel (or my hardware?) is broken.
>>>> ;-)
>>>>
>>>> The issue is that when X.org/lightdm starts up, there are "garbled"
>>>> sections on the screen, for example, rectangular boxes that are just black
>>>> or otherwise "distorted", and/or sets of glyphs (corresponding to a set
>>>> of characters; but not all characters) are displayed as rectangular gray
>>>> or black boxes, and/or icons in a GNOME session are not displayed
>>>> properly, and so on.  (Can take a snapshot if that helps?)  Switching to
>>>> a Linux console, I can use that one fine.  Switching back to X, in the
>>>> majority of all cases, the screen will be completely black, but with the
>>>> mouse cursor still rendered properly (done in hardware, I assume).
>>>>
>>>> Reverting commit 81c0a2bb515fd4daae8cab64352877480792b515, for example on
>>>> top of v3.12, and everything is back to normal.  The problem also
>>>> persists with a v3.14 kernel that I just built.
>>>>
>>>> I will try to figure out what's going on, but will gladly take any
>>>> pointers, or suggestions about how to tackle such a problem.
>>>>
>>>> The hardware is a Fujitsu Siemens Esprimo E5600, mainboard D2264-A1, CPU
>>>> AMD Sempron 3000+.  There is an on-board graphics thingy, but I'm not
>>>> using that; instead I put in a Sapphire Radeon HD 4350 card.
>>> I went over this code change repeatedly but I could not see anything
>>> directly that would explain it.  However, this patch DOES change the
>>> way allocations are placed (while still respecting zone specifiers
>>> like __GFP_DMA etc.) and so it's possible that they unearthed a
>>> corruption, or a wrongly set dma mask in the drivers.
>>>
>>> Ccing the radeon driver guys.  Full quote follows.


>>>>  $ cat < /proc/cpuinfo
>>>>  processor       : 0
>>>>  vendor_id       : AuthenticAMD
>>>>  cpu family      : 15
>>>>  model           : 47
>>>>  model name      : AMD Sempron(tm) Processor 3000+
>>>>  stepping        : 2
>>>>  cpu MHz         : 1000.000
>>>>  cache size      : 128 KB
>>>>  physical id     : 0
>>>>  siblings        : 1
>>>>  core id         : 0
>>>>  cpu cores       : 1
>>>>  apicid          : 0
>>>>  initial apicid  : 0
>>>>  fpu             : yes
>>>>  fpu_exception   : yes
>>>>  cpuid level     : 1
>>>>  wp              : yes
>>>>  flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good nopl pni lahf_lm
>>>>  bogomips        : 2000.20
>>>>  TLB size        : 1024 4K pages
>>>>  clflush size    : 64
>>>>  cache_alignment : 64
>>>>  address sizes   : 40 bits physical, 48 bits virtual
>>>>  power management: ts fid vid ttp tm stc
>>>>  $ sudo lspci -nn -k -vv
>>>>  00:00.0 Host bridge [0600]: Silicon Integrated Systems [SiS] 761/M761 Host [1039:0761] (rev 01)
>>>>          Subsystem: Fujitsu Technology Solutions D2030-A1 Motherboard [1734:1099]
>>>>          Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>>>>          Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
>>>>          Latency: 64
>>>>          Region 0: Memory at f000 (32-bit, non-prefetchable) [size=32M]
>>>>          Capabilities: [a0] AGP version 3.0
>>>>                  Status: RQ=32 Iso- ArqSz=2 Cal=3 SBA+ ITACoh- GART64- HTrans- 64bit- FW- AGP3+ Rate=x4,x8
>>>>                  Command: RQ=1 ArqSz=0 Cal=0 SBA+ AGP- GART64- 64bit- FW- Rate=none
>>>>          Capabilities: [d0] HyperTransport: Slave or Primary Interface
>>>>                  Command: BaseUnitID=0 UnitCnt=17 MastHost- DefDir- DUL-
>>>>                  Link Control 0: CFlE- CST- CFE-
>>>>  00:01.0 PCI bridge [0604]: Silicon Integrated Systems [SiS] PCI-to-PCI bridge [1039:0004] (prog-if 00 [Normal decode])
>>>>          Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>>>>          Status: Cap+ 66MHz- UDF- FastB2B-

RE: radeon: screen garbled after page allocator change, was: Re: [patch v2 3/3] mm: page_alloc: fair zone allocator policy

2014-04-28 Thread Deucher, Alexander
 -Original Message-
 From: Koenig, Christian
 Sent: Monday, April 28, 2014 3:30 AM
 To: Jerome Glisse; Thomas Schwinge
 Cc: Bjorn Helgaas; linux-...@vger.kernel.org; Johannes Weiner; Mel Gorman;
 Rik van Riel; Andrea Arcangeli; Zlatko Calusic; Minchan Kim; linux-
 m...@kvack.org; linux-kernel@vger.kernel.org; Andrew Morton; Deucher,
 Alexander; dri-de...@lists.freedesktop.org
 Subject: Re: radeon: screen garbled after page allocator change, was: Re:
 [patch v2 3/3] mm: page_alloc: fair zone allocator policy
 
  +   /* We are living in a monstruous world in which you can have the pci
  +* root complex behind an hypertransport link which can not address
  +* anything above 32bit (well hypertransport specification says 40bits
  +* but hardware such as SIS761 only support 32bits).
 That looks more like a problem with this specific chipset rather than
 something that needs a general solution like this.
 
 Maybe we should rather add the PCI-ID(s) of the thing to some kind of
 quirks table for now so that the patch isn't so invasive and we can CC
 stable as well?

IIRC, there was someone on IRC with a similar problem with a similar SiS 
chipset a while back.  These SiS chipsets seem to be generally problematic.

Alex

 
 Just a thought,
 Christian.
 
 Am 27.04.2014 21:55, schrieb Jerome Glisse:
  On Sat, Apr 26, 2014 at 11:31:11PM -0400, Jerome Glisse wrote:
  On Thu, Apr 24, 2014 at 09:37:22AM -0400, Johannes Weiner wrote:
  Hi Thomas,
 
  On Wed, Apr 02, 2014 at 04:26:08PM +0200, Thomas Schwinge wrote:
  Hi!
 
  On Fri,  2 Aug 2013 11:37:26 -0400, Johannes Weiner
 han...@cmpxchg.org wrote:
  Each zone that holds userspace pages of one workload must be aged
 at a
  speed proportional to the zone size.  [...]
  Fix this with a very simple round robin allocator.  [...]
  This patch, adding NR_ALLOC_BATCH, eventually landed in mainline as
  commit 81c0a2bb515fd4daae8cab64352877480792b515 (2013-09-11).
 
  I recently upgraded a Debian testing system from a 3.11 kernel to 3.12,
  and it started to exhibit strange issues, which I then bisected to this
  patch.  I'm not saying that the patch is faulty, as it seems to be
  working fine for everyone else, so I rather assume that something in a
  (vastly?) different corner of the kernel (or my hardware?) is broken.
  ;-)
 
  The issue is that when X.org/lightdm starts up, there are garbled
  section on the screen, for example, rectangular boxes that are just
 black
  or otherwise distorted, and/or sets of glyphs (corresponding to a set
  of characters; but not all characters) are displayed as rectangular gray
  or black boxes, and/or icons in a GNOME session are not displayed
  properly, and so on.  (Can take a snapshot if that helps?)  Switching to
  a Linux console, I can use that one fine.  Switching back to X, in the
  majority of all cases, the screen will be completely black, but with the
  mouse cursor still rendered properly (done in hardware, I assume).
 
  Reverting commit 81c0a2bb515fd4daae8cab64352877480792b515, for
 example on
  top of v3.12, and everything is back to normal.  The problem also
  persists with a v3.14 kernel that I just built.
 
  I will try to figure out what's going on, but will gladly take any
  pointers, or suggestions about how to tackle such a problem.
 
  The hardware is a Fujitsu Siemens Esprimo E5600, mainboard D2264-A1,
 CPU
  AMD Sempron 3000+.  There is an on-board graphics thingy, but I'm not
  using that; instead I put in a Sapphire Radeon HD 4350 card.
  I went over this code change repeatedly but I could not see anything
  directly that would explain it.  However, this patch DOES change the
  way allocations are placed (while still respecting zone specifiers
  like __GFP_DMA etc.) and so it's possible that they unearthed a
  corruption, or a wrongly set dma mask in the drivers.
 
  Ccing the radeon driver guys.  Full quote follows.
 
   $ cat < /proc/cpuinfo
   processor   : 0
   vendor_id   : AuthenticAMD
   cpu family  : 15
   model   : 47
   model name  : AMD Sempron(tm) Processor 3000+
   stepping: 2
   cpu MHz : 1000.000
   cache size  : 128 KB
   physical id : 0
   siblings: 1
   core id : 0
   cpu cores   : 1
   apicid  : 0
   initial apicid  : 0
   fpu : yes
   fpu_exception   : yes
   cpuid level : 1
   wp  : yes
   flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr 
  pge
 mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm
 3dnowext 3dnow rep_good nopl pni lahf_lm
   bogomips: 2000.20
   TLB size: 1024 4K pages
   clflush size: 64
   cache_alignment : 64
   address sizes   : 40 bits physical, 48 bits virtual
   power management: ts fid vid ttp tm stc
   $ sudo lspci -nn -k -vv
   00:00.0 Host bridge [0600]: Silicon Integrated

RE: radeon: screen garbled after page allocator change, was: Re: [patch v2 3/3] mm: page_alloc: fair zone allocator policy

2014-04-28 Thread Deucher, Alexander
 -Original Message-
 From: Deucher, Alexander
 Sent: Monday, April 28, 2014 8:50 AM
 To: Koenig, Christian; Jerome Glisse; Thomas Schwinge
 Cc: Bjorn Helgaas; linux-...@vger.kernel.org; Johannes Weiner; Mel Gorman;
 Rik van Riel; Andrea Arcangeli; Zlatko Calusic; Minchan Kim; linux-
 m...@kvack.org; linux-kernel@vger.kernel.org; Andrew Morton; dri-
 de...@lists.freedesktop.org
 Subject: RE: radeon: screen garbled after page allocator change, was: Re:
 [patch v2 3/3] mm: page_alloc: fair zone allocator policy
 
  -Original Message-
  From: Koenig, Christian
  Sent: Monday, April 28, 2014 3:30 AM
  To: Jerome Glisse; Thomas Schwinge
  Cc: Bjorn Helgaas; linux-...@vger.kernel.org; Johannes Weiner; Mel
 Gorman;
  Rik van Riel; Andrea Arcangeli; Zlatko Calusic; Minchan Kim; linux-
  m...@kvack.org; linux-kernel@vger.kernel.org; Andrew Morton; Deucher,
  Alexander; dri-de...@lists.freedesktop.org
  Subject: Re: radeon: screen garbled after page allocator change, was: Re:
  [patch v2 3/3] mm: page_alloc: fair zone allocator policy
 
   + /* We are living in a monstrous world in which you can have the PCI
   +  * root complex behind a HyperTransport link which cannot address
   +  * anything above 32 bits (well, the HyperTransport specification says
   +  * 40 bits, but hardware such as the SiS 761 only supports 32 bits).
  That looks more like a problem with this specific chipset rather than
  something that needs a general solution like this.
 
  Maybe we should rather add the PCI-ID(s) of the thing to some kind of
  quirks table for now so that the patch isn't so invasive and we can CC
  stable as well?
 
 IIRC, there was someone on IRC with a similar problem with a similar SiS
 chipset a while back.  These SiS chipsets seem to be generally problematic.

IIRC, in the IRC case, the fix was to limit the amount of physical memory in the
system.

Alex

 
 Alex
 
 
  Just a thought,
  Christian.
 
  Am 27.04.2014 21:55, schrieb Jerome Glisse:
   On Sat, Apr 26, 2014 at 11:31:11PM -0400, Jerome Glisse wrote:
   On Thu, Apr 24, 2014 at 09:37:22AM -0400, Johannes Weiner wrote:
   Hi Thomas,
  
   On Wed, Apr 02, 2014 at 04:26:08PM +0200, Thomas Schwinge wrote:
   Hi!
  
   On Fri,  2 Aug 2013 11:37:26 -0400, Johannes Weiner
  han...@cmpxchg.org wrote:
   Each zone that holds userspace pages of one workload must be
 aged
  at a
   speed proportional to the zone size.  [...]
   Fix this with a very simple round robin allocator.  [...]
   This patch, adding NR_ALLOC_BATCH, eventually landed in mainline
 as
   commit 81c0a2bb515fd4daae8cab64352877480792b515 (2013-09-11).
  
   I recently upgraded a Debian testing system from a 3.11 kernel to
 3.12,
   and it started to exhibit strange issues, which I then bisected to 
   this
   patch.  I'm not saying that the patch is faulty, as it seems to be
   working fine for everyone else, so I rather assume that something in
 a
   (vastly?) different corner of the kernel (or my hardware?) is broken.
   ;-)
  
   The issue is that when X.org/lightdm starts up, there are garbled
   sections on the screen, for example, rectangular boxes that are just
  black
   or otherwise distorted, and/or sets of glyphs (corresponding to a
 set
   of characters; but not all characters) are displayed as rectangular 
   gray
   or black boxes, and/or icons in a GNOME session are not displayed
   properly, and so on.  (Can take a snapshot if that helps?)  Switching 
   to
   a Linux console, I can use that one fine.  Switching back to X, in the
   majority of all cases, the screen will be completely black, but with 
   the
   mouse cursor still rendered properly (done in hardware, I assume).
  
   Reverting commit 81c0a2bb515fd4daae8cab64352877480792b515, for
  example on
   top of v3.12, and everything is back to normal.  The problem also
   persists with a v3.14 kernel that I just built.
  
   I will try to figure out what's going on, but will gladly take any
   pointers, or suggestions about how to tackle such a problem.
  
   The hardware is a Fujitsu Siemens Esprimo E5600, mainboard D2264-
 A1,
  CPU
   AMD Sempron 3000+.  There is an on-board graphics thingy, but I'm
 not
   using that; instead I put in a Sapphire Radeon HD 4350 card.
   I went over this code change repeatedly but I could not see anything
   directly that would explain it.  However, this patch DOES change the
   way allocations are placed (while still respecting zone specifiers
   like __GFP_DMA etc.) and so it's possible that they unearthed a
   corruption, or a wrongly set dma mask in the drivers.
  
   Ccing the radeon driver guys.  Full quote follows.
  
$ cat < /proc/cpuinfo
processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 47
model name  : AMD Sempron(tm) Processor 3000+
stepping: 2
cpu MHz : 1000.000
cache size  : 128 KB
physical id : 0
siblings: 1
core

Re: radeon: screen garbled after page allocator change, was: Re: [patch v2 3/3] mm: page_alloc: fair zone allocator policy

2014-04-27 Thread Jerome Glisse
On Sat, Apr 26, 2014 at 11:31:11PM -0400, Jerome Glisse wrote:
> On Thu, Apr 24, 2014 at 09:37:22AM -0400, Johannes Weiner wrote:
> > Hi Thomas,
> > 
> > On Wed, Apr 02, 2014 at 04:26:08PM +0200, Thomas Schwinge wrote:
> > > Hi!
> > > 
> > > > On Fri,  2 Aug 2013 11:37:26 -0400, Johannes Weiner <han...@cmpxchg.org>
> > > > wrote:
> > > > Each zone that holds userspace pages of one workload must be aged at a
> > > > speed proportional to the zone size.  [...]
> > > 
> > > > Fix this with a very simple round robin allocator.  [...]
> > > 
> > > This patch, adding NR_ALLOC_BATCH, eventually landed in mainline as
> > > commit 81c0a2bb515fd4daae8cab64352877480792b515 (2013-09-11).
> > > 
> > > I recently upgraded a Debian testing system from a 3.11 kernel to 3.12,
> > > and it started to exhibit "strange" issues, which I then bisected to this
> > > patch.  I'm not saying that the patch is faulty, as it seems to be
> > > working fine for everyone else, so I rather assume that something in a
> > > (vastly?) different corner of the kernel (or my hardware?) is broken.
> > > ;-)
> > > 
> > > The issue is that when X.org/lightdm starts up, there are "garbled"
> > > > sections on the screen, for example, rectangular boxes that are just black
> > > or otherwise "distorted", and/or sets of glyphs (corresponding to a set
> > > of characters; but not all characters) are displayed as rectangular gray
> > > or black boxes, and/or icons in a GNOME session are not displayed
> > > properly, and so on.  (Can take a snapshot if that helps?)  Switching to
> > > a Linux console, I can use that one fine.  Switching back to X, in the
> > > majority of all cases, the screen will be completely black, but with the
> > > mouse cursor still rendered properly (done in hardware, I assume).
> > > 
> > > Reverting commit 81c0a2bb515fd4daae8cab64352877480792b515, for example on
> > > top of v3.12, and everything is back to normal.  The problem also
> > > persists with a v3.14 kernel that I just built.
> > > 
> > > I will try to figure out what's going on, but will gladly take any
> > > pointers, or suggestions about how to tackle such a problem.
> > > 
> > > The hardware is a Fujitsu Siemens Esprimo E5600, mainboard D2264-A1, CPU
> > > > AMD Sempron 3000+.  There is an on-board graphics thingy, but I'm not
> > > using that; instead I put in a Sapphire Radeon HD 4350 card.
> > 
> > I went over this code change repeatedly but I could not see anything
> > directly that would explain it.  However, this patch DOES change the
> > way allocations are placed (while still respecting zone specifiers
> > like __GFP_DMA etc.) and so it's possible that they unearthed a
> > corruption, or a wrongly set dma mask in the drivers.
> > 
> > Ccing the radeon driver guys.  Full quote follows.
> > 
> > > $ cat < /proc/cpuinfo
> > > processor   : 0
> > > vendor_id   : AuthenticAMD
> > > cpu family  : 15
> > > model   : 47
> > > model name  : AMD Sempron(tm) Processor 3000+
> > > stepping: 2
> > > cpu MHz : 1000.000
> > > cache size  : 128 KB
> > > physical id : 0
> > > siblings: 1
> > > core id : 0
> > > cpu cores   : 1
> > > apicid  : 0
> > > initial apicid  : 0
> > > fpu : yes
> > > fpu_exception   : yes
> > > cpuid level : 1
> > > wp  : yes
> > > flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr 
> > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext 
> > > fxsr_opt lm 3dnowext 3dnow rep_good nopl pni lahf_lm
> > > bogomips: 2000.20
> > > TLB size: 1024 4K pages
> > > clflush size: 64
> > > cache_alignment : 64
> > > address sizes   : 40 bits physical, 48 bits virtual
> > > power management: ts fid vid ttp tm stc
> > > $ sudo lspci -nn -k -vv
> > > 00:00.0 Host bridge [0600]: Silicon Integrated Systems [SiS] 761/M761 
> > > Host [1039:0761] (rev 01)
> > > Subsystem: Fujitsu Technology Solutions D2030-A1 Motherboard 
> > > [1734:1099]
> > > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
> > > ParErr- Stepping- SERR+ FastB2B- DisINTx-
> > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium 
> > > >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
> > > Latency: 64
> > > Region 0: Memory at f000 (32-bit, non-prefetchable) 
> > > [size=32M]
> > > Capabilities: [a0] AGP version 3.0
> > > Status: RQ=32 Iso- ArqSz=2 Cal=3 SBA+ ITACoh- GART64- 
> > > HTrans- 64bit- FW- AGP3+ Rate=x4,x8
> > > Command: RQ=1 ArqSz=0 Cal=0 SBA+ AGP- GART64- 64bit- 
> > > FW- Rate=<none>
> > > Capabilities: [d0] HyperTransport: Slave or Primary Interface
> > > Command: BaseUnitID=0 UnitCnt=17 MastHost- DefDir- 
> > > DUL-
> > > Link Control 0: CFlE- CST- CFE- <LkFail- Init+ EOC- 
> > > TXO- <CRCErr=0 IsocEn- LSEn- ExtCTL- 64b-
> > > Link Config 0: 

Re: radeon: screen garbled after page allocator change, was: Re: [patch v2 3/3] mm: page_alloc: fair zone allocator policy

2014-04-26 Thread Jerome Glisse
On Thu, Apr 24, 2014 at 09:37:22AM -0400, Johannes Weiner wrote:
> Hi Thomas,
> 
> On Wed, Apr 02, 2014 at 04:26:08PM +0200, Thomas Schwinge wrote:
> > Hi!
> > 
> > On Fri,  2 Aug 2013 11:37:26 -0400, Johannes Weiner <han...@cmpxchg.org>
> > wrote:
> > > Each zone that holds userspace pages of one workload must be aged at a
> > > speed proportional to the zone size.  [...]
> > 
> > > Fix this with a very simple round robin allocator.  [...]
> > 
> > This patch, adding NR_ALLOC_BATCH, eventually landed in mainline as
> > commit 81c0a2bb515fd4daae8cab64352877480792b515 (2013-09-11).
> > 
> > I recently upgraded a Debian testing system from a 3.11 kernel to 3.12,
> > and it started to exhibit "strange" issues, which I then bisected to this
> > patch.  I'm not saying that the patch is faulty, as it seems to be
> > working fine for everyone else, so I rather assume that something in a
> > (vastly?) different corner of the kernel (or my hardware?) is broken.
> > ;-)
> > 
> > The issue is that when X.org/lightdm starts up, there are "garbled"
> > sections on the screen, for example, rectangular boxes that are just black
> > or otherwise "distorted", and/or sets of glyphs (corresponding to a set
> > of characters; but not all characters) are displayed as rectangular gray
> > or black boxes, and/or icons in a GNOME session are not displayed
> > properly, and so on.  (Can take a snapshot if that helps?)  Switching to
> > a Linux console, I can use that one fine.  Switching back to X, in the
> > majority of all cases, the screen will be completely black, but with the
> > mouse cursor still rendered properly (done in hardware, I assume).
> > 
> > Reverting commit 81c0a2bb515fd4daae8cab64352877480792b515, for example on
> > top of v3.12, and everything is back to normal.  The problem also
> > persists with a v3.14 kernel that I just built.
> > 
> > I will try to figure out what's going on, but will gladly take any
> > pointers, or suggestions about how to tackle such a problem.
> > 
> > The hardware is a Fujitsu Siemens Esprimo E5600, mainboard D2264-A1, CPU
> > AMD Sempron 3000+.  There is an on-board graphics thingy, but I'm not
> > using that; instead I put in a Sapphire Radeon HD 4350 card.
> 
> I went over this code change repeatedly but I could not see anything
> directly that would explain it.  However, this patch DOES change the
> way allocations are placed (while still respecting zone specifiers
> like __GFP_DMA etc.) and so it's possible that they unearthed a
> corruption, or a wrongly set dma mask in the drivers.
> 
> Ccing the radeon driver guys.  Full quote follows.
> 
> > $ cat < /proc/cpuinfo
> > processor   : 0
> > vendor_id   : AuthenticAMD
> > cpu family  : 15
> > model   : 47
> > model name  : AMD Sempron(tm) Processor 3000+
> > stepping: 2
> > cpu MHz : 1000.000
> > cache size  : 128 KB
> > physical id : 0
> > siblings: 1
> > core id : 0
> > cpu cores   : 1
> > apicid  : 0
> > initial apicid  : 0
> > fpu : yes
> > fpu_exception   : yes
> > cpuid level : 1
> > wp  : yes
> > flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
> > mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 
> > 3dnowext 3dnow rep_good nopl pni lahf_lm
> > bogomips: 2000.20
> > TLB size: 1024 4K pages
> > clflush size: 64
> > cache_alignment : 64
> > address sizes   : 40 bits physical, 48 bits virtual
> > power management: ts fid vid ttp tm stc
> > $ sudo lspci -nn -k -vv
> > 00:00.0 Host bridge [0600]: Silicon Integrated Systems [SiS] 761/M761 
> > Host [1039:0761] (rev 01)
> > Subsystem: Fujitsu Technology Solutions D2030-A1 Motherboard 
> > [1734:1099]
> > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
> > ParErr- Stepping- SERR+ FastB2B- DisINTx-
> > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium 
> > >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
> > Latency: 64
> > Region 0: Memory at f000 (32-bit, non-prefetchable) 
> > [size=32M]
> > Capabilities: [a0] AGP version 3.0
> > Status: RQ=32 Iso- ArqSz=2 Cal=3 SBA+ ITACoh- GART64- 
> > HTrans- 64bit- FW- AGP3+ Rate=x4,x8
> > Command: RQ=1 ArqSz=0 Cal=0 SBA+ AGP- GART64- 64bit- 
> > FW- Rate=<none>
> > Capabilities: [d0] HyperTransport: Slave or Primary Interface
> > Command: BaseUnitID=0 UnitCnt=17 MastHost- DefDir- DUL-
> > Link Control 0: CFlE- CST- CFE- <LkFail- Init+ EOC- 
> > TXO- <CRCErr=0 IsocEn- LSEn- ExtCTL- 64b-
> > Link Config 0: MLWI=16bit DwFcIn- MLWO=16bit DwFcOut- 
> > LWI=16bit DwFcInEn- LWO=16bit DwFcOutEn-
> > Link Control 1: CFlE- CST- CFE- <LkFail+ Init- EOC+ 
> > TXO+ <CRCErr=0 IsocEn- LSEn- ExtCTL- 64b-
> > Link Config 1: MLWI=N/C DwFcIn- MLWO=N/C DwFcOut- 
> > LWI=N/C DwFcInEn- LWO=N/C DwFcOutEn-
> > Revision ID: 1.05

Re: radeon: screen garbled after page allocator change, was: Re: [patch v2 3/3] mm: page_alloc: fair zone allocator policy

2014-04-25 Thread Jerome Glisse
On Fri, Apr 25, 2014 at 05:50:57PM -0400, Jerome Glisse wrote:
> On Fri, Apr 25, 2014 at 05:47:48PM -0400, Jerome Glisse wrote:
> > On Thu, Apr 24, 2014 at 09:37:22AM -0400, Johannes Weiner wrote:
> > > Hi Thomas,
> > > 
> > > On Wed, Apr 02, 2014 at 04:26:08PM +0200, Thomas Schwinge wrote:
> > > > Hi!
> > > > 
> > > > > On Fri,  2 Aug 2013 11:37:26 -0400, Johannes Weiner 
> > > > > <han...@cmpxchg.org> wrote:
> > > > > Each zone that holds userspace pages of one workload must be aged at a
> > > > > speed proportional to the zone size.  [...]
> > > > 
> > > > > Fix this with a very simple round robin allocator.  [...]
> > > > 
> > > > This patch, adding NR_ALLOC_BATCH, eventually landed in mainline as
> > > > commit 81c0a2bb515fd4daae8cab64352877480792b515 (2013-09-11).
> > > > 
> > > > I recently upgraded a Debian testing system from a 3.11 kernel to 3.12,
> > > > and it started to exhibit "strange" issues, which I then bisected to 
> > > > this
> > > > patch.  I'm not saying that the patch is faulty, as it seems to be
> > > > working fine for everyone else, so I rather assume that something in a
> > > > (vastly?) different corner of the kernel (or my hardware?) is broken.
> > > > ;-)
> > > > 
> > > > The issue is that when X.org/lightdm starts up, there are "garbled"
> > > > > sections on the screen, for example, rectangular boxes that are just 
> > > > black
> > > > or otherwise "distorted", and/or sets of glyphs (corresponding to a set
> > > > of characters; but not all characters) are displayed as rectangular gray
> > > > or black boxes, and/or icons in a GNOME session are not displayed
> > > > properly, and so on.  (Can take a snapshot if that helps?)  Switching to
> > > > a Linux console, I can use that one fine.  Switching back to X, in the
> > > > majority of all cases, the screen will be completely black, but with the
> > > > mouse cursor still rendered properly (done in hardware, I assume).
> > > > 
> > > > Reverting commit 81c0a2bb515fd4daae8cab64352877480792b515, for example 
> > > > on
> > > > top of v3.12, and everything is back to normal.  The problem also
> > > > persists with a v3.14 kernel that I just built.
> > > > 
> > > > I will try to figure out what's going on, but will gladly take any
> > > > pointers, or suggestions about how to tackle such a problem.
> > > > 
> > > > The hardware is a Fujitsu Siemens Esprimo E5600, mainboard D2264-A1, CPU
> > > > > AMD Sempron 3000+.  There is an on-board graphics thingy, but I'm not
> > > > using that; instead I put in a Sapphire Radeon HD 4350 card.
> > > 
> > > I went over this code change repeatedly but I could not see anything
> > > directly that would explain it.  However, this patch DOES change the
> > > way allocations are placed (while still respecting zone specifiers
> > > like __GFP_DMA etc.) and so it's possible that they unearthed a
> > > corruption, or a wrongly set dma mask in the drivers.
> > > 
> > > Ccing the radeon driver guys.  Full quote follows.
> > 
> > > Can we get a full dmesg, to know whether things like the IOMMU are enabled?
> > > This is even more puzzling as rv710 has a 40-bit DMA mask, IIRC, and thus you
> > > should be fine even without an IOMMU. But given the patch you point to, it
> > > really can only be something that allocates pages in a place the GPU fails
> > > to access.
> > > 
> > > Thomas, how much memory do you have? (Again, dmesg will also provide the
> > > mapping information.)
> > > 
> > > My guess is that the PCIe bridge can only remap DMA pages with a 32-bit DMA
> > > mask, while the GPU is fine with a 40-bit DMA mask. I always thought that the
> > > PCIe/PCI code took care of such things for us.
> > 
> > Cheers,
> > Jérôme Glisse
> 
> Forgot to attach patch to test my theory. Does the attached patch fix
> the issue ?

So this is likely it: the SiS chipset of this motherboard is a freak show.
It supports both PCIe and AGP at the same time:

http://www.newegg.com/Product/Product.aspx?Item=N82E16813185068

Why in hell?

So my guess is that the root PCIe bridge is behind the AGP bridge, which
swallows any address > 32 bits, and thus the DMA mask of the PCIe radeon
card just assumes that we are living in a sane world.
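
To make the guess concrete, here is a minimal userspace sketch of the arithmetic, not kernel code. dma_bit_mask() mirrors the idea of the kernel's DMA_BIT_MASK(n); effective_dma_mask() and dma_addr_reachable() are hypothetical helpers, and the 32-bit bridge limit is the assumption being tested, not a confirmed property of the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low `bits` bits set, same idea as the kernel's DMA_BIT_MASK(n). */
static uint64_t dma_bit_mask(unsigned int bits)
{
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* Hypothetical helper: a bus address is only usable if the device and
 * every bridge on the way to memory can express it, so the masks are
 * ANDed together and the narrowest one wins. */
static uint64_t effective_dma_mask(uint64_t dev_mask,
                                   const uint64_t *path_masks, size_t n)
{
    uint64_t mask = dev_mask;
    size_t i;

    for (i = 0; i < n; i++)
        mask &= path_masks[i];
    return mask;
}

/* Nonzero if `addr` has no bits set outside `mask`. */
static int dma_addr_reachable(uint64_t addr, uint64_t mask)
{
    return (addr & ~mask) == 0;
}
```

Under that assumption, a 40-bit GPU behind a 32-bit bridge ends up with an effective mask of 0xffffffff, so a page just above 4 GiB (for example at 0x100000000) is unreachable even though the GPU alone could address it. That would fit an allocator change that starts handing out higher pages exposing the problem.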

Cheers,
Jérôme Glisse

> 
> > 
> > > 
> > > > $ cat < /proc/cpuinfo
> > > > processor   : 0
> > > > vendor_id   : AuthenticAMD
> > > > cpu family  : 15
> > > > model   : 47
> > > > model name  : AMD Sempron(tm) Processor 3000+
> > > > stepping: 2
> > > > cpu MHz : 1000.000
> > > > cache size  : 128 KB
> > > > physical id : 0
> > > > siblings: 1
> > > > core id : 0
> > > > cpu cores   : 1
> > > > apicid  : 0
> > > > initial apicid  : 0
> > > > fpu : yes
> > > > fpu_exception   : yes
> > > > cpuid level : 1
> > > > wp  : yes
> > > > flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr 
> > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext 
> > > > fxsr_opt 

Re: radeon: screen garbled after page allocator change, was: Re: [patch v2 3/3] mm: page_alloc: fair zone allocator policy

2014-04-25 Thread Jerome Glisse
On Fri, Apr 25, 2014 at 05:47:48PM -0400, Jerome Glisse wrote:
> On Thu, Apr 24, 2014 at 09:37:22AM -0400, Johannes Weiner wrote:
> > Hi Thomas,
> > 
> > On Wed, Apr 02, 2014 at 04:26:08PM +0200, Thomas Schwinge wrote:
> > > Hi!
> > > 
> > > > On Fri,  2 Aug 2013 11:37:26 -0400, Johannes Weiner <han...@cmpxchg.org>
> > > > wrote:
> > > > Each zone that holds userspace pages of one workload must be aged at a
> > > > speed proportional to the zone size.  [...]
> > > 
> > > > Fix this with a very simple round robin allocator.  [...]
> > > 
> > > This patch, adding NR_ALLOC_BATCH, eventually landed in mainline as
> > > commit 81c0a2bb515fd4daae8cab64352877480792b515 (2013-09-11).
> > > 
> > > I recently upgraded a Debian testing system from a 3.11 kernel to 3.12,
> > > and it started to exhibit "strange" issues, which I then bisected to this
> > > patch.  I'm not saying that the patch is faulty, as it seems to be
> > > working fine for everyone else, so I rather assume that something in a
> > > (vastly?) different corner of the kernel (or my hardware?) is broken.
> > > ;-)
> > > 
> > > The issue is that when X.org/lightdm starts up, there are "garbled"
> > > section on the screen, for example, rectangular boxes that are just black
> > > or otherwise "distorted", and/or sets of glyphs (corresponding to a set
> > > of characters; but not all characters) are displayed as rectangular gray
> > > or black boxes, and/or icons in a GNOME session are not displayed
> > > properly, and so on.  (Can take a snapshot if that helps?)  Switching to
> > > a Linux console, I can use that one fine.  Switching back to X, in the
> > > majority of all cases, the screen will be completely black, but with the
> > > mouse cursor still rendered properly (done in hardware, I assume).
> > > 
> > > Reverting commit 81c0a2bb515fd4daae8cab64352877480792b515, for example on
> > > top of v3.12, and everything is back to normal.  The problem also
> > > persists with a v3.14 kernel that I just built.
> > > 
> > > I will try to figure out what's going on, but will gladly take any
> > > pointers, or suggestions about how to tackle such a problem.
> > > 
> > > The hardware is a Fujitsu Siemens Esprimo E5600, mainboard D2264-A1, CPU
> > > AMD Sempron 3000+.  There is a on-board graphics thingy, but I'm not
> > > using that; instead I put in a Sapphire Radeon HD 4350 card.
> > 
> > I went over this code change repeatedly but I could not see anything
> > directly that would explain it.  However, this patch DOES change the
> > way allocations are placed (while still respecting zone specifiers
> > like __GFP_DMA etc.) and so it's possible that they unearthed a
> > corruption, or a wrongly set dma mask in the drivers.
> > 
> > Ccing the radeon driver guys.  Full quote follows.
> 
> Can we get a full dmesg, to know if thing like IOMMU are enabled or not.
> This is even more puzzling as rv710 has 40bit dma mask iirc and thus you
> should be fine even without IOMMU. But given the patch you point to, it
> really can only be something that allocate page in place the GPU fails
> to access.
> 
> Thomas how much memory do you have (again dmes will also provide mapping
> informations) ?
> 
> My guess is that the pcie bridge can only remap dma page with 32bit dma
> mask while the gpu is fine with 40bit dma mask. I always thought that the
> pcie/pci code did take care of such thing for us.
> 
> Cheers,
> Jérôme Glisse

I forgot to attach the patch to test my theory. Does the attached patch fix
the issue?

> 
> > 
> > > $ cat < /proc/cpuinfo
> > > processor   : 0
> > > vendor_id   : AuthenticAMD
> > > cpu family  : 15
> > > model   : 47
> > > model name  : AMD Sempron(tm) Processor 3000+
> > > stepping: 2
> > > cpu MHz : 1000.000
> > > cache size  : 128 KB
> > > physical id : 0
> > > siblings: 1
> > > core id : 0
> > > cpu cores   : 1
> > > apicid  : 0
> > > initial apicid  : 0
> > > fpu : yes
> > > fpu_exception   : yes
> > > cpuid level : 1
> > > wp  : yes
> > > flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr 
> > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext 
> > > fxsr_opt lm 3dnowext 3dnow rep_good nopl pni lahf_lm
> > > bogomips: 2000.20
> > > TLB size: 1024 4K pages
> > > clflush size: 64
> > > cache_alignment : 64
> > > address sizes   : 40 bits physical, 48 bits virtual
> > > power management: ts fid vid ttp tm stc
> > > $ sudo lspci -nn -k -vv
> > > 00:00.0 Host bridge [0600]: Silicon Integrated Systems [SiS] 761/M761 
> > > Host [1039:0761] (rev 01)
> > > Subsystem: Fujitsu Technology Solutions D2030-A1 Motherboard 
> > > [1734:1099]
> > > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
> > > ParErr- Stepping- SERR+ FastB2B- DisINTx-
> > > Status: Cap+ 66MHz- 

Re: radeon: screen garbled after page allocator change, was: Re: [patch v2 3/3] mm: page_alloc: fair zone allocator policy

2014-04-25 Thread Jerome Glisse
On Thu, Apr 24, 2014 at 09:37:22AM -0400, Johannes Weiner wrote:
> Hi Thomas,
> 
> On Wed, Apr 02, 2014 at 04:26:08PM +0200, Thomas Schwinge wrote:
> > Hi!
> > 
> > On Fri,  2 Aug 2013 11:37:26 -0400, Johannes Weiner  
> > wrote:
> > > Each zone that holds userspace pages of one workload must be aged at a
> > > speed proportional to the zone size.  [...]
> > 
> > > Fix this with a very simple round robin allocator.  [...]
> > 
> > This patch, adding NR_ALLOC_BATCH, eventually landed in mainline as
> > commit 81c0a2bb515fd4daae8cab64352877480792b515 (2013-09-11).
> > 
> > I recently upgraded a Debian testing system from a 3.11 kernel to 3.12,
> > and it started to exhibit "strange" issues, which I then bisected to this
> > patch.  I'm not saying that the patch is faulty, as it seems to be
> > working fine for everyone else, so I rather assume that something in a
> > (vastly?) different corner of the kernel (or my hardware?) is broken.
> > ;-)
> > 
> > The issue is that when X.org/lightdm starts up, there are "garbled"
> > section on the screen, for example, rectangular boxes that are just black
> > or otherwise "distorted", and/or sets of glyphs (corresponding to a set
> > of characters; but not all characters) are displayed as rectangular gray
> > or black boxes, and/or icons in a GNOME session are not displayed
> > properly, and so on.  (Can take a snapshot if that helps?)  Switching to
> > a Linux console, I can use that one fine.  Switching back to X, in the
> > majority of all cases, the screen will be completely black, but with the
> > mouse cursor still rendered properly (done in hardware, I assume).
> > 
> > Reverting commit 81c0a2bb515fd4daae8cab64352877480792b515, for example on
> > top of v3.12, and everything is back to normal.  The problem also
> > persists with a v3.14 kernel that I just built.
> > 
> > I will try to figure out what's going on, but will gladly take any
> > pointers, or suggestions about how to tackle such a problem.
> > 
> > The hardware is a Fujitsu Siemens Esprimo E5600, mainboard D2264-A1, CPU
> > AMD Sempron 3000+.  There is a on-board graphics thingy, but I'm not
> > using that; instead I put in a Sapphire Radeon HD 4350 card.
> 
> I went over this code change repeatedly but I could not see anything
> directly that would explain it.  However, this patch DOES change the
> way allocations are placed (while still respecting zone specifiers
> like __GFP_DMA etc.) and so it's possible that they unearthed a
> corruption, or a wrongly set dma mask in the drivers.
> 
> Ccing the radeon driver guys.  Full quote follows.

Can we get a full dmesg, to know whether things like the IOMMU are enabled
or not? This is even more puzzling as rv710 has a 40-bit DMA mask, IIRC, and
thus you should be fine even without an IOMMU. But given the patch you point
to, it really can only be something that allocates pages in a place the GPU
fails to access.

Thomas, how much memory do you have (again, dmesg will also provide the
mapping information)?

My guess is that the PCIe bridge can only remap DMA pages with a 32-bit DMA
mask, while the GPU is fine with a 40-bit DMA mask. I always thought that the
PCIe/PCI code took care of such things for us.
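
For readers less familiar with DMA masks, the arithmetic behind the 32-bit
versus 40-bit distinction can be sketched in userspace as follows. This is
only an illustration, not driver code: `dma_bit_mask` mirrors what the
kernel's `DMA_BIT_MASK(n)` macro computes, while `reachable`, the page size,
and the two sample addresses are hypothetical names and values picked for
the example.

```python
PAGE_SIZE = 4096
GIB = 1 << 30


def dma_bit_mask(bits: int) -> int:
    """Mask with the low `bits` bits set (same value as DMA_BIT_MASK(bits))."""
    return (1 << bits) - 1


def reachable(phys_addr: int, size: int, bits: int) -> bool:
    """True if the whole buffer [phys_addr, phys_addr + size) fits under the mask."""
    last_byte = phys_addr + size - 1
    return last_byte <= dma_bit_mask(bits)


# Two hypothetical page addresses: one below and one above the 4 GiB line.
low_page = 1 * GIB
high_page = 5 * GIB

for name, addr in (("low", low_page), ("high", high_page)):
    print(name,
          "32-bit:", reachable(addr, PAGE_SIZE, 32),
          "40-bit:", reachable(addr, PAGE_SIZE, 40))
```

A page above 4 GiB is fine for a device with a 40-bit mask but not for
anything on the path that only handles 32 bits, which is why the amount of
installed memory and the memory map in dmesg matter here.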

Cheers,
Jérôme Glisse

> 
> > $ cat < /proc/cpuinfo
> > processor   : 0
> > vendor_id   : AuthenticAMD
> > cpu family  : 15
> > model   : 47
> > model name  : AMD Sempron(tm) Processor 3000+
> > stepping: 2
> > cpu MHz : 1000.000
> > cache size  : 128 KB
> > physical id : 0
> > siblings: 1
> > core id : 0
> > cpu cores   : 1
> > apicid  : 0
> > initial apicid  : 0
> > fpu : yes
> > fpu_exception   : yes
> > cpuid level : 1
> > wp  : yes
> > flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
> > mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 
> > 3dnowext 3dnow rep_good nopl pni lahf_lm
> > bogomips: 2000.20
> > TLB size: 1024 4K pages
> > clflush size: 64
> > cache_alignment : 64
> > address sizes   : 40 bits physical, 48 bits virtual
> > power management: ts fid vid ttp tm stc
> > $ sudo lspci -nn -k -vv
> > 00:00.0 Host bridge [0600]: Silicon Integrated Systems [SiS] 761/M761 
> > Host [1039:0761] (rev 01)
> > Subsystem: Fujitsu Technology Solutions D2030-A1 Motherboard 
> > [1734:1099]
> > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
> > ParErr- Stepping- SERR+ FastB2B- DisINTx-
> > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium 
> > >TAbort- SERR-  > Latency: 64
> > Region 0: Memory at f000 (32-bit, non-prefetchable) 
> > [size=32M]
> > Capabilities: [a0] AGP version 3.0
> > Status: RQ=32 Iso- ArqSz=2 Cal=3 SBA+ ITACoh- GART64- 
> > HTrans- 64bit- FW- AGP3+ Rate=x4,x8
> >  
