Re: [SeaBIOS] UHCI on US15W: Crazy stuff happening

2012-08-07 Thread Matthew Millman
Got it:

This line in usb-uhci.c (reset_uhci()) broke it:

pci_config_writew(bdf, USBLEGSUP, USBLEGSUP_RWC);

According to the US15W datasheet, there is no register at this offset.
Something does seem to be there, because it reads 0x0400 after that
write, but I don't think it's what SeaBIOS thinks it is...
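
A rough way to check that suspicion is to model what a genuine write-1-to-clear (R/WC) register would do with that write. The sketch below is only an illustration: it assumes USBLEGSUP_RWC is 0x8f00 (the value I believe the SeaBIOS header uses), and the helper names are made up for this example. It also ignores the possibility that the hardware sets a status bit again straight after the write, so it is a hint rather than proof.

```c
#include <stdint.h>

/* Assumption: the mask SeaBIOS writes; believed to be 0x8f00. */
#define USBLEGSUP_RWC 0x8f00

/* Hypothetical helper: model a write to a register whose rwc_mask bits are
 * write-1-to-clear. Only the R/WC bits are modelled; every other bit is
 * left as it was. */
static uint16_t rwc_after_write(uint16_t old, uint16_t written,
                                uint16_t rwc_mask)
{
    uint16_t cleared = written & rwc_mask;
    return old & (uint16_t)~cleared;
}

/* Hypothetical helper: after writing the full mask, a real R/WC register
 * should read back with none of the mask bits set (unless the hardware
 * sets one again immediately). Returns 1 if the read-back fits that. */
static int readback_consistent_with_rwc(uint16_t readback)
{
    return (readback & USBLEGSUP_RWC) == 0;
}
```

Under those assumptions, the 0x0400 read-back still has one of the mask bits set, which is not what a plain R/WC register would show after that write.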

USB Keyboard is working. Woohoo!

On Mon, Aug 6, 2012 at 4:08 PM, Matthew Millman inax...@gmail.com wrote:

 Thanks for the response Kevin

 I've spent another day trying to get to the bottom of this one, still no
 luck.

 Attempt to 16 byte align: No difference.

 Errata: One item mentions UHCI, but its effect would be to disable the
 controller completely, not this _sort of_ working issue I see, and
 coreboot already has code which applies the workaround.

 Registers: Nothing much interesting here either

 Before:
 Regs: USBCMD: c1 USBSTS: 0 USBINTR: 0 USBFRNUM: 2ab USBSOF: 40
 USBFLBASEAD: eaac USBPORTSC1: 1a7 USBPORTSC2: 80

 After:
 Regs: USBCMD: c1 USBSTS: 0 USBINTR: 0 USBFRNUM: 4b3 USBSOF: 40
 USBFLBASEAD: e2cc USBPORTSC1: 1a7 USBPORTSC2: 80

 The more I look, the more I find out how _OK_ everything is. I did try another
 experiment which had even more interesting results: I tacked some more data
 onto the end of the setup packet (a5 a5 a5 a5 33 33 33 33 66 66 66 66).
 What did it do? It made an exact copy of it immediately after the first
 one. It always does this regardless of how much data is being sent, and
 always writes a 'mystery' 12 bytes after that.

 before:

 1fbc44b0: 00 05 01 00 00 00 00 00 a5 a5 a5 a5 33 33 33 33
 1fbc44c0: 66 66 66 66 ff ff ff ff ff ff ff ff ff ff ff ff
 1fbc44d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 1fbc44e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

 after:

 1fbc44b0: 00 05 01 00 00 00 00 00 a5 a5 a5 a5 33 33 33 33
 1fbc44c0: 66 66 66 66 00 05 01 00 00 00 00 00 a5 a5 a5 a5
 1fbc44d0: 33 33 33 33 66 66 66 66 fd 03 00 00 00 00 00 00
 1fbc44e0: 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff

 I think the setting of the ActLen to 7ff is just its way of zeroing the
 count when the transaction begins. Perhaps it never gets incremented
 because the process which is meant to fetch the memory is broken.

 Current theory: Something is broken in the DMA process. I can't really see
 how a UHCI controller could ever do something this crazy.

 Next step, assuming no one has any ideas, start examining why Linux works
 OK, I had a quick look for quirks but couldn't spot anything, but
 realistically, that is going to be a significant effort to see through.

 Cheers
 Matt

 On Sun, Aug 5, 2012 at 4:44 PM, Kevin O'Connor ke...@koconnor.net wrote:

 On Sun, Aug 05, 2012 at 12:21:13PM +0100, Matthew Millman wrote:
  Hi
 
  I'm seeing a rather interesting problem with UHCI on Intel US15W and
  wondered if anyone else had seen anything like this before. I noticed it
  when I plugged in a USB keyboard, which caused a crash due to something
  corrupting the stack? it turns out that the stack has been trashed by the
  UHCI controller via DMA?!
 
  When trying to transmit the 8 byte address setup packet, the hardware
  doesn't quite seem to be doing as it's told. SeaBIOS sets up the UHCI TDs
  exactly as per the spec - no problems there,
 
  Once the QH element is set, instead of transmitting the 8 bytes as
  described in the TD, it transmits a full 1023 bytes? (according to the
  returned TD) UHCI then goes ahead and overwrites another 35 bytes beyond
  the end of the buffer pointed to by the TD.
 
  Here's the 8 bytes of the setup packet (I've set everything after it to
  0xFF):
 
  1fbc1f95: 00 05 01 00 00 00 00 00 ff ff ff
  1fbc1fa0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  1fbc1fb0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  1fbc1fc0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  1fbc1fd0: ff ff ff ff ff
 
  Here it is after the UHCI controller has been at it. The only code to
  execute between these two dumps is this:
 
  pipe->qh.element = (u32)tds[0]; (in uhci_control())
 
  1fbc1f95: 00 05 01 00 00 00 00 00 ff ff ff
  1fbc1fa0: bf 00 05 01 00 00 00 00 00 ff ff ff fd 03 00 00
  1fbc1fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  1fbc1fc0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  1fbc1fd0: ff ff ff ff ff
 
  TD Chain before:
  1fbc4870: 84 48 bc 1f 00 00 80 1c 2d 00 e0 00 95 1f bc 1f
  1fbc4880: 01 00 00 00 00 00 80 04 69 00 e8 ff 00 00 00 00
 
  TD Chain after:
  1fbc4870: 84 48 bc 1f ff 07 80 1c 2d 00 e0 00 95 1f bc 1f
  1fbc4880: 01 00 00 00 00 00 80 04 69 00 e8 ff 00 00 00 00

 My read of the spec says an actlen=0x07ff means a null transfer (not
 1023 bytes).  However, given that the status is still active I don't
 think it really matters what's in the td.

  I'm wondering if I'm not the first person to have seen this. The problem
  (without detailed debugging) manifests itself exactly as described in
  this message:

 I haven't seen this type of report before.  A couple of things you
 could 

Re: [SeaBIOS] UHCI on US15W: Crazy stuff happening

2012-08-07 Thread Peter Stuge
Matthew Millman wrote:
 This line in usb-uhci.c (reset_uhci()) broke it:
 
 pci_config_writew(bdf, USBLEGSUP, USBLEGSUP_RWC);
 
 According to the US15W datasheet, there is no register at this offset.
 Something does seem to be there, because it reads 0x0400 after that
 write, but I don't think it's what SeaBIOS thinks it is...
 
 USB Keyboard is working. Woohoo!

Yay! Please send a patch. It may not go in as-is, but it's a good
help for looking deeper into the issue.


//Peter

___
SeaBIOS mailing list
SeaBIOS@seabios.org
http://www.seabios.org/mailman/listinfo/seabios


Re: [SeaBIOS] UHCI on US15W: Crazy stuff happening

2012-08-06 Thread Matthew Millman
Thanks for the response Kevin

I've spent another day trying to get to the bottom of this one, still no
luck.

Attempt to 16 byte align: No difference.

Errata: One item mentions UHCI, but its effect would be to disable the
controller completely, not this _sort of_ working issue I see, and
coreboot already has code which applies the workaround.

Registers: Nothing much interesting here either

Before:
Regs: USBCMD: c1 USBSTS: 0 USBINTR: 0 USBFRNUM: 2ab USBSOF: 40 USBFLBASEAD:
eaac USBPORTSC1: 1a7 USBPORTSC2: 80

After:
Regs: USBCMD: c1 USBSTS: 0 USBINTR: 0 USBFRNUM: 4b3 USBSOF: 40 USBFLBASEAD:
e2cc USBPORTSC1: 1a7 USBPORTSC2: 80
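
For anyone reading along, the register values above can be decoded with a few masks. This is a minimal sketch based on my reading of the UHCI register layout (USBCMD bit 0 Run/Stop, bit 6 Configure Flag, bit 7 Max Packet; PORTSC bit 0 connect status, bit 1 connect change, bit 2 port enabled, bit 8 low-speed device). The struct and function names are invented for the example and are not SeaBIOS code.

```c
#include <stdint.h>

/* USBCMD bits, as I read the UHCI register layout. */
#define USBCMD_RS    (1u << 0)   /* Run/Stop */
#define USBCMD_CF    (1u << 6)   /* Configure Flag */
#define USBCMD_MAXP  (1u << 7)   /* Max Packet (64 bytes) */

/* PORTSC bits, same caveat. */
#define PORTSC_CCS   (1u << 0)   /* device currently connected */
#define PORTSC_CSC   (1u << 1)   /* connect status changed */
#define PORTSC_PE    (1u << 2)   /* port enabled */
#define PORTSC_LSDA  (1u << 8)   /* low-speed device attached */

/* Hypothetical container for the decoded port state. */
struct portsc_info {
    int connected;
    int connect_changed;
    int enabled;
    int low_speed;
};

/* Hypothetical helper: split a PORTSC value into its main flags. */
static struct portsc_info decode_portsc(uint16_t v)
{
    struct portsc_info p;
    p.connected       = (v & PORTSC_CCS)  != 0;
    p.connect_changed = (v & PORTSC_CSC)  != 0;
    p.enabled         = (v & PORTSC_PE)   != 0;
    p.low_speed       = (v & PORTSC_LSDA) != 0;
    return p;
}
```

Read that way, USBCMD c1 is a running, configured controller, USBPORTSC1 1a7 is an enabled port with a low-speed device attached, and USBPORTSC2 80 is an empty port, which fits the "nothing much interesting" reading.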

The more I look, the more I find out how _OK_ everything is. I did try another
experiment which had even more interesting results: I tacked some more data
onto the end of the setup packet (a5 a5 a5 a5 33 33 33 33 66 66 66 66).
What did it do? It made an exact copy of it immediately after the first
one. It always does this regardless of how much data is being sent, and
always writes a 'mystery' 12 bytes after that.

before:

1fbc44b0: 00 05 01 00 00 00 00 00 a5 a5 a5 a5 33 33 33 33
1fbc44c0: 66 66 66 66 ff ff ff ff ff ff ff ff ff ff ff ff
1fbc44d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
1fbc44e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

after:

1fbc44b0: 00 05 01 00 00 00 00 00 a5 a5 a5 a5 33 33 33 33
1fbc44c0: 66 66 66 66 00 05 01 00 00 00 00 00 a5 a5 a5 a5
1fbc44d0: 33 33 33 33 66 66 66 66 fd 03 00 00 00 00 00 00
1fbc44e0: 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff
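
The two dumps above can also be compared mechanically to pin down the overwritten range. The following is just a sketch: the byte arrays are transcribed from the dumps (offset 0 corresponds to 0x1fbc44b0), and the struct and function names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define DUMP_LEN 64

/* Memory at 0x1fbc44b0 before the transfer (transcribed from the dump). */
static const uint8_t before[DUMP_LEN] = {
    0x00,0x05,0x01,0x00,0x00,0x00,0x00,0x00,0xa5,0xa5,0xa5,0xa5,0x33,0x33,0x33,0x33,
    0x66,0x66,0x66,0x66,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
    0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
    0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
};

/* The same memory after the controller has run. */
static const uint8_t after[DUMP_LEN] = {
    0x00,0x05,0x01,0x00,0x00,0x00,0x00,0x00,0xa5,0xa5,0xa5,0xa5,0x33,0x33,0x33,0x33,
    0x66,0x66,0x66,0x66,0x00,0x05,0x01,0x00,0x00,0x00,0x00,0x00,0xa5,0xa5,0xa5,0xa5,
    0x33,0x33,0x33,0x33,0x66,0x66,0x66,0x66,0xfd,0x03,0x00,0x00,0x00,0x00,0x00,0x00,
    0x00,0x00,0x00,0x00,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
};

/* Hypothetical result type: first/last differing offset and total count. */
struct diff_span {
    int first;     /* -1 if the buffers are identical */
    int last;
    size_t count;
};

/* Hypothetical helper: walk both buffers and record where they differ. */
static struct diff_span diff_dumps(const uint8_t *a, const uint8_t *b,
                                   size_t len)
{
    struct diff_span s = { -1, -1, 0 };
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            if (s.first < 0)
                s.first = (int)i;
            s.last = (int)i;
            s.count++;
        }
    }
    return s;
}
```

Calling diff_dumps(before, after, DUMP_LEN) on this data reports a changed span that starts right after the 20 bytes I placed there, covering the 20-byte copy plus the 12 'mystery' bytes. Note that a byte rewritten with the value it already held would not show up in a comparison like this.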

I think the setting of the ActLen to 7ff is just its way of zeroing the
count when the transaction begins. Perhaps it never gets incremented
because the process which is meant to fetch the memory is broken.

Current theory: Something is broken in the DMA process. I can't really see
how a UHCI controller could ever do something this crazy.

Next step, assuming no one has any ideas, start examining why Linux works
OK, I had a quick look for quirks but couldn't spot anything, but
realistically, that is going to be a significant effort to see through.

Cheers
Matt

On Sun, Aug 5, 2012 at 4:44 PM, Kevin O'Connor ke...@koconnor.net wrote:

 On Sun, Aug 05, 2012 at 12:21:13PM +0100, Matthew Millman wrote:
  Hi
 
  I'm seeing a rather interesting problem with UHCI on Intel US15W and
  wondered if anyone else had seen anything like this before. I noticed it
  when I plugged in a USB keyboard, which caused a crash due to something
  corrupting the stack? it turns out that the stack has been trashed by the
  UHCI controller via DMA?!
 
  When trying to transmit the 8 byte address setup packet, the hardware
  doesn't quite seem to be doing as it's told. SeaBIOS sets up the UHCI TDs
  exactly as per the spec - no problems there,
 
  Once the QH element is set, instead of transmitting the 8 bytes as
  described in the TD, it transmits a full 1023 bytes? (according to the
  returned TD) UHCI then goes ahead and overwrites another 35 bytes beyond
  the end of the buffer pointed to by the TD.
 
  Here's the 8 bytes of the setup packet (I've set everything after it to
  0xFF):
 
  1fbc1f95: 00 05 01 00 00 00 00 00 ff ff ff
  1fbc1fa0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  1fbc1fb0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  1fbc1fc0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  1fbc1fd0: ff ff ff ff ff
 
  Here it is after the UHCI controller has been at it. The only code to
  execute between these two dumps is this:
 
  pipe->qh.element = (u32)tds[0]; (in uhci_control())
 
  1fbc1f95: 00 05 01 00 00 00 00 00 ff ff ff
  1fbc1fa0: bf 00 05 01 00 00 00 00 00 ff ff ff fd 03 00 00
  1fbc1fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  1fbc1fc0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  1fbc1fd0: ff ff ff ff ff
 
  TD Chain before:
  1fbc4870: 84 48 bc 1f 00 00 80 1c 2d 00 e0 00 95 1f bc 1f
  1fbc4880: 01 00 00 00 00 00 80 04 69 00 e8 ff 00 00 00 00
 
  TD Chain after:
  1fbc4870: 84 48 bc 1f ff 07 80 1c 2d 00 e0 00 95 1f bc 1f
  1fbc4880: 01 00 00 00 00 00 80 04 69 00 e8 ff 00 00 00 00

 My read of the spec says an actlen=0x07ff means a null transfer (not
 1023 bytes).  However, given that the status is still active I don't
 think it really matters what's in the td.

  I'm wondering if I'm not the first person to have seen this. The problem
  (without detailed debugging) manifests itself exactly as described in
  this message:

 I haven't seen this type of report before.  A couple of things you
 could try: dump the USB controller registers as well (the controller
 may have shut down for a different reason), check to see if any other
 transfer attempted to use 0x1fbc1fa0 in the past (perhaps the
 controller has something stale cached), look for an erratum for the
 chipset, look through the linux code for the chipset to see if it is
 working around something, try aligning the setup packet buffer to 16
 bytes.

 -Kevin


[SeaBIOS] UHCI on US15W: Crazy stuff happening

2012-08-05 Thread Matthew Millman
Hi

I'm seeing a rather interesting problem with UHCI on Intel US15W and
wondered if anyone else had seen anything like this before. I noticed it
when I plugged in a USB keyboard, which caused a crash due to something
corrupting the stack? it turns out that the stack has been trashed by the
UHCI controller via DMA?!

When trying to transmit the 8 byte address setup packet, the hardware
doesn't quite seem to be doing as it's told. SeaBIOS sets up the UHCI TDs
exactly as per the spec - no problems there,

Once the QH element is set, instead of transmitting the 8 bytes as
described in the TD, it transmits a full 1023 bytes? (according to the
returned TD) UHCI then goes ahead and overwrites another 35 bytes beyond
the end of the buffer pointed to by the TD.

Here's the 8 bytes of the setup packet (I've set everything after it to
0xFF):

1fbc1f95: 00 05 01 00 00 00 00 00 ff ff ff
1fbc1fa0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
1fbc1fb0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
1fbc1fc0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
1fbc1fd0: ff ff ff ff ff

Here it is after the UHCI controller has been at it. The only code to
execute between these two dumps is this:

pipe->qh.element = (u32)tds[0]; (in uhci_control())

1fbc1f95: 00 05 01 00 00 00 00 00 ff ff ff
1fbc1fa0: bf 00 05 01 00 00 00 00 00 ff ff ff fd 03 00 00
1fbc1fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1fbc1fc0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
1fbc1fd0: ff ff ff ff ff

TD Chain before:
1fbc4870: 84 48 bc 1f 00 00 80 1c 2d 00 e0 00 95 1f bc 1f
1fbc4880: 01 00 00 00 00 00 80 04 69 00 e8 ff 00 00 00 00

TD Chain after:
1fbc4870: 84 48 bc 1f ff 07 80 1c 2d 00 e0 00 95 1f bc 1f
1fbc4880: 01 00 00 00 00 00 80 04 69 00 e8 ff 00 00 00 00
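
For reference, the TD words in the dumps above can be unpacked as follows. This is a sketch based on my reading of the UHCI TD layout (second dword: ActLen in bits 10:0, Active in bit 23, Low Speed in bit 26, error counter in bits 28:27; third dword: PID in bits 7:0, device address in bits 14:8, endpoint in bits 18:15, data toggle in bit 19, MaxLen in bits 31:21, with both length fields stored as n-1 so that 0x7ff means zero bytes). All type and function names are invented for the example.

```c
#include <stdint.h>

/* Hypothetical decoded view of the TD control/status dword. */
struct td_status {
    unsigned actlen_field;  /* raw ActLen bits */
    unsigned bytes;         /* decoded length */
    int active;
    int low_speed;
    unsigned err_count;
};

/* Hypothetical decoded view of the TD token dword. */
struct td_token {
    unsigned pid;
    unsigned devaddr;
    unsigned endpoint;
    int toggle;
    unsigned max_bytes;     /* decoded MaxLen */
};

/* Assemble a little-endian 32-bit value from four dump bytes. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Length fields are stored as n-1; 0x7ff wraps round to zero bytes. */
static unsigned len_decode(unsigned field)
{
    return (field + 1) & 0x7ff;
}

static struct td_status decode_status(uint32_t w)
{
    struct td_status s;
    s.actlen_field = w & 0x7ff;
    s.bytes        = len_decode(s.actlen_field);
    s.active       = (w >> 23) & 1;
    s.low_speed    = (w >> 26) & 1;
    s.err_count    = (w >> 27) & 3;
    return s;
}

static struct td_token decode_token(uint32_t w)
{
    struct td_token t;
    t.pid       = w & 0xff;
    t.devaddr   = (w >> 8) & 0x7f;
    t.endpoint  = (w >> 15) & 0xf;
    t.toggle    = (w >> 19) & 1;
    t.max_bytes = len_decode((w >> 21) & 0x7ff);
    return t;
}
```

Fed the bytes from the "after" dump, the first TD decodes as a still-active, low-speed SETUP (PID 0x2d) with an 8-byte MaxLen and an ActLen field of 0x7ff. The second TD decodes as a zero-length IN (PID 0x69) with the data toggle set.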


I'm wondering if I'm not the first person to have seen this. The problem
(without detailed debugging) manifests itself exactly as described in
this message:

http://comments.gmane.org/gmane.linux.bios/55336

Thanks!
Matt