On 12/8/2012 4:45 PM, John Hupp wrote:
On 12/7/2012 7:07 PM, John Hupp wrote:

On 12/7/2012 2:20 PM, John Hupp wrote:

On 12/7/2012 11:16 AM, John Hupp wrote:

On 12/5/2012 5:48 AM, Valerio Pachera wrote:
2012/12/4 Alkis Georgopoulos<[email protected]>:
I haven't yet seen a single case where NBD compression caused problems.
But I've seen numerous cases where NBD compression made *another*
problem more obvious, due to the data validation it does.
The first thing I did was test the RAM with memtest, and it was OK.
I've been testing the client connected directly to eth0 of the
server, so no possible switch issues;
the behavior is the same as in the classroom.
I also tried adding a simple rtl8129 network card and booting with
etherboot. No change.
I tried changing the video card or forcing vesa. No significant change.
I tried another PC with the same motherboard and it behaves the same.
I also updated the BIOS.

To exclude NBD-related problems I reverted to NFS, but no changes there either.
   https://help.ubuntu.com/community/UbuntuLTSP/LTSPWithoutNFS

Here is my lspci
----------
00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller (rev 10)
00:02.0 VGA compatible controller: Intel Corporation 82G33/G31 Express Integrated Graphics Controller (rev 10)
00:1b.0 Audio device: Intel Corporation NM10/ICH7 Family High Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation NM10/ICH7 Family PCI Express Port 1 (rev 01)
00:1c.1 PCI bridge: Intel Corporation NM10/ICH7 Family PCI Express Port 2 (rev 01)
00:1d.0 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #1 (rev 01)
00:1d.1 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #2 (rev 01)
00:1d.2 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #3 (rev 01)
00:1d.3 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #4 (rev 01)
00:1d.7 USB controller: Intel Corporation NM10/ICH7 Family USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 IDE interface: Intel Corporation NM10/ICH7 Family SATA Controller [IDE mode] (rev 01)
00:1f.3 SMBus: Intel Corporation NM10/ICH7 Family SMBus Controller (rev 01)
01:00.0 Ethernet controller: Atheros Communications Inc. AR8121/AR8113/AR8114 Gigabit or Fast Ethernet (rev b0)
-----------

Regarding what seems like the same LTSP client problem that I'm still troubleshooting here: one of the PCs that fails to boot as an LTSP client also has enough resources to boot the Lubuntu 12.10 Live CD, so I tried that. It fails in the same way, stalling at a blank, black screen after the Lubuntu splash screen. I left it like that for about 15 minutes to check for an 8-minute stall (which another user reported elsewhere), and it didn't budge.

Since you are troubleshooting what seems like the same behavior with LTSP clients on Edu/Ubuntu 12.04 servers, and since I see the same behavior on a Lubuntu 12.10 Live CD, I now wonder if this is a *buntu 12.04/12.10 problem related to certain chipsets or video chips. Complicating that observation somewhat, I note that the machine I just ran the Live CD on had run both Lubuntu and Ubuntu 12.04 without this behavior.


To give my supposition a bit more weight, I removed the discrete video card from a working Lubuntu 12.10 machine and installed it in the machine that fails both as a client and a standalone-with-Live-CD. It then successfully booted the Live CD.

The card that worked was an old PCI Matrox Millennium II MGA 2164W.

The card that failed was a slightly newer AGP card with a Trident 3DImage 9850 chip.

An expanded list of what works and doesn't work for me (not double-tested, and where not specified I'm listing chips rather than card mfr/model):

Worked:
- PCI Matrox Millennium II MGA 2164W (PCI card)
- Intel 82810E (integrated)
- An ATI-based AGP card sold as a MIC E-G012-02-1214(B)
- ATI Rage 128 Pro (AGP card)
- ATI Rage 128 (AGP card)
- HP VectaVL PC w/ integrated video, probably either:
        Matrox G250 2X AGP
        Matrox Millennium G400 4X AGP

Partially Worked (got past the blank, black screen but then failed somehow):
- Diamond STL III S540 XTRM32M 82 (AGP card)
- Diamond Spdstr A50 with SiS 6326AGP chip (AGP card)

Failed:
- Trident 3DImage 9850 (AGP card)
- S3 Trio64V+ (PCI card)
- Diamond Viper with Power Weitek 9000/9001 and Oak Technology T9351 chips (PCI card)
- ATI Mach64 (PCI card)
- eMachines eTower 500i w/ integrated video, probably:
        ATI Rage Pro Turbo 2X AGP

I have been continuing to work the angle that this is a video driver problem. New observations:

I already have the Xorg meta-package installed (xserver-xorg-video-all) that installs their whole suite of drivers.

I installed the linux-firmware-nonfree package, rebooted, and re-tested the non-working hardware. No change in results.

There are other proprietary binary drivers available for some video cards. E.g. for ATI, there is fglrx and fglrx-updates. So those are an option, though I think I read somewhere about complications uninstalling those when they don't work.

I'm now wondering how to troubleshoot xorg on the client. On the server, I have /var/log/Xorg.0.log, and though I don't know how to read that very well, I can tell that it selects a certain set of drivers to try to load, and provides various kinds of information about how they loaded (or didn't) and with what settings. But with the LTSP clients, though I have syslog messages being forwarded from the clients to the server, that does not include the xorg log messages, right? If that's so, then I would want to ssh in to the client and read its /var/log/Xorg.0.log, but my recollection is that ssh to the client fails with these stalled startups. I want to double-check that.
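
Once the client's log can be reached, a small script can pull out the parts that matter most: which modules the X server tried to load, and any (EE) or (WW) lines. What follows is only a minimal Python sketch and not something taken from this thread. The function name summarize_xorg_log and the sample lines are made up for illustration, and it assumes the usual Xorg log layout of an optional [timestamp] followed by a marker such as (II), (WW) or (EE).

```python
import re

# Optional "[  12.345]" timestamp, then an (EE) or (WW) marker at line start.
PROBLEM = re.compile(r'^(?:\[\s*[\d.]+\]\s*)?\((EE|WW)\)\s*(.*)$')
# Lines such as: (II) LoadModule: "vesa"
LOADMODULE = re.compile(r'LoadModule:\s*"([^"]+)"')


def summarize_xorg_log(text):
    """Collect loaded module names plus error and warning messages."""
    summary = {"modules": [], "EE": [], "WW": []}
    for line in text.splitlines():
        loaded = LOADMODULE.search(line)
        if loaded:
            # Record each module once, in the order it first appears.
            if loaded.group(1) not in summary["modules"]:
                summary["modules"].append(loaded.group(1))
            continue
        problem = PROBLEM.match(line)
        if problem:
            summary[problem.group(1)].append(problem.group(2).strip())
    return summary


# Hypothetical sample lines, NOT taken from any log in this thread.
SAMPLE = """\
[    20.100] (II) LoadModule: "glx"
[    20.200] (II) LoadModule: "mach64"
[    20.300] (WW) Falling back to old probe method for vesa
[    20.400] (EE) No devices detected.
"""

summary = summarize_xorg_log(SAMPLE)
print("modules:", summary["modules"])
print("errors:", summary["EE"])
print("warnings:", summary["WW"])
```

On a real client the same function could be fed the contents of /var/log/Xorg.0.log instead of SAMPLE.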

Can I force usage of a certain driver? I read that this was supported with xorg.conf, but that file does not exist by default now: Xorg configures itself automatically, and there is also KMS (Kernel Mode Setting), about which I know next to nothing. But perhaps a configuration via xorg.conf is still respected/supported?

-----------------

In any case, after running lspci and lshw with several setups, I now have a better identification of some of the hardware:

Worked:
- PCI Matrox Millennium II MGA 2164W (PCI card)
- Intel 82810E (integrated)
- ATI RV200 QW [Radeon 7500]
- ATI Rage 128 Pro AGP
- ATI Rage 128 RF/SG AGP
- HP VectaVL PC w/ integrated video, probably either:
        Matrox G250 2X AGP
        Matrox Millennium G400 4X AGP

Partially Worked (got past the blank, black screen but then failed):
- Diamond STL III S540 XTRM32M 82 (AGP card)
- Diamond Speedstar A50 with SiS 6326AGP chip (AGP card)

Failed:
- Trident 3DImage 9850 (AGP card)
- S3 Trio64V+ (PCI card)
- Diamond Viper with Power Weitek 9000/9001 and Oak Technology T9351 chips (PCI card)
- ATI 210888GX [Mach64 GX] (PCI card)
- ATI 3D Rage Pro AGP 1x/2x
- Intel Corporation 82G33/G31 Express Integrated Graphics Controller (rev 10) [from your lspci]


On 12/11/2012 12:20 PM, John Hupp wrote:
My understanding of this problem has evolved some, so I am keeping the topic but writing afresh on the description.

It now appears that here, at any rate, there are two groups of problems. For one group, the client's /var/log/Xorg.?.log makes it clear that there is a video driver problem, and I am putting that group aside for the moment.

For the other group, the client's /var/log/Xorg.?.log looks normal, but startup never reaches the login screen: after the splash screen, the display cycles between blank and black screens. With client syslog messages forwarded to the server, the server syslog shows a loop of NBD messages, initially mixed with ntpdate/ntpd messages but eventually settling down to just the NBD loop (see toward the bottom):

(snipped normal-looking part of server syslog ...)
Dec 11 09:57:00 ltsp137 modem-manager[1087]: <info> Successfully loaded 20 plugins
Dec 11 09:57:04 Lubuntu1 nbd_server[1685]: connect from 192.168.1.137, assigned file is /opt/ltsp/images/i386.img
Dec 11 09:57:04 Lubuntu1 nbd_server[1685]: Can't open authorization file /etc/ltsp/nbd-server.allow (No such file or directory).
Dec 11 09:57:04 Lubuntu1 nbd_server[1685]: Authorized client
Dec 11 09:57:04 Lubuntu1 nbd_server[2238]: Starting to serve
Dec 11 09:57:04 Lubuntu1 nbd_server[2238]: Size of exported file/device is 923074560
Dec 11 09:57:06 ltsp137 kernel: [ 31.545915] nbd9: unknown partition table
Dec 11 09:57:04 Lubuntu1 nbd_server[2238]: Disconnect request received.
Dec 11 09:57:04 Lubuntu1 nbd_server[1685]: Child exited with 0
Dec 11 09:57:06 ltsp137 kernel: [   31.552578] block nbd9: NBD_DISCONNECT
Dec 11 09:57:06 ltsp137 kernel: [ 31.553503] block nbd9: Receive control failed (result -32)
Dec 11 09:57:06 ltsp137 kernel: [   31.553592] block nbd9: queue cleared
Dec 11 09:57:07 ltsp137 ntpdate[890]: step time server 128.138.140.44 offset -1.585937 sec
Dec 11 09:57:07 ltsp137 ntpd[1216]: ntpd [email protected] Mon Aug 20 14:49:15 UTC 2012 (1)
Dec 11 09:57:07 ltsp137 ntpd[1217]: proto: precision = 0.503 usec
Dec 11 09:57:07 ltsp137 ntpd[1217]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
Dec 11 09:57:07 ltsp137 ntpd[1217]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
Dec 11 09:57:07 ltsp137 ntpd[1217]: Listen and drop on 1 v6wildcard :: UDP 123
Dec 11 09:57:07 ltsp137 ntpd[1217]: Listen normally on 2 lo 127.0.0.1 UDP 123
Dec 11 09:57:07 ltsp137 ntpd[1217]: Listen normally on 3 eth0 192.168.1.137 UDP 123
Dec 11 09:57:07 ltsp137 ntpd[1217]: Listen normally on 4 eth0 fdd9:1d03:df5:0:250:4ff:feb0:ccfe UDP 123
Dec 11 09:57:07 ltsp137 ntpd[1217]: Listen normally on 5 eth0 fe80::250:4ff:feb0:ccfe UDP 123
Dec 11 09:57:07 ltsp137 ntpd[1217]: Listen normally on 6 eth0 fdd9:1d03:df5:0:4dd4:a667:5962:d43f UDP 123
Dec 11 09:57:07 ltsp137 ntpd[1217]: Listen normally on 7 lo ::1 UDP 123
Dec 11 09:57:07 ltsp137 ntpd[1217]: peers refreshed
Dec 11 09:57:07 ltsp137 ntpd[1217]: Listening on routing socket on fd #24 for interface updates
Dec 11 09:57:12 ltsp137 ntpd[1217]: ntpd exiting on signal 15
Dec 11 09:57:13 Lubuntu1 nbd_server[1685]: connect from 192.168.1.137, assigned file is /opt/ltsp/images/i386.img
Dec 11 09:57:13 Lubuntu1 nbd_server[1685]: Can't open authorization file /etc/ltsp/nbd-server.allow (No such file or directory).
Dec 11 09:57:13 Lubuntu1 nbd_server[1685]: Authorized client
Dec 11 09:57:13 Lubuntu1 nbd_server[2239]: Starting to serve
Dec 11 09:57:13 Lubuntu1 nbd_server[2239]: Size of exported file/device is 923074560
Dec 11 09:57:13 ltsp137 kernel: [ 40.306413] nbd9: unknown partition table
Dec 11 09:57:13 Lubuntu1 nbd_server[2239]: Disconnect request received.
Dec 11 09:57:13 Lubuntu1 nbd_server[1685]: Child exited with 0
Dec 11 09:57:13 ltsp137 kernel: [   40.313194] block nbd9: NBD_DISCONNECT
Dec 11 09:57:13 ltsp137 kernel: [ 40.314074] block nbd9: Receive control failed (result -32)
Dec 11 09:57:13 ltsp137 kernel: [   40.314148] block nbd9: queue cleared
Dec 11 09:57:17 Lubuntu1 nbd_server[1685]: connect from 192.168.1.137, assigned file is /opt/ltsp/images/i386.img
Dec 11 09:57:17 Lubuntu1 nbd_server[1685]: Can't open authorization file /etc/ltsp/nbd-server.allow (No such file or directory).
Dec 11 09:57:17 Lubuntu1 nbd_server[1685]: Authorized client
Dec 11 09:57:17 Lubuntu1 nbd_server[2240]: Starting to serve
Dec 11 09:57:17 Lubuntu1 nbd_server[2240]: Size of exported file/device is 923074560
Dec 11 09:57:17 ltsp137 kernel: [ 44.066204] nbd9: unknown partition table
Dec 11 09:57:17 Lubuntu1 nbd_server[2240]: Disconnect request received.
Dec 11 09:57:17 Lubuntu1 nbd_server[1685]: Child exited with 0
Dec 11 09:57:17 ltsp137 kernel: [   44.072999] block nbd9: NBD_DISCONNECT
Dec 11 09:57:17 ltsp137 kernel: [ 44.073591] block nbd9: Receive control failed (result -32)
Dec 11 09:57:17 ltsp137 kernel: [   44.073644] block nbd9: queue cleared
Dec 11 09:57:21 Lubuntu1 nbd_server[1685]: connect from 192.168.1.137, assigned file is /opt/ltsp/images/i386.img
Dec 11 09:57:21 Lubuntu1 nbd_server[1685]: Can't open authorization file /etc/ltsp/nbd-server.allow (No such file or directory).
Dec 11 09:57:21 Lubuntu1 nbd_server[1685]: Authorized client
Dec 11 09:57:21 Lubuntu1 nbd_server[2241]: Starting to serve
Dec 11 09:57:21 Lubuntu1 nbd_server[2241]: Size of exported file/device is 923074560
Dec 11 09:57:21 ltsp137 kernel: [ 47.831221] nbd9: unknown partition table
Dec 11 09:57:21 Lubuntu1 nbd_server[2241]: Disconnect request received.
Dec 11 09:57:21 Lubuntu1 nbd_server[1685]: Child exited with 0
Dec 11 09:57:21 ltsp137 kernel: [   47.837973] block nbd9: NBD_DISCONNECT
Dec 11 09:57:21 ltsp137 kernel: [ 47.838709] block nbd9: Receive control failed (result -32)
Dec 11 09:57:21 ltsp137 kernel: [ 47.838800] block nbd9: shutting down socket
Dec 11 09:57:21 ltsp137 kernel: [   47.838819] block nbd9: queue cleared
Dec 11 09:57:23 ltsp137 ntpdate[1301]: adjust time server 208.53.158.34 offset 0.001939 sec
Dec 11 09:57:23 ltsp137 ntpd[1633]: ntpd [email protected] Mon Aug 20 14:49:15 UTC 2012 (1)
Dec 11 09:57:23 ltsp137 ntpd[1634]: proto: precision = 0.505 usec
Dec 11 09:57:23 ltsp137 ntpd[1634]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
Dec 11 09:57:23 ltsp137 ntpd[1634]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
Dec 11 09:57:23 ltsp137 ntpd[1634]: Listen and drop on 1 v6wildcard :: UDP 123
Dec 11 09:57:23 ltsp137 ntpd[1634]: Listen normally on 2 lo 127.0.0.1 UDP 123
Dec 11 09:57:23 ltsp137 ntpd[1634]: Listen normally on 3 eth0 192.168.1.137 UDP 123
Dec 11 09:57:23 ltsp137 ntpd[1634]: Listen normally on 4 eth0 fdd9:1d03:df5:0:250:4ff:feb0:ccfe UDP 123
Dec 11 09:57:23 ltsp137 ntpd[1634]: Listen normally on 5 eth0 fe80::250:4ff:feb0:ccfe UDP 123
Dec 11 09:57:23 ltsp137 ntpd[1634]: Listen normally on 6 eth0 fdd9:1d03:df5:0:4dd4:a667:5962:d43f UDP 123
Dec 11 09:57:23 ltsp137 ntpd[1634]: Listen normally on 7 lo ::1 UDP 123
Dec 11 09:57:23 ltsp137 ntpd[1634]: peers refreshed
Dec 11 09:57:23 ltsp137 ntpd[1634]: Listening on routing socket on fd #24 for interface updates
Dec 11 09:57:24 Lubuntu1 nbd_server[1685]: connect from 192.168.1.137, assigned file is /opt/ltsp/images/i386.img
Dec 11 09:57:24 Lubuntu1 nbd_server[1685]: Can't open authorization file /etc/ltsp/nbd-server.allow (No such file or directory).
Dec 11 09:57:24 Lubuntu1 nbd_server[1685]: Authorized client
Dec 11 09:57:24 Lubuntu1 nbd_server[2242]: Starting to serve
Dec 11 09:57:24 Lubuntu1 nbd_server[2242]: Size of exported file/device is 923074560
Dec 11 09:57:24 ltsp137 kernel: [ 51.675584] nbd9: unknown partition table
Dec 11 09:57:24 Lubuntu1 nbd_server[2242]: Disconnect request received.
Dec 11 09:57:24 Lubuntu1 nbd_server[1685]: Child exited with 0
Dec 11 09:57:24 ltsp137 kernel: [   51.682372] block nbd9: NBD_DISCONNECT
Dec 11 09:57:24 ltsp137 kernel: [ 51.683392] block nbd9: Receive control failed (result -32)
Dec 11 09:57:24 ltsp137 kernel: [   51.683470] block nbd9: queue cleared
Dec 11 09:57:28 Lubuntu1 nbd_server[1685]: connect from 192.168.1.137, assigned file is /opt/ltsp/images/i386.img
Dec 11 09:57:28 Lubuntu1 nbd_server[1685]: Can't open authorization file /etc/ltsp/nbd-server.allow (No such file or directory).
Dec 11 09:57:28 Lubuntu1 nbd_server[1685]: Authorized client
Dec 11 09:57:28 Lubuntu1 nbd_server[2243]: Starting to serve
Dec 11 09:57:28 Lubuntu1 nbd_server[2243]: Size of exported file/device is 923074560
Dec 11 09:57:28 ltsp137 kernel: [ 55.424453] nbd9: unknown partition table
Dec 11 09:57:28 Lubuntu1 nbd_server[2243]: Disconnect request received.
Dec 11 09:57:28 Lubuntu1 nbd_server[1685]: Child exited with 0
Dec 11 09:57:28 ltsp137 kernel: [   55.438244] block nbd9: NBD_DISCONNECT
Dec 11 09:57:28 ltsp137 kernel: [ 55.438792] block nbd9: Receive control failed (result -32)
Dec 11 09:57:28 ltsp137 kernel: [   55.438844] block nbd9: queue cleared
Dec 11 09:57:32 Lubuntu1 nbd_server[1685]: connect from 192.168.1.137, assigned file is /opt/ltsp/images/i386.img
Dec 11 09:57:32 Lubuntu1 nbd_server[1685]: Can't open authorization file /etc/ltsp/nbd-server.allow (No such file or directory).
Dec 11 09:57:32 Lubuntu1 nbd_server[1685]: Authorized client
Dec 11 09:57:32 Lubuntu1 nbd_server[2245]: Starting to serve
Dec 11 09:57:32 Lubuntu1 nbd_server[2245]: Size of exported file/device is 923074560
Dec 11 09:57:32 ltsp137 kernel: [ 59.196619] nbd9: unknown partition table
Dec 11 09:57:32 Lubuntu1 nbd_server[2245]: Disconnect request received.
Dec 11 09:57:32 Lubuntu1 nbd_server[1685]: Child exited with 0
Dec 11 09:57:32 ltsp137 kernel: [   59.203549] block nbd9: NBD_DISCONNECT
Dec 11 09:57:32 ltsp137 kernel: [ 59.204048] block nbd9: Receive control failed (result -32)
Dec 11 09:57:32 ltsp137 kernel: [   59.204100] block nbd9: queue cleared

I read in an IRC archive that the nbd9 messages are somehow normal: "nbd9 disconnection is normal, it's just a way to check for a newer chroot image, in order for the clients to automatically reboot." I don't quite know what that means, other than that it suggests a broken NBD server is not the problem here.
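
To see how tight the reconnect loop is, the intervals between successive nbd_server "connect from" lines can be computed per client. This is just a rough Python sketch: the function name connect_intervals is made up for illustration, the year has to be supplied because syslog timestamps omit it, and the sample lines are copied from the syslog excerpt above.

```python
import re
from datetime import datetime

# Matches syslog lines like:
# Dec 11 09:57:04 Lubuntu1 nbd_server[1685]: connect from 192.168.1.137, ...
CONNECT = re.compile(
    r'^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ nbd_server\[\d+\]: connect from (\S+),'
)


def connect_intervals(lines, year=2012):
    """Return {client_ip: [seconds between successive NBD connects]}."""
    times = {}
    for line in lines:
        m = CONNECT.match(line)
        if not m:
            continue
        # syslog omits the year, so it is passed in as an assumption.
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        times.setdefault(m.group(2), []).append(stamp)
    return {
        ip: [int((b - a).total_seconds()) for a, b in zip(ts, ts[1:])]
        for ip, ts in times.items()
    }


# Lines copied from the server syslog excerpt in this message.
SAMPLE = [
    "Dec 11 09:57:04 Lubuntu1 nbd_server[1685]: connect from 192.168.1.137, assigned file is /opt/ltsp/images/i386.img",
    "Dec 11 09:57:04 Lubuntu1 nbd_server[1685]: Authorized client",
    "Dec 11 09:57:13 Lubuntu1 nbd_server[1685]: connect from 192.168.1.137, assigned file is /opt/ltsp/images/i386.img",
    "Dec 11 09:57:17 Lubuntu1 nbd_server[1685]: connect from 192.168.1.137, assigned file is /opt/ltsp/images/i386.img",
]

intervals = connect_intervals(SAMPLE)
print(intervals)
```

A client that reconnects every few seconds without ever reaching the login screen would show up as a long run of small intervals.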

In at least one case with a certain video chipset, that cycling block of messages has this message interspersed just before many of the "nbd9: unknown partition table" messages:

Dec 10 20:28:10 ltsp138 kernel: [ 82.844422] ldm[4166]: segfault at 8 ip 0804baf4 sp bf8ab830 error 4 in ldm[8048000+6000]

I think for that case, the black screen shows a spinning progress screen pointer for a moment before it goes to the blank screen.

Since the problem goes away with known-working video chipsets, the startup failure seems video related, but I have not been able to pin down the failure mechanism.

If I don't get more pointed advice in the meantime, I think I'm going to work on learning how to use xorg.conf to force usage of perhaps the vesa driver. And if that works, then try to troubleshoot the primary intended driver. My working theory is that, despite the good-looking xorg log, this is still a video driver problem.

With a very tenuous understanding of xorg workings, I created the directory /etc/X11/xorg.conf.d, and in that, a file named xorg_ltsp.conf (must end with .conf). Contents:

Section "Device"
    Identifier  "Card2"
    Driver      "vesa"
    BusID       "PCI:0:8:0"
EndSection

The default Identifier "Card0" might have worked also, but I was fuzzy about bad interactions with video card configuration on the server.

And rightly so, it seems. The server had been using the ATI driver, but now it is using the FBDEV driver. This despite the fact that the above Device section specifies a BusID that does not match the BusID of the server's video card.
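
On the BusID itself: as far as I know, xorg.conf expects the form PCI:bus:device:function with decimal numbers, while lspci prints the address as bus:device.function in hex, so the two only look alike for small values. Below is a minimal Python sketch of the conversion. The helper name lspci_to_busid is made up for illustration, and the assumption that the client's card sits at lspci address 00:08.0 is mine, inferred from the BusID used above.

```python
import re

# lspci address: optional 4-digit hex domain, then bus:device.function in hex.
LSPCI_ADDR = re.compile(
    r'^(?:[0-9a-fA-F]{4}:)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$'
)


def lspci_to_busid(address):
    """Convert an lspci address like '00:08.0' to an Xorg BusID string."""
    m = LSPCI_ADDR.match(address.strip())
    if not m:
        raise ValueError(f"not an lspci-style PCI address: {address!r}")
    # lspci prints hex; the BusID string is written in decimal.
    bus, device, function = (int(part, 16) for part in m.groups())
    return f"PCI:{bus}:{device}:{function}"


print(lspci_to_busid("00:08.0"))   # assumed address of the client's card
print(lspci_to_busid("0a:10.1"))   # hex and decimal differ here
```

The difference only matters once a bus or device number reaches 0a or higher, which is why a low address such as 00:08.0 looks the same in both notations.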

But apart from that, in the client an ATI Mach64 card that had stalled with cycling black/blank screens now proceeds to the login and then to a normal desktop.

So I confirmed the theory that the behavior is related to some driver issue despite a normal-looking client Xorg log, but I don't yet know how to best solve the client problem without affecting the server. And I would prefer to solve the client problem by making the more capable driver work (here, the ATI driver).


-- 
edubuntu-users mailing list
[email protected]
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/edubuntu-users
