From what I am reading in this e-mail, you have a mess of DHCP servers (3!?), some Windows machines, 2 LTSP servers, 10 LTSP terminals, and some half-intelligent switches, since you seem to talk about DHCP features implemented in them.

First things first: I would isolate a segment of the network and test a true one-client, one-server LTSP setup (Server --> crossover cable --> Client). Then put a switch into that basic setup and add terminals gradually until you get the lockup. That makes it much easier to pin down the source of the lockups. And since you have 2 servers, you can do this while the other users stay on the present infrastructure. Depending on how long it usually takes for a lockup to occur, you should be able to isolate the source of the problem quickly and efficiently.
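
To know when the isolated server stops answering at each stage, something like the following could be run from a second machine on the test segment. It is only a minimal sketch: the address, the port and the function names are placeholders of mine, and it assumes the server runs at least one TCP service you can connect to.

```python
import socket
import time
from datetime import datetime


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(probe, checks, interval=5.0, sleep=time.sleep):
    """Call probe() up to `checks` times, `interval` seconds apart.

    Returns the local time of the first failed probe, or None if every
    probe succeeded.
    """
    for i in range(checks):
        if not probe():
            return datetime.now()
        if i < checks - 1:
            sleep(interval)
    return None


# Example (placeholder address and port, adjust to the isolated server):
# failed_at = watch(lambda: is_reachable("192.168.0.254", 22), checks=720)
# print("server stopped answering at", failed_at)
```

Noting how many terminals were attached when the first failure is reported gives you the point at which the setup starts to misbehave.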

Eric

On Tue, 2004-09-21 at 07:37, Ricardo Araújo wrote:
Well, a little update on the problem.

I haven't yet been able to run memtest all the way through; I ran it once
and no problems showed. I have also experimented with the memory modules
(I have 2x512MB), letting it run with only 512MB at a time, but the
problem remains (random lockups).

The situation is now as follows: I decided to split the machines among two
servers. What was once a backup server is now operating as a server for 4
clients that were previously on the main server. I was hoping that that
would put less pressure on the servers and things would run smoothly. But
now I have two servers locking up, in exactly the same way, but not
concurrently.

So I'd better give an update on the topology of the network, as that might
be important information after all. All 10 LTSP clients are connected
through 100 Mb/s switches. On the same network there are about 7 machines
still running Windows, no LTSP. A router handles DHCP for the Windows
machines and Internet access for those and also for the servers. Two
servers run LTSP, one serving 6 clients and the other 4. Both provide DHCP
for their own clients and get their IP from the router (all fixed). One
server also provides an intranet web interface.

Probably the main concern is the fact that we have 3 DHCP servers
running. The servers get their IP from the router and the LTSP clients get
theirs from the servers. No special configuration was done to make each
DHCP server respond only to certain machines: the router responds to
everyone, the LTSP servers only to the machines they serve. At first I
expected that to be a problem, since a client might sometimes get its IP
from the router and sometimes from the server, but somehow all clients get
their IP only from the server, as they are supposed to. Even if they
didn't, it should only cause the clients to malfunction, not the servers
to lock up.
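
One way to double-check that every client really is answered by its own server is to write down, for each client, which DHCP server it should use and which one actually answered, then compare the two. A minimal sketch, assuming the observations are collected by hand (for example from each client's boot screen); the function name, MAC addresses and IPs are made up for illustration.

```python
def unexpected_leases(expected_server, observed):
    """Return (mac, server_ip) pairs where the wrong DHCP server answered.

    expected_server: dict mapping client MAC -> IP of the DHCP server that
                     is supposed to answer that client.
    observed:        iterable of (mac, server_ip) pairs noted during boots.
    Clients that are not listed in expected_server are ignored.
    """
    expected = {mac.lower(): ip for mac, ip in expected_server.items()}
    wrong = []
    for mac, server_ip in observed:
        want = expected.get(mac.lower())
        if want is not None and server_ip != want:
            wrong.append((mac.lower(), server_ip))
    return wrong


# Hypothetical data: two terminals, one per LTSP server; .1 is the router.
expected = {
    "00:11:22:33:44:01": "192.168.0.10",
    "00:11:22:33:44:02": "192.168.0.11",
}
seen = [
    ("00:11:22:33:44:01", "192.168.0.10"),
    ("00:11:22:33:44:02", "192.168.0.1"),
]
print(unexpected_leases(expected, seen))
```

An empty list over a few days of boots would support the observation that the clients only ever get their IP from their own server.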

Anyway, the problem now is that BOTH servers lock up randomly. It is
never concurrent, which must say something about the nature of the
problem. Also, I don't believe it is a hacker problem: I tried switching
off the Internet connection and made sure no spurious connections are
being made to the network (it is easy, since it is possible to have a
global view from the room where all clients are). Running "top" shows
nothing unusual in the server load before a server locks up.
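
Since "top" is only useful while someone is watching it, a small heartbeat logger left running on each server can record the load right up to the moment of a lockup. This is a sketch under assumptions: the function names, log path and interval are arbitrary choices, and it relies on os.getloadavg(), which is available on Linux.

```python
import os
import time


def write_heartbeat(path):
    """Append one line with the current time and the 1/5/15 min load averages."""
    load1, load5, load15 = os.getloadavg()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    line = "%s %.2f %.2f %.2f\n" % (stamp, load1, load5, load15)
    with open(path, "a") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())  # push the line to disk so it survives a hard lockup
    return line


def run(path, interval=10.0):
    """Write a heartbeat every `interval` seconds until interrupted."""
    while True:
        write_heartbeat(path)
        time.sleep(interval)


# Example (the path is a placeholder):
# run("/var/log/heartbeat.log")
```

After a lockup and reboot, the last line in the file gives the approximate time of the freeze and the load just before it, and the two servers' files can be compared side by side.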

It is indeed a tricky problem. If both servers locked up concurrently,
things would be easier. But the fact that they don't, and that the lockups
seem completely random, is quite mysterious to me.

Thanks for the help so far. I hope we can find a solution and this problem
doesn't remain one of those great LTSP mysteries...

[]s
Ricardo.




--
Eric Thibodeau <[EMAIL PROTECTED]>
