This sounds very much like a hardware problem, perhaps slightly toasted 
IDE controllers?  It looks like a commodity box; can you move all the 
disks to another machine and fire it up?  Oh, and go get a decent UPS! :-)

brien

Klaas Vantournhout wrote:
> Dear all,
>
> The real questions are at the bottom, the rest is just a nice intro 
> which introduces you to the nature of the questions.
>
> Two days ago, we had a power outage in our department which caused a 
> rather brutal shutdown of the computers.  All of the computers survived, 
> which is a good thing.  But only one gained a peculiar character, and of 
> course it had to be the backup server.
>
> At the current point I am not blaming BackupPC at all, I'm just trying 
> to isolate the problem, and that is why I would need your help in this.
>
> Okay, so what does the bastard (read: server) do now?  Well, not much: it 
> just hangs or reboots from time to time, seemingly at random.
>
> The first thing we noticed in /var/log/messages was that after the 
> power outage, the ntpd daemon could not set its clock correctly anymore.
>
> <snip /var/log/messages>
> # cat /var/log/messages | grep ntpd
> Mar 29 10:39:43 inwtheo1 ntpd: ntpd startup succeeded
> Mar 29 10:39:43 inwtheo1 ntpd[5689]: ntp engine ready
> Mar 29 08:40:06 inwtheo1 ntpd[5689]: peer 157.193.40.37 now valid
> Mar 29 10:40:57 inwtheo1 ntpd[5688]: adjusting local clock by 166.241134s
> Mar 29 10:41:59 inwtheo1 ntpd[5688]: adjusting local clock by 166.240065s
> Mar 29 10:44:13 inwtheo1 ntpd[5688]: adjusting local clock by 166.238681s
> Mar 29 10:45:13 inwtheo1 ntpd[5688]: adjusting local clock by 166.174413s
> Mar 29 10:46:15 inwtheo1 ntpd[5688]: adjusting local clock by 187.903248s
> Mar 29 10:55:11 inwtheo1 ntpd: ntpd startup succeeded
> Mar 29 10:55:11 inwtheo1 ntpd[5607]: ntp engine ready
> Mar 29 08:55:32 inwtheo1 ntpd[5607]: peer 157.193.40.37 now valid
> <end snip>
>
> While trying to understand this problem, I noticed that changing from 
> openntpd to ntp did the trick to get the time correct.  Being unsure 
> about this solution, we switched off the daemon to be 100% sure this was 
> not the cause of the reboots and/or crashes.
>
> init 1 and 2 ran stable (backuppc is not running in init 2).
> init 3 didn't (backuppc runs there).
> Starting all services by hand to go from 2 to 3 also did not cause any 
> problem, but running the command #init 3 does.  If we remove backuppc 
> from init 3, the server is stable.
>
> So at this point we started to suspect something is going on when 
> backuppc is running, but we also noticed that sometimes something was 
> going on when backuppc was not running.  So no conclusion yet.
>
> It frequently happens that backuppc initiates the crashes, and we are 
> wondering why this could be, which is why I am writing here.
>
> Our server is very basic.  We are running version 3.0.0, the whole 
> system is located on /dev/hda in several partitions, and the backup 
> config files and data are on a RAID 5 array of 3 separate disks:
>
> [EMAIL PROTECTED] ~]$ df
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/hda7             9.9G  1.4G  8.1G  15% /
> /dev/hda1             479M   12M  443M   3% /boot
> /dev/hda8              44G  172M   44G   1% /home
> /dev/hda6              20G  729M   18G   4% /var
> /dev/md0              461G  194G  243G  45% /var/backups
> [EMAIL PROTECTED] ~]$ cat /proc/mdstat
> Personalities : [raid5]
> md0 : active raid5 hdb1[0] hdg1[2] hde1[1]
>        490223232 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>
> unused devices: <none>
>
>
> A test also showed that init 3 without backuppc and without /dev/md0 
> mounted was very stable.
>
> I also have to mention that one time when the system rebooted 
> unexpectedly, the RAID array lost 2 of its drives for no apparent reason. 
> The next bootup just repaired the array.  Hence we started thinking 
> something is wrong with the RAID.  fsck reports no problems whatsoever.
>
> ** If you skipped the top, here are the questions **
>
> What we are wondering now is: does backuppc run some other system
> commands which could trigger the hang?
>
> The power outage happened in the middle of some full backups; is it 
> possible that this causes problems?  For example, a couple of client 
> directories contain a directory new/, even without a backup going on.  
> Can I safely delete this directory?
>
> Is there more going on that I am not aware of, and how can I see it?
>
> Has anybody had the same problem?  And if so, how did you solve it?
>
> Regards
> klaas
>
>
>
>   
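
As a side note on the RAID question above: the [3/3] [UUU] fields in the
/proc/mdstat output are what show whether every member disk is up.  A
minimal Python sketch of checking that automatically follows.  The function
name degraded_arrays and the SAMPLE text are made up for illustration
(SAMPLE is copied from the output quoted above), and the parser only
assumes the two-line layout shown there, so treat it as a starting point
rather than a complete mdstat parser.

```python
import re


def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status line shows a missing member.

    Hypothetical helper: assumes an "mdN : ..." header line followed by a
    status line ending in "[total/active] [UU_]", as in the quoted output.
    """
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # "_" in the flags, or fewer active than total devices, means degraded.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if current and status:
            total, active, flags = status.groups()
            if total != active or "_" in flags:
                degraded.append(current)
            current = None
    return degraded


# Copied from the /proc/mdstat output quoted in the message above.
SAMPLE = """Personalities : [raid5]
md0 : active raid5 hdb1[0] hdg1[2] hde1[1]
       490223232 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

For the quoted output this reports no degraded arrays.  Feeding it the
contents of /proc/mdstat right after an unexpected reboot would be one way
to record whether the array had dropped members at that point.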

_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/backuppc-users
http://backuppc.sourceforge.net/