Re: [CentOS] system hangs

2014-02-26 Thread Richard Karhuse
Here are some suggestions:

1.  Enable and configure kdump

2.  Enable Magic SysRq

3.  Consider enabling kernel.softlockup_panic and vm.panic_on_oom,
  but doing so will cause your server to crash sooner than it would
  normally -- it depends upon whether you want to capture the first
  instance (e.g. smoking gun) or you want to wait until the system
  is completely hosed (and may have more evidence of the issue).

Then test and verify that Magic SysRq can be used to generate a
kernel core dump.

Then, sit back and wait.

I do this on all my production servers -- it saves the pain of having
to do this under pressure, plus capturing the vmcore on the first
instance is very much worth the effort.
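
As a rough sketch of steps 2 and 3 (not a tested recipe): the settings live under /proc/sys, so a small script can report their current values before anything is changed.  The helper name show_crash_knobs is made up for illustration.  The commented commands at the bottom need root, and the last one deliberately crashes the machine.

```shell
#!/bin/sh
# show_crash_knobs: hypothetical helper that prints the current value of
# the sysctls mentioned above, or notes that this kernel lacks them.
show_crash_knobs() {
    for key in kernel/sysrq kernel/softlockup_panic vm/panic_on_oom; do
        if [ -r "/proc/sys/$key" ]; then
            printf '%s = %s\n' "$key" "$(cat "/proc/sys/$key")"
        else
            printf '%s: not available on this kernel\n' "$key"
        fi
    done
}

show_crash_knobs

# To change them (as root), something along these lines:
#   sysctl -w kernel.sysrq=1
#   sysctl -w kernel.softlockup_panic=1
#   sysctl -w vm.panic_on_oom=1
# With kdump running, this forces a panic and should leave a vmcore
# (it really does crash the box, so try it on a test system first):
#   echo c > /proc/sysrq-trigger
```

Settings made with sysctl -w are lost at reboot; putting the same keys into /etc/sysctl.conf makes them stick.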

HTH

-rak-
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Does CentOS support dual graphics cards with 2 monitors each?

2013-04-07 Thread Richard Karhuse
I've got a triple-head set-up running where 1 monitor is off
the internal Intel HD-4000 GPU and 2x monitors are off a GTX 550 Ti
using the nVidia drivers.  I could not get xrandr support to work
(and attributed that to Intel / nVidia not co-operating).  I found that
using the nVidia X server settings GUI and hand-editing the xorg.conf
file was the best solution.  I have similar (but slightly different)
set-ups working under Fedora-17 and CentOS-6.  All 3x screens are
in one large canvas (so windows can be dragged from any monitor to
anywhere on the canvas, e.g., xinerama) across 2x X-servers -- so full
screen either occupies the 1x Dell 24" or the 2x E-Bay 27" specials.
The key I found is locking things in with the BusID w/PCI designation
settings.

HTH

   -rak-

Here is the xorg.conf -- in case that helps:

# nvidia-settings: X configuration file generated by nvidia-settings
# nvidia-settings:  version 304.37  (mockbuild@)  Tue Aug 14 06:30:17 CEST 2012

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0" 0 0
    Screen      1  "Screen1" RightOf "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
    Option         "Xinerama" "1"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/input/mice"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from data in "/etc/sysconfig/keyboard"
    Identifier     "Keyboard0"
    Driver         "keyboard"
    Option         "XkbLayout" "us"
    Option         "XkbModel" "pc105"
EndSection

Section "Monitor"
    # HorizSync source: edid, VertRefresh source: edid
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "FRT DIGITAL"
    HorizSync       30.0 - 91.0
    VertRefresh     56.0 - 61.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor1"
    VendorName     "Dell"
    ModelName      "Dell 2405FPW"
    HorizSync       30.0 - 81.0
    VertRefresh     56.0 - 76.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 550 Ti"
    BusID          "PCI:1:0:0"
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "intel"
    VendorName     "intel"
    BoardName      "intel"
    BusID          "PCI:0:2:0"
    Option         "monitor-HDMI2" "Monitor1"
EndSection

Section "Screen"
    Identifier     "Screen1"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "Stereo" "0"
    Option         "nvidiaXineramaInfoOrder" "DFP-0"
    Option         "metamodes" "DFP-0: 2560x1440 +0+0, DFP-2: 2560x1440 +2560+0; DFP-0: nvidia-auto-select +0+0, DFP-2: nvidia-auto-select +2560+0"
#   Option         "metamodes" "DFP-0: 2560x1440 +0+0; DFP-0: nvidia-auto-select +0+0"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device1"
    Monitor        "Monitor1"
    DefaultDepth    24
    Option         "metamodes" "HDMI1: 1920x1200 +0+0"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection


Re: [CentOS] Random Proliant Crashes CentOS 6.1

2011-12-18 Thread Richard Karhuse
If you follow the cited bugzilla's, you'll see that you *must* upgrade
your HP firmware too (for everything(!!) -- particularly RAID controllers
and SAS expander, etc.) -- to the absolute latest release.  [Note: the
updates on the 9.30 ISO are *not* late enough, btw.]  Then, you need
the latest version of the kernel that has a work-around in the cciss / hpsa
driver.

HTH

   -rak-


Re: [CentOS] Random Proliant Crashes CentOS 6.1

2011-12-18 Thread Richard Karhuse
On Sun, Dec 18, 2011 at 3:21 PM, John Hinton webmas...@ew3d.com wrote:

 On 12/18/2011 2:22 PM, Richard Karhuse wrote:
  If you follow the cited bugzilla's, you'll see that you *must* upgrade
  your HP firmware too (for everything(!!) -- particularly RAID controllers
  and SAS expander, etc.) --  to the absolute latest release.  [Note: the
  updates on the 9.30 ISO are *not* late enough, btw.]  Then, you need
  the latest version of the kernel that has a work-around in the cciss /
  hpsa driver.
 
  HTH
 
  -rak-
 
 Thanks. I have already started down the firmware path. This is
 irritating! 15 years of solid reliability out of Proliant products and
 then suddenly this! :( I'm starting to wonder if the Linux kernel is
 just trying to do too many things... geez... (Isn't that what Windows
 does?) Maybe there is a need for a server kernel which could be a
 simplified version of a desktop or full kernel? Then again, I have no
 insight into what led to this... perhaps it was introduced due to the
 server side features.


The problem is *not* the linux kernel -- it's HP firmware.  Look @ the
kernel changes and you'll see where it is working around HP FW.

Note:  Some of the firmware upgrades *require* that the box and disks/
MSA's be power cycled (as in you must pull the power cord!) for the FW
upgrade to take effect.  If you don't do that the new FW isn't what's being
used ... (but, then, I assume most folks realise that about FW upgrades...)



 So, by latest kernel, I suppose that would not be the latest CentOS
 6.1 kernel? If not, does anyone know if it is in any kernel provided by
 upstream and if it will soon be available under CentOS? For instance 6.2
 that seems to be just around the corner?


The latest kernel in the channel should have the fix (aka work-around)
in it.  Of course, it is not effective unless the corresponding FW patch has
also been applied.  You have to be very diligent and find the FW's on the
HP site and get the very latest.  Not sure about G4's, but on G6's, the
motherboard FW upgrade was also important (and is not part of 9.30).


 Upstream seemed to blame it on their upstream, or the kernel. The cases
 I found were closed in spite of no good resolution. There has to be a
 ton of Proliant stuff out there. Actually, HP seems to have a lot of
 holes in providing for RH6 and has only RH5 for many of these firmware
 updates. I did successfully run HP RH5 firmware updates on a RH6 box,
 but I'm not so happy about taking chances like that.

 Or worse perhaps we are starting to see a degradation due to
 ownership by HP vs. the fine products that Compaq created? I certainly
 hope not!

 Meanwhile, I guess I'll sit back and wait to see if what I have done is
 enough.

 --
 John Hinton



HTH.

  -rak-


Re: [CentOS] CentOS 5.4 off-center on SuperMicro console

2010-04-29 Thread Richard Karhuse
On Thu, Apr 29, 2010 at 5:34 PM, Rudi Ahlers rudiahl...@gmail.com wrote:



 On Thu, Apr 29, 2010 at 11:24 PM, m.r...@5-cent.us wrote:

  I have a weird situation with a new installation of CentOS 5.4 x64, on a
  SuperMicro X7SBI server. The server has a SuperMicro X7SBi motherboard
  (http://www.supermicro.com/products/motherboard/Xeon3000/3210/X7SBi.cfm),
  board ATI ES1000 and Core2Quad Q9505 CPU. The kernel is


Just as another data point, I'm running CentOS 5.4 on hundreds of systems
that have the E version of the MB (e.g., X7SBE) without any problem.

Check to see that you have the latest (R1.3e??) BIOS on your MB.

  -rak-


Re: [CentOS] Creating an alternitive install CD for CentOS 5.2 (w/ patched mkinitrd)

2009-03-08 Thread Richard Karhuse
On Sat, Mar 7, 2009 at 2:34 PM, Robert Heller hel...@deepsoft.com wrote:
 .

 I was unsure of the specific options above -- After installing revisor
 and poked around in the source code and found what I needed, but my test
 install failed -- it complained that there was a problem with mkinitrd --
 could not open it or find it -- guessing I need to rebuild the repro
 database.



 ... reboot, lather, rinse and repeat  :-) :=)

 You may find re-generating the CentOS CDs/DVD quite easy at
 times and very frustrating and complex at others ..

 Yeah, it appears so.  I just hope I don't have to rebuild all 6 CDs,
 since I don't have the DVD nor do I have a DVD-R drive either, so doing
 things with a single DVD is not an option.


Last time I added a DVD burner to my Build system, it cost $29 (USD) and
this was for a very good, reliable unit.  Not worth my time and immense
hassle to do otherwise 

As for ...

 could not open it or find it -- guessing I need to rebuild the repro
 database.

What about:

 First of all, if you replace an RPM, you'll need to do createrepo.

What isn't clear??  "Need to" means **MUST**.
Why did you even make an attempt without the createrepo command??
Two to three minutes of your time wasn't worth it (but taking our time
is, of course) :-):-) ...

While I believe your bug reports, I install CentOS on RAID + LVM
everyday without any problems, private kernel patches, etc. -- as a
counter-example.

Now, let's see if we understand the situation clearly:

   1.  This is a one-time conversion (will throw the CD away when done) ...

   2.  You're migrating a system from Ubuntu to CentOS (e.g., a different
version / patch level of LVM + RAID) and *hope* to keep data
consistency and reliability ...

   3.  You have a backup of this precious data, yes / no??  If no, eegaddss

   4.  So, why not just do a wipe + fresh install and reload the data???

Now how much time have you spent on this project so far??  I believe
the above would have been done 2 to 3 times over by now.

Plus, you've found out that it is a lot more than just a mkisofs command
with a few arguments (and that you have to follow instructions precisely
or things just don't WORK(tm)).

H

   -rak-


Re: [CentOS] Memory vs. Display Card

2009-03-08 Thread Richard Karhuse
On Sat, Mar 7, 2009 at 10:00 PM, Rick el...@spinics.net wrote:
 Since memory has become quite cheap lately I decided to move from 2 GB
 to 6. When I installed the memory every thing was fine until I went to
 run level 5. At that point the screen turned to garbage and the system
 froze. Is there a way to fix this so I can use the memory I bought? Do
 I need a new display card?

 Current hardware:

  Intel D975XBX2 Motherboard
  VGA compatible controller: ATI Technologies Inc RV505 [Radeon X1550 64-bit]


First of all, lots and lots of data is missing here.

Secondly, I agree with other posters -- make sure that memtest86+ runs
successfully and finds all your memory.  Let it run *at least* overnight before
accepting the new memory.  [Note:  Three explicit things that you need to
check and report the results of here -- if you'd like more help.]

Third, check your BIOS settings -- particularly w.r.t. VGA memory, memory-hole
re-mapping, etc.  I'd do this before I'd run the memtests, btw.  Does the BIOS
see the memory?  Is the BIOS configured to map the VGA + PCI + ... (typically
up to 1 GB) memory to higher space?  Is your MTRR set to Discrete or
Continuous?  I'd run the Intel Linux Firmware BIOS test to see if the BIOS /
Memory are configured and compatible at this point.

Fourth, what (precisely) CentOS kernel are you booting??  Does it support
greater than 4 GB of RAM??  Does it see all the memory -- both the 6 GB
of physical RAM plus the VGA + PCI re-mapped -- e.g., does it see almost
7 GB of memory??  How does the kernel see the memory (e.g., the MTRR
block -- which is one of the first things the system reports when it boots up)??

Fifth, after the GUI scrambles the screen, did you kill the session and/or
switch to an alternate Virtual Console and review both /var/log/messages
and X.org logfiles??

Once you've got that, you might have a better idea of what's going on ...
(and maybe where your problem is ...)
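
For the fourth point, a quick sketch of how one might collect those answers from a running system (boot to run level 3 first so the GUI doesn't take the box down).  It assumes a Linux /proc layout and nothing else; the remark about PAE is a general note on 32-bit kernels, not something known about this particular box.

```shell
#!/bin/sh
# How much RAM does the running kernel think it has?
grep MemTotal /proc/meminfo

# MTRR layout as the kernel sees it (the file is absent on some systems).
if [ -r /proc/mtrr ]; then
    cat /proc/mtrr
else
    echo "/proc/mtrr not available"
fi

# Kernel release and architecture: on a 32-bit install, a PAE kernel
# is generally needed to use more than 4 GB.
uname -rm
```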

HTH

   -rak-


Re: [CentOS] Creating an alternitive install CD for CentOS 5.2 (w/ patched mkinitrd)

2009-03-07 Thread Richard Karhuse
On Sat, Mar 7, 2009 at 10:08 AM, Robert Heller hel...@deepsoft.com wrote

 This seems overly complex for my needs.  I don't want (or need) to
 rebuild all 6 of the install CDs.  I just want to *replace* one RPM on
 the first CD.  I have copied the CD's directory tree to a writable file
 system and replaced the rpm in question.  I now need to just make a new
 ISO file and all I need is the proper command line arguments to mkisofs
 to do this.  I am *NOT* creating a new distribution.  And I really don't
 want to mess with a complex GUI program or edit many configuration
 files.

 I would also rather do this on my CentOS 4.7 system (revisor does not
 seem to be available for CentOS 4 / RHEL 4).  Running it on a diskless
 workstation with a read-only root file system is a total pain.  And will
 become even more painful when I then have to mount a large file system
 with NFS.


First of all, if you replace an RPM, you'll need to do createrepo.

disc_info=`head -1 $BASE/$ARCH/.discinfo`
createrepo -v --baseurl=$disc_info -g repodata/comps.xml $ARCH

If the RPM is a system RPM, then you probably want to do a
buildinstall first to get it into the anaconda system (and get a new
disc_info), a la:

   $BASE/buildinstall --debug \
       --version 5 --product 'CentOS' --release "CentOS 5" \
       --prodpath CentOS $BASE/$ARCH 2>&1

If all you want is a mkisofs, what's wrong with the man command??

Maybe something like:

mkisofs -q -r -R -J -T -no-emul-boot -boot-load-size 4 -pad   \
    -b isolinux/isolinux.bin -c isolinux/boot.cat -boot-info-table \
    -V "$VER ($date)" \
    -A "$REL - $VER - $firmware"  \
    -publisher "$PUB" -p "$PUB" -x lost+found \
    -o CentOS-$VER-$date.iso $ARCH 2>&1

... reboot, lather, rinse and repeat  :-) :=)

You may find re-generating the CentOS CDs/DVD quite easy at
times and very frustrating and complex at others ..

HTH

   -rak-


Re: [CentOS] cpu load monitoring

2009-01-23 Thread Richard Karhuse
On Fri, Jan 23, 2009 at 1:04 PM, Brian Mathis brian.mat...@gmail.com wrote:

 On Fri, Jan 23, 2009 at 12:01 PM, Alex H. Vandenham a...@avantel.ca wrote:
  On Friday 23 January 2009 09:27:23 am Brian Mathis wrote:
  Another vote for sysstat/sar.  It has been around forever and this is
  it's purpose.  It also monitors all sorts of other parameters as well.
 
  Does anyone know of a useful guide to help me do the analysis of sysstat/sar
  reports?
 
  A.

 Start with the man page, it's loaded with stuff.  Make sure to check
 the See Also section.  The sysstat homepage is here:
 http://pagesperso-orange.fr/sebastien.godard/

Check-out ksar which does a good job for plotting SAR data.
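
A small sketch of getting sar data into a text file that a plotting tool can load, assuming the usual sysstat layout of daily data files under /var/log/sa.  The helper name dump_sar and the file names in the examples are made up for illustration.

```shell
#!/bin/sh
# dump_sar SAFILE OUTFILE: write the full text report for one daily
# sysstat data file to OUTFILE.  Hypothetical helper; returns non-zero
# if sar is not installed or SAFILE cannot be read.
dump_sar() {
    safile=$1
    outfile=$2
    command -v sar >/dev/null 2>&1 || return 1
    [ -r "$safile" ] || return 1
    # LC_ALL=C keeps dates and decimals in a predictable format.
    LC_ALL=C sar -A -f "$safile" > "$outfile"
}

# Examples (file names are illustrative):
#   dump_sar /var/log/sa/sa15 /tmp/sar15.txt
#   sar -u -f /var/log/sa/sa15     # CPU utilisation only
#   sar -q -f /var/log/sa/sa15     # run-queue length and load averages
```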

   -rak-


Re: [CentOS] Update to Centos 5 anaconda kickstart %post bug?

2009-01-08 Thread Richard Karhuse
On Thu, Jan 8, 2009 at 6:14 PM, Warren, Eucke ewar...@wms.com wrote:

 I am restricted to 5.1 as approved by legal.  5.2 is not approved so 5.3
 isn't an option either.  Once I can sort out whether something
 official will fix this I can then determine how to pursue this
 internally.  A workaround fix does not address that the kickstart-built
 system will still contain this bug as it will be built from RPM's that
 are not fixed.

 Eucke

OK -- I might be missing something here ... my apologies, if so!

You're running your own kickstart file, yes/no?  And, running into
this issue.  Since you can control the ks.cfg, why not put into the
%pre section something that copies the section of the CD that
you need in the %post section to the RAM disk??  E.g.:

mkdir /tmp/source
CDR=/mnt/cdrom; [ ! -d $CDR ] && mkdir -p $CDR
DEV=/dev/$(sed -ne 's/.*trying to mount CD device //p' /tmp/anaconda.log)

# Use the device anaconda reported if it is a block device;
# otherwise fall back through the usual suspects.
if   [ -b "$DEV" ]     ; then
    :
elif [ -b /dev/cdrom ] ; then
    DEV=/dev/cdrom
elif [ -b /dev/scd0 ]  ; then
    DEV=/dev/scd0
elif [ -b /dev/hdd ]   ; then
    DEV=/dev/hdd
elif [ -b /dev/hdc ]   ; then
    DEV=/dev/hdc
elif [ -b /dev/hdb ]   ; then
    DEV=/dev/hdb
elif [ -b /dev/hda ]   ; then
    DEV=/dev/hda
else
    DEV=/tmp/cdrom
fi

mount -r -t iso9660 $DEV $CDR && \
   cp -rp $CDR/... /tmp/source/

might give you some ideas ...

Remember ... you may be in a chroot'd env in the %post
section, so you may need to have a non-chroot'd %post
that copies the /tmp/source above to your built filesystems
(e.g., /mnt/sysimage/tmp).
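
Roughly, the two %post sections could look like the fragment below.  This is only a sketch: the directory names follow the example above, and the ls is just a placeholder for whatever the real %post needs to do with the files.

```
%post --nochroot
# Runs outside the chroot: the installed system is under /mnt/sysimage.
mkdir -p /mnt/sysimage/tmp/source
cp -rp /tmp/source/. /mnt/sysimage/tmp/source/

%post
# Runs chroot'd into the new system, where the copy shows up as /tmp/source.
ls /tmp/source
```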

I hope that helps ..


   -rak-


Re: [CentOS] how to debug hardware lockups?

2008-11-15 Thread Richard Karhuse
On Sat, Nov 15, 2008 at 3:16 AM, Rudi Ahlers [EMAIL PROTECTED] wrote:

 Hi,

 We have a server which locks up about once a week (for the past 3
 weeks now), without any warning, and the only way to recover it, is to
 reset the server. This causes unwanted downtime, and often software
 loss as well.

 How do I debug the server, which runs CentOS 5.2 to see why it locks
 up? The CPU is an Intel Q9300 Core 2 Quad, with 8 GB RAM, on an Intel
 Motherboard


Attach a local console to the video port and let us know what it says --
that will (probably) be very insightful.  E.g., kernel panic, MCE, ...

Next, run memtest86+ -- at least overnight.  [Note: I've had less than
stellar results with memtest86 recently, but if it shows errors, you've got
a problem big time; if it doesn't show errors, you're still not 100% sure
that memory is good :-) :-).]  Is it ECC memory??  If not, why not --
particularly given it is a critical server?

Are all the fans spinning -- particularly the CPU fan??  Do you have
lm-sensors enabled??  Either create a script or use something like munin
to track things and see if fans, temperature, voltages are all stable &
within range up to death.

Can you easily swap power supplies??  (Is the unit dual powered or just
one unit?)

Clearly, just a start, but you get the idea of elementary, 101 problem
solving ...
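
The "create a script" option can be as small as the sketch below.  It assumes the sensors command from lm_sensors is installed and configured; the function name and the log path in the example are made up.

```shell
#!/bin/sh
# log_sensors_once FILE: append a timestamp plus the output of
# `sensors` (from lm_sensors) to FILE.  Hypothetical helper name.
log_sensors_once() {
    logfile=$1
    {
        date '+%Y-%m-%d %H:%M:%S'
        if command -v sensors >/dev/null 2>&1; then
            sensors 2>&1
        else
            echo "sensors command not found (is lm_sensors installed?)"
        fi
        echo
    } >> "$logfile"
}

# Example: sample once a minute until the box dies, e.g.
#   while :; do log_sensors_once /var/log/sensors-trace.log; sleep 60; done
```

After the next lock-up, the tail of the log shows what fans, temperatures and voltages looked like in the last minute or so before death (the very last sample may not have made it to disk).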

   -rak-


Re: [CentOS] Telnet ssh connection limit and idle timeout

2008-09-25 Thread Richard Karhuse
On 9/24/08, lingu [EMAIL PROTECTED] wrote:

 I am running centos 4 update 5. I want to limit user connection(maximum
 10 simultaneous connection are only allowed) to server
 (for telnet & ssh sessions).In the mean time i like to remove all dead and
 idle connections(ssh & telnet session) of more than 24 hours.



Sorry that no one has helped you yet on this.

Check-out limits.conf (e.g., man limits.conf).

This will allow you to limit the number of concurrent user
logins.
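
For illustration only, entries along these lines are what the man page describes.  maxlogins is a per-user limit; maxsyslogins caps logins on the whole system, if the installed pam_limits supports it.  The value 10 comes from the question; whether it should be per user or system-wide is a guess, and the limits only apply to services that go through PAM with pam_limits enabled.

```
# /etc/security/limits.conf  (read by pam_limits)
# <domain>   <type>   <item>          <value>
*            -        maxlogins       10
#*           -        maxsyslogins    10
```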

   -rak-


Re: [CentOS] kickstart problems

2008-09-02 Thread Richard Karhuse
On 9/2/08, Paolo Supino [EMAIL PROTECTED] wrote:


 :

 Hi Joseph

After sending the last reply I fixed the kickstart config files and
 added --boot=yes to the network statement of eth0, but going through the
 consoles of each of the systems to see if the installation completed
 successfully I found a few that got stuck on the network interface
 configuration screen (where it asks for IPv4 and IPv6 static/dynamic
 configuration information: Configure TCP/IP).


I've only been half reading this thread, so feel free to ignore this
interruption ...

Just plug one and only one NIC into the switch.
Add ksdevice=link to your boot-up line (e.g. syslinux.cfg??).

Configure network (if you must) or just let DHCP take over.

Kickstart away ...
(works for me on boxes where during anaconda installation the NICs are
labeled one way, but the running CentOS system labels them another).
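
As a sketch, a boot entry with that option might look like the fragment below.  The label, the file names and the ks= URL are placeholders, not anything from this thread.

```
label ks
  kernel vmlinuz
  append initrd=initrd.img ks=http://192.0.2.10/ks.cfg ksdevice=link
```

With ksdevice=link, the installer uses the first interface it finds with link up, which is why only one NIC should be plugged in.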

Just a thought ...

   -rak-


Re: [CentOS] S.M.A.R.T

2008-08-30 Thread Richard Karhuse
On Sat, Aug 30, 2008 at 4:08 AM, Mag Gam [EMAIL PROTECTED] wrote:

 At my physics lab we have 30 servers with 1TB disk packs. I am in need
 of monitoring for disk failures. I have been reading about SMART and
 it seems it can help. However, I am not sure what to look for if a
 drive is about to fail. Any thoughts about this? Is anyone using this
 method to predetermine disk failures?



Here are a few references from my archives w.r.t. SMART ...

Hope they help ...

   -rak-



http://hardware.slashdot.org/hardware/07/02/18/0420247.shtml

Google Releases Paper on Disk Reliability

The Google engineers just published a paper on Failure Trends in a Large
Disk Drive Population (http://labs.google.com/papers/disk_failures.pdf).
Based on a study of 100,000 disk drives over 5 years they find some
interesting stuff. To quote from the abstract: 'Our analysis identifies
several parameters from the drive's self monitoring facility (SMART) that
correlate highly with failures. Despite this high correlation, we conclude
that models based on SMART parameters alone are unlikely to be useful for
predicting individual drive failures. Surprisingly, we found that
temperature and activity levels were much less correlated with drive
failures than previously reported.'


http://hardware.slashdot.org/hardware/07/02/21/004233.shtml

Everything You Know About Disks Is Wrong

Google's wasn't the best storage paper at FAST '07
(http://www.usenix.org/events/fast07/). Another, more provocative paper
looking at real-world results from 100,000 disk drives got the 'Best Paper'
award. Bianca Schroeder, of CMU's Parallel Data Lab, submitted Disk
failures in the real world: What does an MTTF of 1,000,000 hours mean to
you?
(http://www.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html)
The paper crushes a number of (what we now know to be) myths about disks
such as vendor MTBF validity, 'consumer' vs. 'enterprise' drive reliability
(spoiler: no difference), and RAID 5 assumptions. StorageMojo has a good
summary of the paper's key points (http://storagemojo.com/?p=383).


http://www.linuxjournal.com/article/6983?from=50&comments_per_page=50

Monitoring Hard Disks with SMART, by Bruce Allen (2004-01-01)

One of your hard disks might be trying to tell you it's not long for this
world. Install software that lets you know when to replace it.

It's a given that all disks eventually die, and it's easy to see why. The
platters in a modern disk drive rotate more than a hundred times per second,
maintaining submicron tolerances between the disk heads and the magnetic
media that store data. Often they run 24/7 in dusty, overheated
environments, thrashing on heavily loaded or poorly managed machines. So,
it's not surprising that experienced users are all too familiar with the
symptoms of a dying disk. Strange things start happening. Inscrutable kernel
error messages cover the console and then the system becomes unstable and
locks up. Often, entire days are lost repeating recent work, re-installing
the OS and trying to recover data. Even if you have a recent backup, sudden
disk failure is a minor catastrophe.


http://smartmontools.sourceforge.net/

smartmontools Home Page
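
To give a feel for what the smartmontools package offers, here is a minimal sketch around smartctl -H, which prints the drive's own overall health verdict.  The helper name smart_verdict is made up, the /dev/sd? glob is an assumption about device naming, and (per the Google paper above) a PASSED result is no guarantee the drive is healthy.

```shell
#!/bin/sh
# smart_verdict: read `smartctl -H` output on stdin and print
# PASSED, FAILED or UNKNOWN.  Hypothetical helper name.
smart_verdict() {
    awk '
        # ATA drives report e.g. "... test result: PASSED"
        /overall-health self-assessment test result:/ { v = $NF }
        # SCSI/SAS drives report e.g. "SMART Health Status: OK"
        /SMART Health Status:/ { v = ($NF == "OK") ? "PASSED" : "FAILED" }
        END {
            if (v == "PASSED")  print "PASSED"
            else if (v == "")   print "UNKNOWN"
            else                print "FAILED"
        }
    '
}

# Check every /dev/sd? disk, if smartctl is installed (normally needs root).
if command -v smartctl >/dev/null 2>&1; then
    for disk in /dev/sd?; do
        [ -b "$disk" ] || continue
        printf '%s: %s\n' "$disk" "$(smartctl -H "$disk" 2>/dev/null | smart_verdict)"
    done
fi
```

smartctl -A lists the individual attributes (reallocated and pending sector counts and so on), and the smartd daemon from the same package can watch them and send mail when they change.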


Re: [CentOS] Kernel panic - not syncing: CPU context corrupt

2008-06-20 Thread Richard Karhuse
On 6/20/08, Alwin Roosen [EMAIL PROTECTED] wrote:

 Hi,


 CentOS release 5 (Final)
 Kernel 2.6.18-53.1.21.el5 on an i686

 ws174 login: CPU 1: Machine Check Exception: 0005
 CPU 0: Machine Check Exception: 0004
 Bank 3: f6220002010a at 32c93500
 Bank 5: f2300c000e0f
 Kernel panic - not syncing: CPU context corrupt
 Bank 3: f6220002010a



Alwin --

I would be very, very surprised *IF* this wasn't hardware
related.

Dave Jones wrote a nice little program to help decode this:

$ parsemce -b 3 -s f6220002010a -e 5 -a 32c93500
Status: (5) Machine Check in progress.
Restart IP valid.
parsebank(3): f6220002010a @ 32c93500
External tag parity error
CPU state corrupt. Restart not possible
Address in addr register valid
Error enabled in control register
Error not corrected.
Error overflow
Memory hierarchy error
Request: Generic error
Transaction type : Generic
Memory/IO : I/O

and:

$ parsemce -b 5 -s f2300c000e0f -e 4 -a 0
Status: (4) Machine Check in progress.
Restart IP invalid.
parsebank(5): f2300c000e0f @ 0
External tag parity error
CPU state corrupt. Restart not possible
Error enabled in control register
Error not corrected.
Error overflow
Bus and interconnect error
Participation: Generic
Timeout: Request did not timeout
Request: Generic error
Transaction type : Invalid
Memory/IO : Other


Dag's Repo has the new memtest86+ 2.01 RPM.  I'd pull it and
let it run overnight.  While memtest86+ is good, I've recently had
cases where it didn't find (obvious) memory errors.

I've also seen things like SATA disk drives cause MCEs.

This one looks like you're taking memory parity errors somewhere
in the path to the CPU.  In your BIOS, check your Event log for
any interesting entries, too.

Hope this helps ...

   -rak-


Re: [CentOS] New firewall, need mac changed

2008-04-20 Thread Richard Karhuse
On Sat, Apr 19, 2008 at 3:53 PM, Joseph L. Casale [EMAIL PROTECTED]
wrote:

 Modify /etc/sysconfig/network-scripts/ifcfg-ethX and remove the HWADDR
 line if you have one, and add a MACADDR with the mac address you want
 to use.
 
 Beware, some network cards may protest having the mac address changed,
 and using both HWADDR and MACADDR can cause issues. See
 /usr/share/doc/initscripts-*/sysconfig.txt for details.

 Jim,
 I appreciate the confirmation, that was the method I was going to use. I
 am only unsure about what *could* happen with the HWADDR in there, can
 eth{n} now maybe bind to a different nic under some circumstance?

 How can I always force the nic in question to use this script?

 Thank you!
 jlc



Here is an outline of what I do to lock-down interfaces -- which relies
mainly on using a fairly new feature, udev:

/etc/modprobe.conf:  make sure the lines --

      alias eth? driver

   are in the correct order, e.g.:

      alias eth0 e1000
      alias eth1 e1000
      alias eth2 tg3

/etc/udev/rules.d/:  create network rules file (if needed) and
   add lines that associate a given NIC to its eth? interface.
   Use udevinfo -a -p /sys/class/net/eth? to get various
   features or attributes to find the NIC that you want to call
   ethX.  [Note: this seems to change from release to
   release, so this is a little general.]  You might want to put
   lines like:

      KERNEL=="eth?", ID==":03:02.0", NAME="eth0"
      KERNEL=="eth?", ID==":03:02.1", NAME="eth1"

   or

      KERNEL=="eth?", SYSFS{vendor}=="0x8086", SYSFS{device}=="0x032a", NAME="eth0"
      KERNEL=="eth?", SYSFS{vendor}=="0x8086", SYSFS{device}=="0x1079", NAME="eth1"

/etc/sysconfig/network-scripts/ifcfg-ethX:

   As others have suggested, now put MACADDR= into these files with the
   desired MAC address that you want the interface to be set to and
   delete the HWADDR.
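
For example, an ifcfg file along these lines.  This is a sketch only: the MAC address is a made-up, locally administered placeholder, and BOOTPROTO/ONBOOT are just typical values, not taken from this thread.

```
# /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
# No HWADDR line; MACADDR is the address the interface will present.
MACADDR=02:00:00:00:00:01
```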

Now, reboot, test and repeat as needed:-):-) ...

I hope that helps and is useful ...

  -rak-

Note:  I just checked a Fedora 8 box and some of the above has
changed -- udev is the way to go, but be advised that this feature
appears to be evolving and changing -- hopefully for the better!


Re: [CentOS] system gets suspended automatically!

2008-02-05 Thread Richard Karhuse
I'll let others work with you on the kernel version 
(which we'll assume is OK and a true CentOS install).

I would put up a console on the local KVM port to
capture the last set of messages before the system
hangs -- which might help isolate the problem.

From what we've seen so far, it sounds like you might
have a hardware problem.  The things that I would
check are:

  -  power supply (aka losing voltage)
  -  all the system fans (aka thermal shutdown)
  -  memory (run memtest86+ overnight [or longer])


If not that (and still a hardware problem), it is a lot more
subtle and will be fun to diagnose 

Hope this helps (a little) ...

   -rak-


Re: [CentOS] Help with file descriptors

2007-11-28 Thread Richard Karhuse
On Nov 28, 2007 3:55 PM, Guy Boisvert [EMAIL PROTECTED] wrote:

 Garrick Staples wrote:
  On Wed, Nov 28, 2007 at 03:03:30PM -0500, Guy Boisvert alleged:
  Hi all!
 
   I have a problem with CentOS 4.4 and Communigate Pro 5.0.9.  As our
   user number grows, we are seeing "too many files open" error messages
   in Communigate logs.

   I spoke with Communigate tech support and they asked me to increase
   the number of file descriptors which i did.  I put 128000 as a script i
   made to check Communigate open files reported as high as 99000.  As i
   checked the Communigate log file, it reported that it sees 1024
   available file descriptors.
   :


When I saw this, ulimit -n immediately came to mind
(it is usually 1024 -- the maximum # of files that any given
process can have open).  [See man ulimit.]

If this runs as non-root, it won't be able to take the limit
higher ...

   -rak-