Thanks Alasdair, those are interesting points.
About /usr/sbin/quota in /etc/profile: why not change the script so that it
skips quota when the login is root or another administrative user?
That way I could keep the quota check in place for ordinary users, while
administrative logins would still be able to get a bash prompt in these odd
situations.
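A minimal sketch of what I mean, assuming a POSIX shell. The function name should_run_quota and the account name "admin" are placeholders for illustration, not anything found in the stock /etc/profile:

```shell
# Hypothetical guard for /etc/profile: skip quota for administrative logins.
# "admin" is a placeholder; list your own administrative accounts here.
should_run_quota() {
    case "$1" in
        root|admin) return 1 ;;   # administrative login: skip quota
        *)          return 0 ;;   # everyone else: run quota as before
    esac
}

if should_run_quota "${LOGNAME:-}"; then
    if [ -x /usr/sbin/quota ]; then
        # Ignore a non-zero exit so the rest of the profile still runs.
        /usr/sbin/quota || true
    fi
fi
```

This only avoids calling quota for the listed accounts; it does nothing for ordinary users if quota hangs.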
Gabriele.
----------------------------------------------------------------------------------
From: Alasdair Lumsden
To: [email protected]
Date: 10 November 2012 14:37:20 CET
Subject: Re: [discuss] illumos based ZFS storage failure
I haven't read the whole thread, but the next time it happens you'll
want to invoke a panic and make the dump file available. You'll want to
ensure that:
1. Multithreaded dump is disabled in /etc/system with:
* Disable MT dump
set dump_plat_mincpu=0
Without this there is a risk of your dump not saving correctly.
2. That you have a dump device and that it's big enough to hold a dump of
your kernel memory (resize it with: zfs set volsize=X rpool/dump)
3. That dumpadm is happy and set to save cores etc:
dumpadm -y -z on -c kernel -d /dev/zvol/dsk/rpool/dump
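The three points above can be checked read-only with something like the sketch below. The helper name check_mt_dump_disabled is made up for illustration, and rpool/dump is simply the dataset name used in the commands above; adjust it if your dump device lives elsewhere.

```shell
# Hypothetical helper: succeeds if the given system file (default /etc/system)
# contains an uncommented "set dump_plat_mincpu=0" line.
check_mt_dump_disabled() {
    grep -Eq '^[[:space:]]*set[[:space:]]+dump_plat_mincpu[[:space:]]*=[[:space:]]*0[[:space:]]*$' \
        "${1:-/etc/system}" 2>/dev/null
}

# 1. Multithreaded dump setting
if check_mt_dump_disabled; then
    echo "MT dump: disabled"
else
    echo "MT dump: dump_plat_mincpu=0 not found in /etc/system"
fi

# 2. Size of the dump zvol (only if the zfs command is available)
if command -v zfs >/dev/null 2>&1; then
    zfs get volsize rpool/dump || echo "could not read volsize of rpool/dump"
fi

# 3. Current dump configuration (dumpadm with no arguments prints it)
if command -v dumpadm >/dev/null 2>&1; then
    dumpadm || echo "dumpadm failed (it may need root privileges)"
fi
```

None of these commands change anything, so the script is safe to run on a production box.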
There's lots of good info here:
http://wiki.illumos.org/display/illumos/How+To+Report+Problems
You can also inspect things with mdb while the system is up, but if it's
a production system normally you want to get it rebooted and into
production again ASAP. So in that situation, you can take a dump of the
running system with:
savecore -L
One thing to keep in mind is /etc/profile runs /usr/sbin/quota, which
can screw over logins when the zfs subsystem is unhappy. I really think
it should be removed by default since on most systems quotas aren't even
used. So comment it out - we do so on all our systems. This will give
you a better chance of logging in when things go wrong.
I think there's a way to SSH in bypassing /etc/profile but I can't
remember what it is - perhaps someone can chime in.
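One approach that should work, offered as a sketch rather than a tested recipe: when ssh is given a command to run, sshd passes it to a non-login shell, so /etc/profile is not sourced. The function name emergency_shell and the host in the usage comment are placeholders.

```shell
# Hypothetical wrapper: start bash on the remote host without a login shell.
# -t allocates a terminal so the remote bash is interactive.
# --noprofile and --norc stop that bash from reading its own startup files.
emergency_shell() {
    ssh -t "$1" bash --noprofile --norc
}

# Usage (placeholder host):
#   emergency_shell root@storage-host
```

This assumes bash is on the remote PATH and that sshd itself can still authenticate you while the storage is unhappy.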
Good luck. Centralised storage is difficult to do and when it goes wrong
everything that depends on it goes down. It's an "all your eggs in one
giant failbasket" situation. Doing it homebrew with ZFS is cost effective and can
be fast, but it is also risky. This is why there are companies like
Nexenta out there with certified combinations of hardware and software
engineered to work together. This extends to validating firmware
combinations of disks/HBAs/etc.
Cheers,
Alasdair
-------------------------------------------
illumos-discuss
Archives: https://www.listbox.com/member/archive/182180/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182180/21175541-02f10c6f
Modify Your Subscription: 
https://www.listbox.com/member/?&id;secret=21175541-29e3e0ee
Powered by Listbox: http://www.listbox.com


