On 10/24/2014 17:48, Alexander Pyhalov via illumos-discuss wrote:
Hello.
I was moving an OI Hipster installation from a physical server to a KVM VM.
I have continued my attempts to move the ZFS pool to another host. Funnily
enough, it has already taken about a week.
I decided to move the ZFS filesystems one by one to find out what was wrong.
While sending some snapshots I got a ZFS I/O error.
I hoped the errors were in files that exist only in the old snapshots. As
I don't need the history much, I destroyed all snapshots, created a new
one and tried to send it. The result was the same.
But this time zpool status finally reported data errors.
# zpool status -v data
  pool: data
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 4h5m with 0 errors on Tue Oct 28 13:58:26 2014
config:

        NAME                                     STATE     READ WRITE CKSUM
        data                                     ONLINE       0     0     7
          c4t6005076802808844B000000000000032d0  ONLINE       0     0    14

errors: Permanent errors have been detected in the following files:

        <0xb04>:<0x4832f>
        <0x1a35>:<0x4832f>
        <0x2045>:<0x4832f>
        <0x1596>:<0x4832f>
        <0x25fd>:<0x4832f>
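
The entries in the error list are <dataset id>:<object id> pairs in hex,
which is how zpool status prints an error when it cannot resolve a path
(for example because the snapshot has been destroyed). All five entries
here share object 0x4832f, which would be consistent with one damaged
file referenced from several datasets. As a rough sketch, the pairs can be
converted to decimal for further inspection; the function name
zfs_error_ids is made up for this example:

```shell
# Hypothetical helper: read `zpool status -v` output on stdin and print
# "dataset_id object_id" in decimal for every <0x...>:<0x...> entry.
zfs_error_ids() {
    # Keep only lines of the form <0xHEX>:<0xHEX>, allowing surrounding blanks.
    sed -n 's/^[[:space:]]*<\(0x[0-9a-fA-F]*\)>:<\(0x[0-9a-fA-F]*\)>[[:space:]]*$/\1 \2/p' |
    while read -r ds obj; do
        # printf %d accepts C-style hex constants such as 0xb04.
        printf '%d %d\n' "$ds" "$obj"
    done
}
```

Run as `zpool status -v data | zfs_error_ids`, the first entry above comes
out as `2820 295727`. If the dataset still exists, the decimal object
number is the kind of value a zdb object dump expects, but that step is
not shown here.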
I wanted to identify the affected files, so I tarred the whole zone.
tar reported one I/O error:
# gtar -C /zones/build/root/ -cpf /export/test.tar .
gtar: ./var/samba/locks/winbindd_privileged/pipe: socket ignored
gtar: ./var/postgres/9.3/data_64/base/16385/106833: Read error at byte 931653120, while reading 10240 bytes: I/O error
gtar: ./var/tmp/orbit-alp/linc-2197-0-5347df3ce3cb8: socket ignored
gtar: ./var/tmp/orbit-alp/linc-53fb-0-534682603ad6b: socket ignored
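
On a large tree the read errors are easy to miss among the "socket
ignored" notices. A minimal sketch of a filter that keeps only the paths
with read errors, assuming each message arrives on a single line in the
format shown above (the function name tar_read_errors is made up):

```shell
# Hypothetical helper: read tar's diagnostics on stdin and print only the
# paths for which a "Read error" was reported.
tar_read_errors() {
    # "<program>: <path>: Read error at byte N, ..." -> "<path>"
    sed -n 's/^[^:]*: \(.*\): Read error at byte .*$/\1/p'
}
```

Usage, with the same command as above:

    gtar -C /zones/build/root/ -cpf /export/test.tar . 2>&1 | tar_read_errors

For the output above this leaves just
./var/postgres/9.3/data_64/base/16385/106833.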
I dropped the affected table from the PostgreSQL database (luckily, it was
just a test database) and ran VACUUM FULL, so that the file was removed.
After that I could zfs send the snapshot...
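
For reference, a path like base/16385/106833 in a PostgreSQL data
directory follows the base/<database OID>/<relfilenode> layout, so the
damaged file can be mapped back to a table through the system catalogs.
A small sketch that turns such a path into the two lookup queries; the
function name pg_lookup_queries is made up, and the queries only cover
ordinary tables and indexes:

```shell
# Hypothetical helper: given a data file path ending in
# base/<database OID>/<relfilenode>, print catalog queries that identify
# the database and the relation the file belongs to.
pg_lookup_queries() {
    path=$1
    filenode=${path##*/}           # last component, e.g. 106833 or 106833.1
    filenode=${filenode%%[._]*}    # drop segment (.1) or fork (_fsm, _vm) suffix
    dir=${path%/*}
    dboid=${dir##*/}               # parent directory name is the database OID
    printf 'SELECT datname FROM pg_database WHERE oid = %s;\n' "$dboid"
    # This second query has to be run while connected to that database.
    printf 'SELECT relname FROM pg_class WHERE relfilenode = %s;\n' "$filenode"
}
```

For example, `pg_lookup_queries ./var/postgres/9.3/data_64/base/16385/106833`
prints a query against pg_database for OID 16385 and one against pg_class
for relfilenode 106833.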
The question that worries me is why this could happen. We use IBM
Storwize as the backend. It shouldn't lie about writing data to disk (at
least we haven't found any such issues). On the other hand, we have rather
frequent power outages (about once a month), long enough for our UPSes
to die...
--
Best regards,
Alexander Pyhalov,
system administrator of Southern Federal University IT department