On 10/24/2014 17:48, Alexander Pyhalov via illumos-discuss wrote:
Hello.
I was moving an OI Hipster installation from a physical server to a KVM VM.


So I continued my attempts to move the ZFS pool to another host. Funny enough, it has already taken about a week.
I decided to move the ZFS filesystems one by one to find out what was wrong.

While sending some snapshots I got a ZFS I/O error.
I hoped the errors were confined to files that existed only in the old snapshots. As I don't need the history much, I destroyed all snapshots, created a new one and tried to send it. Same effect.
But this time zpool status finally noticed the data errors.

# zpool  status -v  data
  pool: data
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 4h5m with 0 errors on Tue Oct 28 13:58:26 2014
config:

        NAME                                     STATE     READ WRITE CKSUM
        data                                     ONLINE       0     0     7
          c4t6005076802808844B000000000000032d0  ONLINE       0     0    14

errors: Permanent errors have been detected in the following files:

        <0xb04>:<0x4832f>
        <0x1a35>:<0x4832f>
        <0x2045>:<0x4832f>
        <0x1596>:<0x4832f>
        <0x25fd>:<0x4832f>
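For reference, those numeric entries are <dataset object number>:<file object number> pairs, and zdb can translate them back to paths roughly like this (the dataset name below is a guess; substitute your own):

```shell
# zpool status -v prints the IDs in hex; zdb -d lists dataset IDs in decimal.
DSID=$(printf '%d' 0xb04)      # dataset object number -> 2820
OBJID=$(printf '%d' 0x4832f)   # file object number    -> 295727

# Only attempt the lookups where zdb is actually available:
if command -v zdb >/dev/null 2>&1; then
    # Find which dataset in the pool carries that ID...
    zdb -d data | grep -w "ID $DSID"
    # ...then dump the object to see its path (dataset name is hypothetical):
    zdb -ddddd data/zones/build "$OBJID"
fi
```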

I wanted to identify the affected files, so I tarred the whole zone.
tar gave one I/O error:

# gtar -C /zones/build/root/ -cpf /export/test.tar  .
gtar: ./var/samba/locks/winbindd_privileged/pipe: socket ignored
gtar: ./var/postgres/9.3/data_64/base/16385/106833: Read error at byte 931653120, while reading 10240 bytes: I/O error
gtar: ./var/tmp/orbit-alp/linc-2197-0-5347df3ce3cb8: socket ignored
gtar: ./var/tmp/orbit-alp/linc-53fb-0-534682603ad6b: socket ignored
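In the PostgreSQL data directory layout that damaged path is base/<database OID>/<relfilenode>, so it can be mapped back to a concrete table, and the byte offset to a concrete 8 kB page (the database name in the second query is a placeholder; the arithmetic assumes the default 8 kB block size):

```shell
# Which 8 kB page did the read error hit?
BLOCK=$(( 931653120 / 8192 ))
echo "bad block: $BLOCK"

if command -v psql >/dev/null 2>&1; then
    # Which database has OID 16385?
    psql -Atc "SELECT datname FROM pg_database WHERE oid = 16385;"
    # Which relation in that database owns file 106833? ("mydb" is a placeholder.)
    psql -At -d mydb -c "SELECT relname FROM pg_class WHERE relfilenode = 106833;"
fi
```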

I dropped the affected table from the PostgreSQL database (luckily, it was just a test database) and ran VACUUM FULL, so the file was removed.

After that I could zfs send the snapshot...
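For completeness, the send pipeline I mean is of this shape (snapshot, dataset and host names are all placeholders, not the real ones):

```shell
SNAP="data/zones/build@migrate"   # hypothetical dataset/snapshot name

if command -v zfs >/dev/null 2>&1; then
    zfs snapshot "$SNAP"
    # -v reports progress; the stream is piped over ssh to the new KVM host
    # and received unmounted (-u) so it can be inspected first.
    zfs send -v "$SNAP" | ssh newhost zfs receive -u data/zones/build
fi
```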

The question that worries me is: why could this happen? We use IBM Storwize as the backend. It shouldn't lie about data having been written to disk (at least, we haven't found such issues). On the other hand, we have rather frequent power outages (about once a month) long enough for our UPSes to die...

--
Best regards,
Alexander Pyhalov,
system administrator of Southern Federal University IT department

