-----Original Message-----
From: Alexander Pyhalov via illumos-discuss [mailto:[email protected]] 
Sent: Friday, October 31, 2014 2:46 PM
To: [email protected]
Subject: Re: [discuss] assertion failed: space_map_open

On 10/24/2014 17:48, Alexander Pyhalov via illumos-discuss wrote:
> Hello.
> I was moving OI Hipster installation from physical server to KVM VM.
>

[[ci]] I had something similar a couple of weeks ago while trying to send/recv
a ZFS file system between two OmniOS hosts. It turned out the source host did
not have ECC memory and one stick was bad (memtest86+ reported some 128000
single-bit errors...). Without ECC the system does not notice such errors, so
the pool is easily clobbered under load. During the zfs send/recv operation it
kept hitting bad spots; there was nothing wrong with the disks - they had been
qualified earlier using both the manufacturer's diagnostics and a full
analyse/verify under Solaris.

I have since replaced the memory with proper ECC memory and everything is spick
and span.


Chavdar Ivanov 
--------------------

So I continued my attempts to move the zfs pool to another host. Funny, but it
has already taken about a week.
I decided to move the zfs filesystems one by one, to find out what's wrong.

While sending some snapshots I got
ZFS I/O error.
I hoped that the errors were in files which exist only in the old snapshots. As
I don't need the history too much, I destroyed all snapshots, created a new one
and tried to send it. The same effect.
But this time zpool status finally noticed data errors.

# zpool  status -v  data
   pool: data
  state: ONLINE
status: One or more devices has experienced an error resulting in data
         corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
         entire pool from backup.
    see: http://illumos.org/msg/ZFS-8000-8A
   scan: scrub repaired 0 in 4h5m with 0 errors on Tue Oct 28 13:58:26 2014
config:

         NAME                                     STATE     READ WRITE CKSUM
         data                                     ONLINE       0     0     7
           c4t6005076802808844B000000000000032d0  ONLINE       0     0    14

errors: Permanent errors have been detected in the following files:

         <0xb04>:<0x4832f>
         <0x1a35>:<0x4832f>
         <0x2045>:<0x4832f>
         <0x1596>:<0x4832f>
         <0x25fd>:<0x4832f>
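
When zpool status -v cannot resolve an error to a path, it prints the entry as a hexadecimal <dataset id>:<object id> pair, as in the list above. A minimal sketch, assuming the output has been saved as text, of turning those pairs into decimal numbers (the form in which zdb usually prints object numbers). The names STATUS_ERRORS and parse_unresolved_errors are hypothetical and exist only for this illustration.

```python
import re

# Sample copied from the "errors:" section of the zpool status -v output above.
STATUS_ERRORS = """\
         <0xb04>:<0x4832f>
         <0x1a35>:<0x4832f>
         <0x2045>:<0x4832f>
         <0x1596>:<0x4832f>
         <0x25fd>:<0x4832f>
"""

# Matches one unresolved entry: <0xDATASET>:<0xOBJECT>, both in hex.
PAIR_RE = re.compile(r"^\s*<0x([0-9a-fA-F]+)>:<0x([0-9a-fA-F]+)>\s*$")


def parse_unresolved_errors(text):
    """Return a list of (dataset_id, object_id) tuples as decimal integers."""
    pairs = []
    for line in text.splitlines():
        match = PAIR_RE.match(line)
        if match:
            pairs.append((int(match.group(1), 16), int(match.group(2), 16)))
    return pairs


pairs = parse_unresolved_errors(STATUS_ERRORS)
for dataset_id, object_id in pairs:
    print(f"dataset {dataset_id} object {object_id}")

# Collapse to the distinct object numbers that appear in the list.
distinct_objects = sorted({obj for _, obj in pairs})
print("distinct object numbers:", distinct_objects)
```

All five entries carry the same object number, which would be consistent with a single damaged file being referenced from several datasets; that is only an inference from the list, not something the output states.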

I wanted to identify the affected files, so I tarred the whole zone.
tar gave one I/O error:

# gtar -C /zones/build/root/ -cpf /export/test.tar  .
gtar: ./var/samba/locks/winbindd_privileged/pipe: socket ignored
gtar: ./var/postgres/9.3/data_64/base/16385/106833: Read error at byte 
931653120, while reading 10240 bytes: I/O error
gtar: ./var/tmp/orbit-alp/linc-2197-0-5347df3ce3cb8: socket ignored
gtar: ./var/tmp/orbit-alp/linc-53fb-0-534682603ad6b: socket ignored
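
On a larger tree the one read error is easy to lose among the harmless "socket ignored" messages. A minimal sketch, assuming gtar's stderr was captured to a string and that the read-error message has the shape shown above (the mail client wrapped that line; it is joined again here). GTAR_STDERR and find_read_errors are hypothetical names used only for this illustration.

```python
import re

# Sample copied from the gtar output above, with the wrapped line rejoined.
GTAR_STDERR = """\
gtar: ./var/samba/locks/winbindd_privileged/pipe: socket ignored
gtar: ./var/postgres/9.3/data_64/base/16385/106833: Read error at byte 931653120, while reading 10240 bytes: I/O error
gtar: ./var/tmp/orbit-alp/linc-2197-0-5347df3ce3cb8: socket ignored
gtar: ./var/tmp/orbit-alp/linc-53fb-0-534682603ad6b: socket ignored
"""

# Assumed message shape, taken from the single example in this thread.
READ_ERROR_RE = re.compile(
    r"^gtar: (?P<path>.+): Read error at byte (?P<offset>\d+), "
    r"while reading (?P<count>\d+) bytes: (?P<reason>.+)$"
)


def find_read_errors(text):
    """Return (path, byte_offset, byte_count) for every read-error line."""
    errors = []
    for line in text.splitlines():
        match = READ_ERROR_RE.match(line)
        if match:
            errors.append(
                (match.group("path"), int(match.group("offset")), int(match.group("count")))
            )
    return errors


read_errors = find_read_errors(GTAR_STDERR)
for path, offset, count in read_errors:
    print(f"{path}: failed reading {count} bytes at offset {offset}")
```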

I dropped the affected table from the PostgreSQL database (luckily, it was just
a test database) and ran VACUUM FULL, so that the file was removed.

After that I could zfs send the snapshot...

The question that worries me is why this could happen. We use IBM Storwize as
the backend. It shouldn't lie about writing data to the disk (at least we
haven't found any such issues). On the other hand, we have rather frequent
power outages (about once per month), long enough for our UPSes to die...

-- 
Best regards,
Alexander Pyhalov,
system administrator of Southern Federal University IT department


-------------------------------------------
illumos-discuss
Archives: https://www.listbox.com/member/archive/182180/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182180/21175473-bb25bf80
Modify Your Subscription: https://www.listbox.com/member/?&;
Powered by Listbox: http://www.listbox.com




