[OmniOS-discuss] Debating Swapping 2TB with 4TB drives in RaidZ2 or Create new Vol and clone

2014-03-21 Thread Matthew Mabis
Hey All, 

I am debating the idea of just swapping all the hard drives in my current 8x2TB 
RaidZ2 (albeit slowly), letting the environment resilver each drive and then 
expand, versus creating a new RaidZ2 on a different box and cloning the data 
over. 

Obviously I know the pros/cons/risks associated with that method. My 
question deals with the new drives being 4K, whereas the old drives were 512b 
aligned. My current config uses 6x Hitachi HDS5C302 and 2x SAMSUNG HD203WI, 
and I will be switching over to ST4000VN000 drives all the way (I have 
purchased 4 already and am waiting a little while to see if I can buy from a 
different batch [some people debate this, but it's the way I have done it for 
a long time]). I don't want to use dissimilar models anymore, as the Samsung 
drives in this config sometimes went, well, let's call it NUTTY. 

I use my environment for multiple things (network data backups, NFS backups for 
ESXi, media storage). My current environment is running low on space, and by 
my projections I'll run out within the next 6 months (~26% free, which 
includes the 1.08TB reservation), so I am prepping for the transition. 
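(For scale: an 8-drive RaidZ2 has 6 data drives, so the current layout offers 
roughly 6 x 2TB = 12TB of pre-overhead data space; the same layout on 4TB 
drives would offer roughly 24TB.) 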

Just curious what you would do in my situation: replace the drives or build a 
new vdev, and why? 

I have all the underlying hardware to handle it (SAS-2008 controller, ECC, and 
a ZIL/SLOG. If needed, I could use my InfiniBand backend to clone the data at 
10Gb via IPoIB). 
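(For reference, the two approaches sketched as commands -- the pool and device 
names here are made up, not from my actual config:

  # Option 1: swap in place, one drive at a time, waiting out each resilver
  zpool set autoexpand=on tank
  zpool replace tank c1t0d0 c1t8d0   # repeat for each of the 8 drives
  zpool status tank                  # confirm the resilver finished before the next swap

  # Option 2: build a fresh RaidZ2 elsewhere and clone over the wire
  zfs snapshot -r tank@migrate
  zfs send -R tank@migrate | ssh newbox zfs receive -Fdu newtank
) 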

Thanks 
Matt 
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Debating Swapping 2TB with 4TB drives in RaidZ2 or Create new Vol and clone

2014-03-21 Thread Chris Siebenmann
| I am debating the idea of just swapping all the hard drives in my
| current 8x2TB RaidZ2 (albeit slowly), letting the environment
| resilver each drive and then expand, versus creating a new RaidZ2 on a
| different box and cloning the data over.
|
| Obviously I know the pros/cons/risks associated with that
| method. My question deals with the new drives being 4K,
| whereas the old drives were 512b aligned [...]

 As far as I know there is no question here: you simply cannot put 4K
drives in a vdev originally created with 512b drives[*]. You need to
make a new pool with the 4K drives.

 Even if you could get them into the existing pool, the performance
would likely be bad: ZFS does a lot of writes that are not 4K-aligned,
and each of those forces the drive into a read-modify-write cycle.

- cks
[*: If we're being technical, it's possible to force OmniOS to think
that they're all 512b drives.
]
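[A sketch for checking: zdb will show the ashift a vdev was created with -- 9
means 512b sectors, 12 means 4K:

  zdb -C tank | grep ashift

and the forcing above is done, as far as I know, by overriding the reported
physical block size in /kernel/drv/sd.conf; the exact vendor/product string
padding matters, so treat this as an untested sketch:

  sd-config-list = "ATA     ST4000VN000", "physical-block-size:512";
]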


Re: [OmniOS-discuss] Debating Swapping 2TB with 4TB drives in RaidZ2 or Create new Vol and clone

2014-03-21 Thread Matthew Mabis
I know the drive itself does 512b emulation, but I would rather run 4K if 
there's a performance increase! 

thanks

Matt 







[OmniOS-discuss] zpool degraded while SMART says disks are OK

2014-03-21 Thread Tobias Oetiker
A zpool on one of our boxes has been degraded, with several disks
faulted ...

* the disks are all SAS, direct attached
* according to smartctl the offending disks have no faults.
* zfs decided to fault the disks after the events below.

I have told the pool to clear the errors and it is now resilvering the disks 
... (in progress)

Any idea what is happening here?

Mar  2 22:21:51 foo scsi: [ID 243001 kern.warning] WARNING: 
/pci@0,0/pci8086,3c04@2/pci1000,3020@0 (mpt_sas0):
Mar  2 22:21:51 foo mptsas_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3117
Mar  2 22:21:51 foo scsi: [ID 243001 kern.warning] WARNING: 
/pci@0,0/pci8086,3c04@2/pci1000,3020@0 (mpt_sas0):
Mar  2 22:21:51 foo mptsas_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3117
Mar  2 22:21:51 foo scsi: [ID 365881 kern.info] 
/pci@0,0/pci8086,3c04@2/pci1000,3020@0 (mpt_sas0):
Mar  2 22:21:51 foo Log info 0x3117 received for target 11.
Mar  2 22:21:51 foo scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Mar  2 22:21:51 foo scsi: [ID 365881 kern.info] 
/pci@0,0/pci8086,3c04@2/pci1000,3020@0 (mpt_sas0):
Mar  2 22:21:51 foo Log info 0x3117 received for target 11.
Mar  2 22:21:51 foo scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc


Mar  5 02:20:53 foo scsi: [ID 243001 kern.warning] WARNING: 
/pci@0,0/pci8086,3c06@2,2/pci1000,3020@0 (mpt_sas1):
Mar  5 02:20:53 foo mptsas_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3117
Mar  5 02:20:53 foo scsi: [ID 243001 kern.warning] WARNING: 
/pci@0,0/pci8086,3c06@2,2/pci1000,3020@0 (mpt_sas1):
Mar  5 02:20:53 foo mptsas_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3117
Mar  5 02:20:53 foo scsi: [ID 365881 kern.info] 
/pci@0,0/pci8086,3c06@2,2/pci1000,3020@0 (mpt_sas1):
Mar  5 02:20:53 foo Log info 0x3117 received for target 10.
Mar  5 02:20:53 foo scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Mar  5 02:20:53 foo scsi: [ID 365881 kern.info] 
/pci@0,0/pci8086,3c06@2,2/pci1000,3020@0 (mpt_sas1):
Mar  5 02:20:53 foo Log info 0x3117 received for target 10.
Mar  5 02:20:53 foo scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
www.oetiker.ch t...@oetiker.ch +41 62 775 9902
*** We are hiring IT staff: www.oetiker.ch/jobs ***


Re: [OmniOS-discuss] Debating Swapping 2TB with 4TB drives in RaidZ2 or Create new Vol and clone

2014-03-21 Thread Chris Siebenmann
| I know the drive itself does 512b emulation but i would rather run 4K
| if theres a performance increase!

 What matters to OmniOS is what the drive reports itself as. If it reports
honestly that it has a 4k physical sector size, ZFS will say 'nope!'
even if the drive will accept 512b reads and writes.

 This is a very unfortunate limitation these days, since it's increasingly
hard to get drives that do not have 4k physical sectors. But that's
life.
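[For the curious, smartctl shows both sizes a drive reports, e.g. -- device
name hypothetical:

  smartctl -i /dev/rdsk/c0t0d0
  ...
  Sector Sizes:     512 bytes logical, 4096 bytes physical
]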

- cks


Re: [OmniOS-discuss] zpool degraded while SMART says disks are OK

2014-03-21 Thread Richard Elling

On Mar 21, 2014, at 9:48 AM, Tobias Oetiker t...@oetiker.ch wrote:

 A zpool on one of our boxes has been degraded, with several disks
 faulted ...

 * the disks are all SAS, direct attached
 * according to smartctl the offending disks have no faults.
 * zfs decided to fault the disks after the events below.

 I have told the pool to clear the errors and it is now resilvering the disks
 ... (in progress)

 Any idea what is happening here?
 
 Mar  2 22:21:51 foo scsi: [ID 243001 kern.warning] WARNING: 
 /pci@0,0/pci8086,3c04@2/pci1000,3020@0 (mpt_sas0):
 Mar  2 22:21:51 foo mptsas_handle_event_sync: IOCStatus=0x8000, 
 IOCLogInfo=0x3117
 Mar  2 22:21:51 foo scsi: [ID 243001 kern.warning] WARNING: 
 /pci@0,0/pci8086,3c04@2/pci1000,3020@0 (mpt_sas0):
 Mar  2 22:21:51 foo mptsas_handle_event: IOCStatus=0x8000, 
 IOCLogInfo=0x3117
 Mar  2 22:21:51 foo scsi: [ID 365881 kern.info] 
 /pci@0,0/pci8086,3c04@2/pci1000,3020@0 (mpt_sas0):
 Mar  2 22:21:51 foo Log info 0x3117 received for target 11.
 Mar  2 22:21:51 foo scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
 Mar  2 22:21:51 foo scsi: [ID 365881 kern.info] 
 /pci@0,0/pci8086,3c04@2/pci1000,3020@0 (mpt_sas0):
 Mar  2 22:21:51 foo Log info 0x3117 received for target 11.
 Mar  2 22:21:51 foo scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc

These are command-aborted reports from the target device. You will see these 
every 60 seconds if the disk is not responding and the subsequent reset of the 
disk aborts the outstanding commands.
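(You can see how these errors were counted against each device with the stock
tools, e.g.:

  iostat -En        # per-device soft/hard/transport error counters
  fmdump -eV        # the raw FMA ereports behind the faults
)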
 -- richard

 
 

--

richard.ell...@richardelling.com
+1-760-896-4422





Re: [OmniOS-discuss] zpool degraded while SMART says disks are OK

2014-03-21 Thread Tobias Oetiker
Today Zach Malone wrote:

 On Fri, Mar 21, 2014 at 3:50 PM, Richard Elling
 richard.ell...@richardelling.com wrote:
 

 ...

 Did all the disks fault at the same time, or was it spread out over a
 longer period?  I'd suspect your power supply or disk controller.
 What are your zpool errors?

it happened over time, as you can see from the timestamps in the
log. The errors from zfs's point of view were 1 read and about 30 write errors,

but according to SMART the disks are without flaw
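(those numbers are straight out of zpool status; the faulted disks showed up
roughly like this -- device name made up:

    NAME        STATE     READ WRITE CKSUM
    c2t10d0     FAULTED      1    30     0  too many errors
)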

cheers
tobi



-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
www.oetiker.ch t...@oetiker.ch +41 62 775 9902
*** We are hiring IT staff: www.oetiker.ch/jobs ***


Re: [OmniOS-discuss] zpool degraded while SMART says disks are OK

2014-03-21 Thread Richard Elling

On Mar 21, 2014, at 3:23 PM, Tobias Oetiker t...@oetiker.ch wrote:

 
 it happened over time, as you can see from the timestamps in the
 log. The errors from zfs's point of view were 1 read and about 30 write errors,

 but according to SMART the disks are without flaw

Actually, SMART is pretty dumb. In most cases it only looks for uncorrectable
errors that are related to media or heads. For a clue to more permanent errors,
you will want to look at the read/write error reports for errors that are
corrected with possible delays. You can also look at the grown defects list.

This behaviour is expected for drives with errors that are not being corrected
quickly, or that have firmware bugs (horrors!), and where the disk does not do
TLER (or its vendor's equivalent).
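(For SAS disks, smartctl -x pulls all of that in one pass -- device name
hypothetical:

  smartctl -x /dev/rdsk/c2t10d0
  ...
  Elements in grown defect list: 0
  Error counter log:
  ...
)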
 -- richard



Re: [OmniOS-discuss] zpool degraded while SMART says disks are OK

2014-03-21 Thread Tobias Oetiker
Yesterday Richard Elling wrote:


 [...]

 you will want to look at the read/write error reports for errors that are
 corrected with possible delays. You can also look at the grown defects list.

the error counters look like this:


Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/      errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:       3494        0         0      3494       4490       4530.879           0
write:         0        0         0         0      39111       1793.323           0
verify:        0        0         0         0       8133          0.000           0

The disk vendor is HGST, in case anyone has further ideas ... the system has 20 
of these disks and the problems occurred with three of them. The system had 
been running fine for two months previously.

Vendor:   HGST
Product:  HUS724030ALS640
Revision: A152
User Capacity:3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
Serial number:P8J20SNV
Device type:  disk
Transport protocol:   SAS

cheers
tobi



-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
www.oetiker.ch t...@oetiker.ch +41 62 775 9902
*** We are hiring IT staff: www.oetiker.ch/jobs ***