It might be helpful to contact the SSD vendor, report the issue, and
ask whether wearing out after half a year is expected behavior for this
model. Further, if you have the option to replace one (or both) SSDs
with fresh ones, that would tell you for sure whether they are the root cause.
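
For a rough sense of whether half a year is even plausible, here is a
back-of-the-envelope sketch in Python. Everything in it is an assumption
for illustration only: the function name, the P/E cycle count, the daily
write volume and the write amplification factors are placeholders, not
figures for your SSD model. The vendor's endurance rating and your own
measured write rate would have to be substituted.

```python
def estimated_lifetime_days(capacity_gb, pe_cycles,
                            host_writes_gb_per_day, write_amplification):
    """Rough endurance estimate (hypothetical helper, not a vendor formula).

    Total flash write budget = capacity * program/erase cycles.
    Flash writes per day     = host writes * write amplification.
    """
    if host_writes_gb_per_day <= 0 or write_amplification <= 0:
        raise ValueError("write rate and write amplification must be positive")
    total_budget_gb = capacity_gb * pe_cycles
    flash_writes_gb_per_day = host_writes_gb_per_day * write_amplification
    return total_budget_gb / flash_writes_gb_per_day


if __name__ == "__main__":
    # Placeholder inputs: 32 GB device, 10,000 P/E cycles, 200 GB/day of
    # host writes. None of these values come from the thread.
    for wa in (1.0, 10.0):
        days = estimated_lifetime_days(32, 10_000, 200, wa)
        print(f"write amplification {wa}: about {days:.0f} days")
```

The point of the sketch is only that small random writes (high write
amplification) can shorten the estimate by an order of magnitude compared
to streaming writes, so the real workload numbers matter a lot.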

Regards,
Andrey




On Mon, Dec 21, 2009 at 1:18 PM, Erik Trimble <erik.trim...@sun.com> wrote:
> Mart van Santen wrote:
>>
>> Hi,
>>
>> We have an X4150 with a J4400 attached, configured with 2x 32GB SSDs in a
>> mirrored configuration (ZIL) and 12x 500GB SATA disks. We have been running
>> this setup in production for over half a year, serving NFS and iSCSI to a
>> bunch of virtual machines (currently about 100 VMs, mostly Linux, some Windows).
>>
>> Since last week we have had performance problems that cause I/O wait in the
>> VMs. Of course we searched extensively for networking issues, hanging
>> machines, and firewall problems, and ran traffic tests, but were unable to
>> find anything wrong. So we had a look at the zpool and dropped one of the
>> mirrored SSDs from the pool (we had some indication the ZIL was not working
>> properly). No success. After re-adding the disk, we discovered that the I/O
>> wait during the "resilvering" process was OK again, or at least much better.
>> So last night we repeated the procedure, dropping and re-adding the same
>> disk, and yes, the I/O wait again looked better. This morning, the same story.
>>
>> Because this is a production machine, we cannot tolerate too many
>> experiments. We now know this operation buys us about 4 to 6 hours
>> (the time the resilver takes), but we have not yet had the courage to
>> detach/attach the other SSD. Tonight we will try only a "resilver",
>> without detach/attach, to see what happens.
>>
>> Can anybody explain how the detach/attach and resilver process works, and
>> especially whether anything is handled differently for the SSDs/slog disks
>> during resilvering?
>>
>>
>> Regards,
>>
>>
>> Mart
>>
>>
>>
> Do the I/O problems go away when only one of the SSDs is attached?
>
>
> Frankly, I'm betting that your SSDs are wearing out.   Resilvering will
> essentially be one big streaming write, which is optimal for SSDs (even an
> SLC-based SSD, as you likely have, performs far better when writing large
> amounts of data at once).  NFS (and to a lesser extent iSCSI) is generally a
> whole lot of random small writes, which are hard on an SSD (especially
> MLC-based ones, but even SLC ones).   The resilvering process is likely
> turning many of the random writes coming in to the system into a large
> streaming write to the /resilvering/ drive.
>
> My guess is that the SSD you are having problems with has reached the end of
> its useful lifespan, and the I/O problems you are seeing during normal
> operation are the result of that SSD's problems with committing data.
> There's no cure for this, other than replacing the SSD with a new one.
>
> SSDs are not hard drives. Even high-quality modern ones have /significantly/
> lower USE lifespans than an HD - that is, a heavily-used SSD will die well
> before a HD, but a very-lightly used SSD will likely outlast a HD.  And, in
> the case of SSDs, writes are far harder on the SSD than reads are.
>
>
> --
> Erik Trimble
> Java System Support
> Mailstop:  usca22-123
> Phone:  x17195
> Santa Clara, CA
> Timezone: US/Pacific (GMT-0800)
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>