On 17/05/2016 08:49, Borja Marcos wrote:
On 05 May 2016, at 16:39, Warner Losh <i...@bsdimp.com> wrote:

What do you think? In some cases it’s clear that TRIM can do more harm than 
good.
I think it’s best we not overreact.
I agree. But with this issue the system is almost unusable for now.

This particular case is caused by the nvd driver, not the Intel P3500 NVMe
drive. You need a solution (3): fix the driver.

Specifically, ZFS is pushing down a boatload of BIO_DELETE requests. In
ata/da land, these requests are queued up, then collapsed together as much
as makes sense (or is possible). This vastly helps performance (even with
the extra sorting that I forced to be in there that I need to fix before
11). The nvd driver needs to do the same thing.
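
As a rough illustration of the collapsing described above (this is not the actual ata/da or nvd code), here is a minimal Python sketch. It assumes each queued delete is an (offset, length) pair in bytes; the name coalesce_deletes is made up for this example.

```python
def coalesce_deletes(requests):
    """Merge overlapping or adjacent (offset, length) delete ranges.

    Hypothetical helper: a real driver works on queued BIO_DELETE
    requests, not on Python tuples.
    """
    merged = []
    # Sorting by offset puts ranges that can be merged next to each other.
    for offset, length in sorted(requests):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # This range touches or overlaps the previous one: extend it.
            last_off, last_len = merged[-1]
            end = max(last_off + last_len, offset + length)
            merged[-1] = (last_off, end - last_off)
        else:
            merged.append((offset, length))
    return merged


# Two adjacent 4 KiB deletes become one 8 KiB delete; the third stays alone.
queued = [(4096, 4096), (0, 4096), (16384, 4096)]
collapsed = coalesce_deletes(queued)
```

With fewer, larger requests reaching the device, each delete costs less per byte, which is presumably why the collapsing helps so much.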
I understand that, but I don’t think it’s good that ZFS depends blindly on
a driver feature such as that. Of course, it’s great to exploit it.

I have also noticed that ZFS has a good throttling mechanism for write
operations. A similar mechanism should throttle trim requests so that trim
requests don’t clog the whole system.
It already does.

I’d be extremely hesitant to toss away TRIMs. They are actually quite
important for the FTL in the drive’s firmware to properly manage the NAND
wear. More free space always reduces write amplification. It tends to go as
1 / freespace, so simply dropping them on the floor should be done with
great reluctance.
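
To put numbers on the 1 / freespace rule of thumb, a small Python sketch follows. It assumes write amplification is simply proportional to 1 / (free fraction); the constant of proportionality depends on the drive and is not given here, so only ratios between two states are meaningful. The function name is made up for this example.

```python
def relative_write_amplification(free_fraction):
    """Relative write amplification under the rough 1 / freespace model.

    Hypothetical helper: the result is in arbitrary units, since the
    real constant depends on the drive's FTL and is unknown here.
    """
    if not 0.0 < free_fraction <= 1.0:
        raise ValueError("free_fraction must be in (0, 1]")
    return 1.0 / free_fraction


# Halving the free space the FTL knows about doubles the estimate.
half_free = relative_write_amplification(0.5)
quarter_free = relative_write_amplification(0.25)
ratio = quarter_free / half_free
```

Under this model, every dropped TRIM leaves the drive believing it has less free space than it really does, which pushes the estimate up.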
I understand. I was wondering about choosing the lesser of two evils: a
15-minute I/O stall (I deleted 2 TB of data, that’s a lot, but not so
unrealistic) or setting trims aside during the peak activity.

I see that I was wrong on that, as a throttling mechanism would probably be
more than enough, unless the system is close to running out of space.

I’ve filed a bug report anyway. And copying to -stable.


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209571

TBH it sounds like you may have badly behaved HW. We've used ZFS + TRIM for years on large production boxes, and while we've seen slowdowns we haven't experienced the total lockups you're describing.

The graphs on your ticket seem to indicate peak throughput of 250MB/s, which is extremely slow for standard SSDs, let alone NVMe ones, and when you add in the fact that you have 10, well, it seems like something is VERY wrong.

I just did a quick test on our DB box here, creating and then deleting a 2G file as you describe, and I couldn't even spot the delete in the general noise; it was so quick to process, and that's a 6 disk machine with P3700s.

    Regards
    Steve


