Re: support ranges TRIM for libata

2017-03-23 Thread Martin K. Petersen
Christoph Hellwig  writes:

> Meh, I remember why I gave up on it - to support queued trim passthrough
> we'd need to implement ATA 32 for the auxiliary fields and thus support
> 32-bit CDBs.  I don't really want to go there..

I wish we could stick with SATL. However, I have attempted this a few
times over the years and I am with Christoph here. Due to the
discrepancies between ACS and SBC it's a twisted mess. And the iterative
approach has not worked well for HBA SATLs either. Quite the contrary.

So I think sticking the DSM TRIM payload onto a SCSI command is the path
of least resistance. And then NAK'ing attempts to issue these commands
through sg.

I'll take closer look at the entire series tomorrow or Monday. I want to
test multiple ranges on my "Little Shop of SSD Horrors" when I get back
home from LSF/MM.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: support ranges TRIM for libata

2017-03-23 Thread Christoph Hellwig
On Thu, Mar 23, 2017 at 03:43:30PM +0100, Christoph Hellwig wrote:
> I tried this earlier before giving up on it because it looked to ugly.
> But I can complete that version of it and post it for people to compare.

Meh, I remember why I gave up on it - to support queued trim passthrough
we'd need to implement ATA 32 for the auxiliary fields and thus support
32-bit CDBs.  I don't really want to go there..


Re: support ranges TRIM for libata

2017-03-23 Thread Christoph Hellwig
On Thu, Mar 23, 2017 at 11:04:58AM -0400, Tejun Heo wrote:
> I kinda like the idea of sticking with satl as that's how libata has
> been doing most things even if the implementation is uglier.  It'd be
> great to find out whether the ugliness would be acceptable or too
> much.

The SATL way is simply broken.  I'll just leave the code corrupting
user data and 5 times slower than it could be then.


Re: support ranges TRIM for libata

2017-03-23 Thread Tejun Heo
Hello, Christoph.

On Thu, Mar 23, 2017 at 03:43:30PM +0100, Christoph Hellwig wrote:
> > That's up to you ... from the point of view of code documenting itself,
> > forming the ATA_16 TRIM in sd and not doing any satl transformation is
> > easier for others to follow, but if it's going to cause more code, I'm
> > only marginal on the advantages of easier to follow code.
> 
> I tried this earlier before giving up on it because it looked to ugly.
> But I can complete that version of it and post it for people to compare.

I kinda like the idea of sticking with satl as that's how libata has
been doing most things even if the implementation is uglier.  It'd be
great to find out whether the ugliness would be acceptable or too
much.

Thanks.

-- 
tejun


Re: support ranges TRIM for libata

2017-03-23 Thread Christoph Hellwig
On Thu, Mar 23, 2017 at 10:35:06AM -0400, James Bottomley wrote:
> I'm certainly not saying we blindly follow t10, but I believe their
> intent is to issue the next command from the completion of the first
> (we can do this using qc->complete_fn, like atapi_request_sense).  That
> way we don't get any tag problems because there's only one command
> outstanding at once; reusing the qc means no allocation issues either.
> 
> The t10 approach does mean the SG_IO problem is actually fixable rather
> than simply erroring out.

It would be sort of fixable, but with a lot of hackery.

> That's up to you ... from the point of view of code documenting itself,
> forming the ATA_16 TRIM in sd and not doing any satl transformation is
> easier for others to follow, but if it's going to cause more code, I'm
> only marginal on the advantages of easier to follow code.

I tried this earlier before giving up on it because it looked to ugly.
But I can complete that version of it and post it for people to compare.


Re: support ranges TRIM for libata

2017-03-23 Thread James Bottomley
On Thu, 2017-03-23 at 14:55 +0100, Christoph Hellwig wrote:
> On Thu, Mar 23, 2017 at 09:47:41AM -0400, James Bottomley wrote:
> > > The current implementation already has the issue of that it does
> > > corrupt user data reliably if the using SG_IO for WRITE SAME
> > > commands.
> > 
> > That does need fixing.
> 
> I don't think it's fixable as long as we translate the data payload.
> 
> > Why can't we do what the t10 sat document recommends: if the ATA 
> > device doesn't support the XL version (32 bit ranges) then 
> > translate unmap to multiple non-XL commands?
> 
> Because libata sits underneath the tag allocator we'd getting into
> a giant set of problems.  Where do you expect the new commands to
> magically come from?  And adhere to the queueing limits and not 
> actually deadlock in one way or another?

I'm certainly not saying we blindly follow t10, but I believe their
intent is to issue the next command from the completion of the first
(we can do this using qc->complete_fn, like atapi_request_sense).  That
way we don't get any tag problems because there's only one command
outstanding at once; reusing the qc means no allocation issues either.

The t10 approach does mean the SG_IO problem is actually fixable rather
than simply erroring out.

> > I don't necessarily object to the vendor specific 1<->1 approach, 
> > it's just it won't fix the problem you cited above (SG_IO WRITE 
> > SAME), its just that now we error the command, which may cause some
> > surprise.
> 
> We now error WRITE SAME for passthrough consistently.  Before we only
> accepted it only with the unmap bit set, and even did so incorrectly
> (not checking that the payload was all zeros).

If it never worked for anyone, I'm OK with changing it to error out.

> > I also wonder if we couldn't simply do an ATA_16 TRIM if we're
> > already going to all the trouble of recognising ATA devices in the 
> > sd discard path?
> 
> We probably could do that as well.  But that would drag a lot more
> ATA-specific code into sd.c than just formatting the ranges.

That's up to you ... from the point of view of code documenting itself,
forming the ATA_16 TRIM in sd and not doing any satl transformation is
easier for others to follow, but if it's going to cause more code, I'm
only marginal on the advantages of easier to follow code.

James



Re: support ranges TRIM for libata

2017-03-23 Thread Christoph Hellwig
On Thu, Mar 23, 2017 at 09:47:41AM -0400, James Bottomley wrote:
> > The current implementation already has the issue of that it does
> > corrupt user data reliably if the using SG_IO for WRITE SAME
> > commands.
> 
> That does need fixing.

I don't think it's fixable as long as we translate the data payload.

> Why can't we do what the t10 sat document recommends: if the ATA device
> doesn't support the XL version (32 bit ranges) then translate unmap to
> multiple non-XL commands?

Because libata sits underneath the tag allocator we'd getting into
a giant set of problems.  Where do you expect the new commands to
magically come from?  And adhere to the queueing limits and not actually
deadlock in one way or another?

> I don't necessarily object to the vendor specific 1<->1 approach, it's
> just it won't fix the problem you cited above (SG_IO WRITE SAME), its
> just that now we error the command, which may cause some surprise.

We now error WRITE SAME for passthrough consistently.  Before we only
accepted it only with the unmap bit set, and even did so incorrectly
(not checking that the payload was all zeros).

> I
> also wonder if we couldn't simply do an ATA_16 TRIM if we're already
> going to all the trouble of recognising ATA devices in the sd discard
> path?

We probably could do that as well.  But that would drag a lot more
ATA-specific code into sd.c than just formatting the ranges.


Re: support ranges TRIM for libata

2017-03-23 Thread James Bottomley
On Wed, 2017-03-22 at 19:19 +0100, Christoph Hellwig wrote:
> On Tue, Mar 21, 2017 at 02:59:01PM -0400, Tejun Heo wrote:
> > I do like the fact that this is a lot simpler than the previous
> > implementation but am not quite sure we want to deviate 
> > significantly from what we do for other commands (command 
> > translation).  Is it because fixing the existing implementation 
> > would involve invaisve changes including memory allocations?
> 
> The current implementation already has the issue of that it does
> corrupt user data reliably if the using SG_IO for WRITE SAME
> commands.

That does need fixing.

> Doing ranges using translation would turn into a nightmare because
> ATA TRIM ranges are 16 bits long while SCSI UNAMP ranges are 32-bit,
> so we effectively can't translated them without introducing a
> non-standard hook between libata and scsi to communicate that
> limit.

Why can't we do what the t10 sat document recommends: if the ATA device
doesn't support the XL version (32 bit ranges) then translate unmap to
multiple non-XL commands?

I don't necessarily object to the vendor specific 1<->1 approach, it's
just it won't fix the problem you cited above (SG_IO WRITE SAME), its
just that now we error the command, which may cause some surprise.  I
also wonder if we couldn't simply do an ATA_16 TRIM if we're already
going to all the trouble of recognising ATA devices in the sd discard
path?

James

>   And once we're down that path we might as well just do the
> right thing directly.



Re: support ranges TRIM for libata

2017-03-22 Thread Christoph Hellwig
On Tue, Mar 21, 2017 at 02:59:01PM -0400, Tejun Heo wrote:
> I do like the fact that this is a lot simpler than the previous
> implementation but am not quite sure we want to deviate significantly
> from what we do for other commands (command translation).  Is it
> because fixing the existing implementation would involve invaisve
> changes including memory allocations?

The current implementation already has the issue of that it does
corrupt user data reliably if the using SG_IO for WRITE SAME commands.

Doing ranges using translation would turn into a nightmare because
ATA TRIM ranges are 16 bits long while SCSI UNAMP ranges are 32-bit,
so we effectively can't translated them without introducing a
non-standard hook between libata and scsi to communicate that
limit.  And once we're down that path we might as well just do the
right thing directly.


Re: support ranges TRIM for libata

2017-03-21 Thread Tejun Heo
Hello,

On Mon, Mar 20, 2017 at 04:43:12PM -0400, Christoph Hellwig wrote:
> This series implements rangeѕ discard for ATA SSDs.  Compared to the
> initial NVMe support there are two things that complicate the ATA
> support:
> 
>  - ATA only suports 16-bit long ranges
>  - the whole mess of generating a SCSI command first and then
>translating it to an ATA one.
> 
> This series adds support for limited range size to the block layer,
> and stops translating discard commands - instead we add a new
> Vendor Specific SCSI command that contains the TRIM payload when
> the device asks for it.

I do like the fact that this is a lot simpler than the previous
implementation but am not quite sure we want to deviate significantly
from what we do for other commands (command translation).  Is it
because fixing the existing implementation would involve invaisve
changes including memory allocations?

Thanks.

-- 
tejun