from:"David Laight"

RE: [PATCH] bfa: remove VLA

2018-03-15 Thread David Laight

> > -   sizeof(wwn_t[iocmd->nports])) != BFA_STATUS_OK) {
> > +   sizeof(wwn_t) * iocmd->nports) != BFA_STATUS_OK) {
> 
> These parentheses made me blurry eyed but it's actually OK.

iocmd->nports * sizeof(wwn_t)) != BFA_STATUS_OK) {

is easier to focus on :-)

David

RE: [PATCH] scsi: resolve COMMAND_SIZE at compile time

2018-03-13 Thread David Laight

From: Stephen Kitt
> Sent: 09 March 2018 22:34
> 
> COMMAND_SIZE currently uses an array of values in block/scsi_ioctl.c.
> A number of device_handler functions use this to initialise arrays,
> and this is flagged by -Wvla.
> 
> This patch replaces COMMAND_SIZE with a variant using a formula which
> can be resolved at compile time in cases where the opcode is fixed,
> resolving the array size and avoiding the VLA. The downside is that
> the code is less maintainable and that some call sites end up having
> more complex generated code.
> 
> Since scsi_command_size_tbl is dropped, we can remove the dependency
> on BLK_SCSI_REQUEST from drivers/target/Kconfig.
> 
> This was prompted by https://lkml.org/lkml/2018/3/7/621
> 
> Signed-off-by: Stephen Kitt 
> ---
>  block/scsi_ioctl.c |  8 
>  drivers/target/Kconfig |  1 -
>  include/scsi/scsi_common.h | 13 +++--
>  3 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> index 60b471f8621b..b9e176e537be 100644
> --- a/block/scsi_ioctl.c
> +++ b/block/scsi_ioctl.c
> @@ -41,14 +41,6 @@ struct blk_cmd_filter {
> 
>  static struct blk_cmd_filter blk_default_cmd_filter;
> 
> -/* Command group 3 is reserved and should never be used.  */
> -const unsigned char scsi_command_size_tbl[8] =
> -{
> - 6, 10, 10, 12,
> - 16, 12, 10, 10
> -};
> -EXPORT_SYMBOL(scsi_command_size_tbl);
> -
>  #include 
> 
>  static int sg_get_version(int __user *p)
> diff --git a/drivers/target/Kconfig b/drivers/target/Kconfig
> index 4c44d7bed01a..b5f05b60cf06 100644
> --- a/drivers/target/Kconfig
> +++ b/drivers/target/Kconfig
> @@ -4,7 +4,6 @@ menuconfig TARGET_CORE
>   depends on SCSI && BLOCK
>   select CONFIGFS_FS
>   select CRC_T10DIF
> - select BLK_SCSI_REQUEST # only for scsi_command_size_tbl..
>   select SGL_ALLOC
>   default n
>   help
> diff --git a/include/scsi/scsi_common.h b/include/scsi/scsi_common.h
> index 731ac09ed231..48d950666376 100644
> --- a/include/scsi/scsi_common.h
> +++ b/include/scsi/scsi_common.h
> @@ -15,8 +15,17 @@ scsi_varlen_cdb_length(const void *hdr)
>   return ((struct scsi_varlen_cdb_hdr *)hdr)->additional_cdb_length + 8;
>  }
> 
> -extern const unsigned char scsi_command_size_tbl[8];
> -#define COMMAND_SIZE(opcode) scsi_command_size_tbl[((opcode) >> 5) & 7]
> +/*
> + * SCSI command sizes are as follows, in bytes, for fixed size commands, per
> + * group: 6, 10, 10, 12, 16, 12, 10, 10. The top three bits of an opcode
> + * determine its group.
> + * The size table is encoded into a 32-bit value by subtracting each value
> + * from 16, resulting in a value of 1715488362
> + * (6 << 28 + 6 << 24 + 4 << 20 + 0 << 16 + 4 << 12 + 6 << 8 + 6 << 4 + 10).
> + * Command group 3 is reserved and should never be used.
> + */
> +#define COMMAND_SIZE(opcode) \
> + (16 - (15 & (1715488362 >> (4 * (((opcode) >> 5) & 7)

Why not:
(6 + (15 & (0x446a6440u 

Specifying the constant in hex makes it more obvious.
It is a shame about the 16, but an offset is easier than the negate.

David

RE: [PATCH] scsi: lpfc: use memcpy_toio instead of writeq

2018-02-23 Thread David Laight

From: Andy Shevchenko
> Sent: 23 February 2018 17:13
> To: David Laight
> Cc: Arnd Bergmann; James Smart; Dick Kennedy; James E.J. Bottomley; Martin K. 
> Petersen; Hannes
> Reinecke; Johannes Thumshirn; linux-scsi@vger.kernel.org; 
> linux-ker...@vger.kernel.org
> Subject: Re: [PATCH] scsi: lpfc: use memcpy_toio instead of writeq
> 
> On Fri, Feb 23, 2018 at 7:09 PM, David Laight <david.lai...@aculab.com> wrote:
> > From: Andy Shevchenko
> >> Sent: 23 February 2018 16:51
> >> On Fri, Feb 23, 2018 at 6:41 PM, David Laight <david.lai...@aculab.com> 
> >> wrote:
> 
> 
> >> The side-effect I referred previously is about tails, i.e. unaligned
> >> bytes are transferred in portions
> >> like
> >>   7 on 64-bit will be  4 + 2 + 1,
> >>   5 = 4 + 1
> >
> > on 64bit memcpy() is allowed to do:
> > (long *)(tgt+len)[-1] = (long *)(src+len)[-1];
> > rep_movsq(tgt, src, len >> 3);
> > provided the length is at least 8.
> >
> > The misaligned PCIe transfer generates a single TLP covering 12 bytes with 
> > the
> > relevant byte enables set for the first and last 32bit words.
> 
> But is it guaranteed on any type of bus?
> memcpy_toio() is a generic helper, so, first of all we need to be sure
> what CPU on its side does, this is I think is pretty straight forward
> since it's all written in asm for 64-bit case.

I've just done a compile test, on x86-64 memcpy_toio() generates a
call to memcpy() (checked with objdump -dr).
That is on a system running a 4.14 kernel, so is probably using the system
headers from that release.
I'd need to do a run-time test on a newer system verify what the PCIe slave
sees - but I changed our driver to do its own copy loops in order to avoid
byte transfers some time ago.

FWIW I was originally doing copy_to/from_user() directly to PCIe memory 
addresses!

On x86 'memory' on devices can always be accesses by simple instructions.
Hardware 'IO' addresses are not valid for memcpy_to/fromio().

David

RE: [PATCH] scsi: lpfc: use memcpy_toio instead of writeq

2018-02-23 Thread David Laight

From: Andy Shevchenko
> Sent: 23 February 2018 16:51
> On Fri, Feb 23, 2018 at 6:41 PM, David Laight <david.lai...@aculab.com> wrote:
> > From: Arnd Bergmann
> >> Sent: 23 February 2018 15:37
> >>
> >> 32-bit architectures generally cannot use writeq(), so we now get a build
> >> failure for the lpfc driver:
> >>
> >> drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_wq_put':
> >> drivers/scsi/lpfc/lpfc_sli.c:145:4: error: implicit declaration of 
> >> function 'writeq'; did you mean
> >> 'writeb'? [-Werror=implicit-function-declaration]
> >>
> >> Another problem here is that writing out actual data (unlike accessing
> >> mmio registers) means we must write the data with the same endianess
> >> that we have read from memory, but writeq() will perform byte swaps
> >> and add barriers inbetween accesses as we do for registers.
> >>
> >> Using memcpy_toio() should do the right thing here, using register
> >> sized stores with correct endianess conversion and barriers (i.e. none),
> >> but on some architectures might fall back to byte-size access.
> > ...
> >
> > Have you looked at the performance impact of this on x86?
> > Last time I looked memcpy_toio() aliased directly to memcpy().
> > memcpy() is run-time patched between several different algorithms.
> > On recent Intel cpus memcpy() is implemented as 'rep movsb' relying
> > on the hardware to DTRT.
> > For uncached accesses (typical for io) the 'RT' has to be byte transfers.
> > So instead of the 8 byte transfers (on 64 bit) you get single bytes.
> > This won't be what is intended!
> > memcpy_toio() should probably use 'rep movsd' for the bulk of the transfer.
> 
> Maybe I'm wrong but it uses movsq on 64-bit and movsl on 32-bit.

(Let's not argue about the instruction mnemonic). 

You might expect that, but last time I looked at the bus cycles on a PCIe slave
that wasn't what I saw.

> The side-effect I referred previously is about tails, i.e. unaligned
> bytes are transferred in portions
> like
>   7 on 64-bit will be  4 + 2 + 1,
>   5 = 4 + 1

on 64bit memcpy() is allowed to do:
(long *)(tgt+len)[-1] = (long *)(src+len)[-1];
rep_movsq(tgt, src, len >> 3);
provided the length is at least 8.

The misaligned PCIe transfer generates a single TLP covering 12 bytes with the
relevant byte enables set for the first and last 32bit words.

David

RE: [PATCH] scsi: lpfc: use memcpy_toio instead of writeq

2018-02-23 Thread David Laight

From: Arnd Bergmann
> Sent: 23 February 2018 15:37
> 
> 32-bit architectures generally cannot use writeq(), so we now get a build
> failure for the lpfc driver:
> 
> drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_wq_put':
> drivers/scsi/lpfc/lpfc_sli.c:145:4: error: implicit declaration of function 
> 'writeq'; did you mean
> 'writeb'? [-Werror=implicit-function-declaration]
> 
> Another problem here is that writing out actual data (unlike accessing
> mmio registers) means we must write the data with the same endianess
> that we have read from memory, but writeq() will perform byte swaps
> and add barriers inbetween accesses as we do for registers.
> 
> Using memcpy_toio() should do the right thing here, using register
> sized stores with correct endianess conversion and barriers (i.e. none),
> but on some architectures might fall back to byte-size access.
...

Have you looked at the performance impact of this on x86?
Last time I looked memcpy_toio() aliased directly to memcpy().
memcpy() is run-time patched between several different algorithms.
On recent Intel cpus memcpy() is implemented as 'rep movsb' relying
on the hardware to DTRT.
For uncached accesses (typical for io) the 'RT' has to be byte transfers.
So instead of the 8 byte transfers (on 64 bit) you get single bytes.
This won't be what is intended!
memcpy_toio() should probably use 'rep movsd' for the bulk of the transfer.

David

RE: [PATCH 04/22] scsi: fusion: fix string overflow warning

2017-07-17 Thread David Laight

From: Arnd Bergmann
> Sent: 14 July 2017 13:07
> gcc points out a theorerical string overflow:
> 
> drivers/message/fusion/mptbase.c: In function 'mpt_detach':
> drivers/message/fusion/mptbase.c:2103:17: error: '%s' directive writing up to 
> 31 bytes into a region
> of size 28 [-Werror=format-overflow=]
> sprintf(pname, MPT_PROCFS_MPTBASEDIR "/%s/summary", ioc->name);
>^
> drivers/message/fusion/mptbase.c:2103:2: note: 'sprintf' output between 13 
> and 44 bytes into a
> destination of size 32
> 
> We can simply double the size of the local buffer here to be on the
> safe side.

I think I'd change it to snprintf() as well.
Saves any worries if ioc->name isn't '\0' terminated.

David

RE: remove dma_alloc_noncoherent

2017-06-20 Thread David Laight

From: Christoph Hellwig
> Sent: 16 June 2017 08:17
>
> For many years we've had the dma_alloc_attrs API that is more flexible
> than dma_alloc_noncoherent.  This series moves the remaining users over
> to the attrs API.

And most of the callers probably only want to specify 'noncoherent'.
Grepping the source for other uses is easier if the wrapper is left.

David

RE: [PATCH 16/22] xen-blkfront: Make use of the new sg_map helper function

2017-04-18 Thread David Laight

From: Logan Gunthorpe
> Sent: 13 April 2017 23:05
> Straightforward conversion to the new helper, except due to
> the lack of error path, we have to warn if unmapable memory
> is ever present in the sgl.
> 
> Signed-off-by: Logan Gunthorpe 
> ---
>  drivers/block/xen-blkfront.c | 33 +++--
>  1 file changed, 27 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 5067a0a..7dcf41d 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -807,8 +807,19 @@ static int blkif_queue_rw_req(struct request *req, 
> struct blkfront_ring_info *ri
>   BUG_ON(sg->offset + sg->length > PAGE_SIZE);
> 
>   if (setup.need_copy) {
> - setup.bvec_off = sg->offset;
> - setup.bvec_data = kmap_atomic(sg_page(sg));
> + setup.bvec_off = 0;
> + setup.bvec_data = sg_map(sg, SG_KMAP_ATOMIC);
> + if (IS_ERR(setup.bvec_data)) {
> + /*
> +  * This should really never happen unless
> +  * the code is changed to use memory that is
> +  * not mappable in the sg. Seeing there is a
> +  * questionable error path out of here,
> +  * we WARN.
> +  */
> + WARN(1, "Non-mappable memory used in sg!");
> + return 1;
> + }
...

Perhaps add a flag to mark failure as 'unexpected' and trace (and panic?)
inside sg_map().

David

RE: [PATCH V4 1/2] ACPI / EC: Fix broken 64bit big-endian users of 'global_lock'

2015-09-28 Thread David Laight

From: James Bottomley [mailto:james.bottom...@hansenpartnership.com]
> Sent: 28 September 2015 15:27
> On Mon, 2015-09-28 at 08:58 +, David Laight wrote:
> > From: Rafael J. Wysocki
> > > Sent: 27 September 2015 15:09
> > ...
> > > > > Say you have three adjacent fields in a structure, x, y, z, each one 
> > > > > byte long.
> > > > > Initially, all of them are equal to 0.
> > > > >
> > > > > CPU A writes 1 to x and CPU B writes 2 to y at the same time.
> > > > >
> > > > > What's the result?
> > > >
> > > > I think every CPU's  cache architecure guarantees adjacent store
> > > > integrity, even in the face of SMP, so it's x==1 and y==2.  If you're
> > > > thinking of old alpha SMP system where the lowest store width is 32 bits
> > > > and thus you have to do RMW to update a byte, this was usually fixed by
> > > > padding (assuming the structure is not packed).  However, it was such a
> > > > problem that even the later alpha chips had byte extensions.
> >
> > Does linux still support those old Alphas?
> >
> > The x86 cpus will also do 32bit wide rmw cycles for the 'bit' operations.
> 
> That's different: it's an atomic RMW operation.  The problem with the
> alpha was that the operation wasn't atomic (meaning that it can't be
> interrupted and no intermediate output states are visible).

It is only atomic if prefixed by the 'lock' prefix.
Normally the read and write are separate bus cycles.
 
> > You still have to ensure the compiler doesn't do wider rmw cycles.
> > I believe the recent versions of gcc won't do wider accesses for volatile 
> > data.
> 
> I don't understand this comment.  You seem to be implying gcc would do a
> 64 bit RMW for a 32 bit store ... that would be daft when a single
> instruction exists to perform the operation on all architectures.

Read the object code and weep...
It is most likely to happen for operations that are rmw (eg bit set).
For instance the arm cpu has limited offsets for 16bit accesses, for
normal structures the compiler is likely to use a 32bit rmw sequence
for a 16bit field that has a large offset.
The C language allows the compiler to do it for any access (IIRC including
volatiles).

David

RE: [PATCH V4 1/2] ACPI / EC: Fix broken 64bit big-endian users of 'global_lock'

2015-09-28 Thread David Laight

From: James Bottomley 
> Sent: 28 September 2015 16:12
> > > > The x86 cpus will also do 32bit wide rmw cycles for the 'bit' 
> > > > operations.
> > >
> > > That's different: it's an atomic RMW operation.  The problem with the
> > > alpha was that the operation wasn't atomic (meaning that it can't be
> > > interrupted and no intermediate output states are visible).
> >
> > It is only atomic if prefixed by the 'lock' prefix.
> > Normally the read and write are separate bus cycles.
> 
> The essential point is that x86 has atomic bit ops and byte writes.
> Early alpha did not.

Early alpha didn't have any byte accesses.

On x86 if you have the following:
struct {
char  a;
volatile char b;
} *foo;
foo->a |= 4;

The compiler is likely to generate a 'bis #4, 0(rbx)' (or similar)
and the cpu will do two 32bit memory cycles that read and write
the 'volatile' field 'b'.
(gcc definitely used to do this...)

A lot of fields were made 32bit (and probably not bitfields) in the linux
kernel tree a year or two ago to avoid this very problem.

> > > > You still have to ensure the compiler doesn't do wider rmw cycles.
> > > > I believe the recent versions of gcc won't do wider accesses for 
> > > > volatile data.
> > >
> > > I don't understand this comment.  You seem to be implying gcc would do a
> > > 64 bit RMW for a 32 bit store ... that would be daft when a single
> > > instruction exists to perform the operation on all architectures.
> >
> > Read the object code and weep...
> > It is most likely to happen for operations that are rmw (eg bit set).
> > For instance the arm cpu has limited offsets for 16bit accesses, for
> > normal structures the compiler is likely to use a 32bit rmw sequence
> > for a 16bit field that has a large offset.
> > The C language allows the compiler to do it for any access (IIRC including
> > volatiles).
> 
> I think you might be confusing different things.  Most RISC CPUs can't
> do 32 bit store immediates because there aren't enough bits in their
> arsenal, so they tend to split 32 bit loads into a left and right part
> (first the top then the offset).  This (and other things) are mostly
> what you see in code.  However, 32 bit register stores are still atomic,
> which is all we require.  It's not really the compiler's fault, it's
> mostly an architectural limitation.

No, I'm not talking about how 32bit constants are generated.
I'm talking about structure offsets.

David

N�r��yb�X��ǧv�^�)޺{.n�+{���"�{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj"��!�i

RE: [PATCH V4 1/2] ACPI / EC: Fix broken 64bit big-endian users of 'global_lock'

2015-09-28 Thread David Laight

From: Rafael J. Wysocki
> Sent: 27 September 2015 15:09
...
> > > Say you have three adjacent fields in a structure, x, y, z, each one byte 
> > > long.
> > > Initially, all of them are equal to 0.
> > >
> > > CPU A writes 1 to x and CPU B writes 2 to y at the same time.
> > >
> > > What's the result?
> >
> > I think every CPU's  cache architecure guarantees adjacent store
> > integrity, even in the face of SMP, so it's x==1 and y==2.  If you're
> > thinking of old alpha SMP system where the lowest store width is 32 bits
> > and thus you have to do RMW to update a byte, this was usually fixed by
> > padding (assuming the structure is not packed).  However, it was such a
> > problem that even the later alpha chips had byte extensions.

Does linux still support those old Alphas?

The x86 cpus will also do 32bit wide rmw cycles for the 'bit' operations.

> OK, thanks!

You still have to ensure the compiler doesn't do wider rmw cycles.
I believe the recent versions of gcc won't do wider accesses for volatile data.

David

N�r��yb�X��ǧv�^�)޺{.n�+{���"�{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj"��!�i

RE: [PATCH v2 12/30] cxlflash: Refine host/device attributes

2015-09-21 Thread David Laight

From: Brian King
> Sent: 18 September 2015 22:35
...
> > +   for (i = 0; i < CXLFLASH_NUM_VLUNS; i++, buf += 22)
> 
> Rather than this bug prone hard coded 22, how about never incrementing buf 
> and do something
> similar to this:
> 
> > +   bytes += scnprintf(buf, PAGE_SIZE, "%03d: %016llX\n",
> > +  i, readq_be(_port[i]));
> 
>   bytes += scnprintf([bytes], PAGE_SIZE, "%03d: %016llX\n",
>  i, readq_be(_port[i]));
...

bytes += scnprintf(buf + bytes, PAGE_SIZE - bytes, 

You need to check scnprintf()'s return value though.
The above is wrong for libc's snprintf().

David

N�r��yb�X��ǧv�^�)޺{.n�+{���"�{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj"��!�i

RE: [PATCH v2 09/30] cxlflash: Fix to stop interrupt processing on remove

2015-09-17 Thread David Laight

From: Linuxppc-dev Matthew R. Ochs
> Sent: 16 September 2015 22:28
> Interrupt processing can run in parallel to a remove operation. This
> can lead to a condition where the interrupt handler is processing with
> memory that has been freed.
> 
> To avoid processing an interrupt while memory may be yanked, check for
> removal while in the interrupt handler. Bail when removal is imminent.

On the face of it this just reduces the size of the window somewhat.

What happens if the interrupt routine reads the flag just before it is set
(so is processing the entry that is being removed) and is then (say)
interrupted by a higher priority interrupt that takes longer to execute than
the remove code?

You've still got an interrupt routine accessing freed memory.

David

RE: [PATCH] USB: storage: add no SYNCHRONIZE CACHE quirk

2015-06-23 Thread David Laight

From: Of James Bottomley
 Sent: 22 June 2015 18:36
 To: Alan Stern
...
   Obviously, for a disk with a writeback cache that can't do flush, that
   window is much wider and the real solution should be to try to switch
   the cache to write through.

  I agree.  Doing the switch manually (by writing to the cache_type
  attribute file) works, but it's a nuisance to do this when you have a
  portable USB drive that gets moved among a bunch of machines.

 Perhaps it might be wise to do this to every USB device ... for external
 devices, the small performance gain doesn't really make up for the
 potential data loss.

What about systems running from USB memory/disk directly plugged into the
motherboard and shut inside the case?

Since they shouldn't be removed you don't want to bypass any write cache.
(Any more than you'd want to on a real disk.)

There is an additional problem caused by very temporary USB disconnects
(easily caused by electrical noise causing (probably) Vbus to bounce).
In these conditions you don't want to signal a USB remove at all - at
least not to the filesystem - until the device has remained disconnected
for a short time.

David

N�r��yb�X��ǧv�^�)޺{.n�+{{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj��!�i

RE: Large disk drives

2014-11-07 Thread David Laight

From: Norman Diamond
...
 By the way, I've seen some USB bridges that lie about whether they
 performed various SAT commands (ATA passthrough), but told the truth
 about performing an ATA IDENTIFY DEVICE through SAT.  So we could attempt
 ATA passthrough with an IDENTIFY DEVICE command, and if it happens to
 return a sane looking result, we could comparre it to the bridge's own
 translation of READ_CAPACITY_10 and READ_CAPACITY_16.  If the passthrough
 yielded a sane result and the capacity doesn't match, we know not to
 trust the bridge with 16-byte commands (except if tested, as mentioned).
 In such a case, we don't even have to look at the partition table.  Of
 course if the ATA passthrough fails or if the result is garbage then
 abandon this test and maybe look at the partition table.
...

Don't count on devices answering ATA IDENTIFY correctly either.
I had some CF cards from one of the main 'labels' that reported
an incorrectly layed out identify response, and 2 others (almost
identical) that locked solid when the request was issued.
We tried to send them back, but they were returned with a nice
shiny new MBR table.

David



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH fix for 3.17] uas: Add a quirk for rejecting ATA_12 and ATA_16 commands

2014-09-15 Thread David Laight

From: Alan Stern
...
  p = quirks;
  while (*p) {
  @@ -543,6 +544,9 @@ void usb_stor_adjust_quirks(struct usb_device *udev, 
  unsigned long *fflags)
  case 's':
  f |= US_FL_SINGLE_LUN;
  break;
  +   case 't':
  +   f |= US_FL_NO_ATA_1X;
  +   break;
  case 'u':
  f |= US_FL_IGNORE_UAS;
  break;
 
 You must not add an aditional value for a module parameter without
 documenting it in Documentation/kernel-parameters.txt.

How can this work as a 'module parameter'?
I might want to use two different usb-scsi devices that have different
requirements.

David



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: RES: RES: AS2105-based enclosure size issues with 2TB HDDs

2014-08-26 Thread David Laight

From: Alan Stern
 On Mon, 25 Aug 2014, Alfredo Dal Ava Junior wrote:
 
  Well, it is causing problems anyway... from user perspective, it's a
  Linux compatibility issue, as it works fine on Windows. I'm not an
  expert, but I'm wondering that if usb-storage could set capacity as
  UNDETERMINED/ Zero (or keep using the readcapacity_10 as it as with
  some flag signalizing it as inaccurate), EFI partition check would be
  able to ignore size and look for secondary GPT where it really is.
 
 Part of the problem is that usb-storage has no way to know that
 anything strange is going on.  It's normal for READ CAPACITY(16) to
 fail (this depend on the SCSI level), and it's normal for the READ
 CAPACITY(10) to report a value less than 2 TB.

Could the code try READ CAPACITY(16) first?

 Really there is only one way to know whether the actual capacity is
 larger than the reported capacity, and that is by trying to read blocks
 beyond the reported capacity -- a dangerous test that many drives do
 not like.  (And in most cases a futile test.  If a device doesn't
 support READ CAPACITY(16), how likely is it to support READ(16)?)
 
 Yes, in theory you can believe the value in the partition table in
 preference to the reported capacity.  But what if that value is wrong?
 And how do you tell partition-manager programs what the capacity should
 be when they modify or set up the initial partition table?

I've a feeling that, historically at least, windows believes the partition 
table.
I remember some CF cards that locked up when I tried to read the 'device info'
sector, and others (apparently identical) that had the 32bit sector size 
misaligned.
These were 'major manufacturer' cards as well.

David



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: RES: RES: AS2105-based enclosure size issues with 2TB HDDs

2014-08-26 Thread David Laight

From Oliver Neukum [mailto:oneu...@suse.de]
 On Tue, 2014-08-26 at 09:58 +, David Laight wrote:
   Part of the problem is that usb-storage has no way to know that
   anything strange is going on.  It's normal for READ CAPACITY(16) to
   fail (this depend on the SCSI level), and it's normal for the READ
   CAPACITY(10) to report a value less than 2 TB.
 
  Could the code try READ CAPACITY(16) first?
 
 Yes. It does already. That fails as the device doesn't support
 this version. In a way we are discussing error handling here.

I read more of the thread later (getting outluck to sort mails in
any sensible way is almost impossible.)

I'm sort of surprised that the 16byte reads work if the 16byte
read capacity doesn't.

I wonder what the manufacturer would saw in response the bug where
windows shows the incorrect size when trying to partition the disk?

Such bugs ought to fail 'fitness for purpose' - so, in the UK, the
shop would have to replace the 'faulty' goods.

David

N�r��yb�X��ǧv�^�)޺{.n�+{{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj��!�i

RE: dummy scsi read or scsi command periodically for external USB Hard Disk

2014-07-09 Thread David Laight

From: loody
...
 but what it really do is read sector, not media_change or test_unit_ready.

Maybe one of the programs that reads the mbr partition table can
be persuaded to do a direct read?

David

RE: dummy scsi read or scsi command periodically for external USB Hard Disk

2014-07-07 Thread David Laight

From: Lars Melin
...
 sgread is not included in BusyBox but you should have touch.
 Create a dummy file on the disk and let cron touch it every 4 minutes.

You don't need 'touch' a shell redirect eg : file will do open(..., 
O_CREAT|O_TRUNC).
However that still might not force an actual disc access.

In any case you really only want to do a read, doing a write will kill the NAND 
memory.

David

N�r��yb�X��ǧv�^�)޺{.n�+{{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj��!�i

RE: [PATCH 0/10] use safer test on the result of find_first_zero_bit

2014-06-04 Thread David Laight

From: Julia Lawall
 On Wed, 4 Jun 2014, Geert Uytterhoeven wrote:
 
  Hi Julia,
 
  On Wed, Jun 4, 2014 at 11:07 AM, Julia Lawall julia.law...@lip6.fr wrote:
   Find_first_zero_bit considers BITS_PER_LONG bits at a time, and thus may
   return a larger number than the maximum position argument if that position
   is not a multiple of BITS_PER_LONG.
 
  Shouldn't this be fixed in find_first_zero_bit() instead?
 
 OK, I could do that as well.  Most of the callers currently test with =.
 Should they be left as is, or changed to use ==?

Do we want to add an extra test to find_first_zero_bit() and effectively
slow down all the calls - especially those where the length is a
multiple of 8 (probably the most common).

Maybe the documented return code should be changed to allow for the
existing behaviour.

David



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 0/10] use safer test on the result of find_first_zero_bit

2014-06-04 Thread David Laight

From: Julia Lawall
 On Wed, 4 Jun 2014, Geert Uytterhoeven wrote:
 
  Hi Julia,
 
  On Wed, Jun 4, 2014 at 1:00 PM, Julia Lawall julia.law...@lip6.fr wrote:
   OK, thanks.  I was only looking at the C code.
  
   But the C code contains a loop that is followed by:
  
   if (!size)
   return result;
   tmp = *p;
  
   found_first:
   tmp |= ~0UL  size;
   if (tmp == ~0UL)/* Are any bits zero? */
   return result + size;   /* Nope. */
  
   In the first return, it would seem that result == size.  Could the second
   one be changed to just return size?  It should not hurt performance.
 
  size may have been changed between function entry and this line.
  So you have to store it in a temporary.
 
 Sorry, after reflection it seems that indeed size + result is always the
 original size, so it is actually all of the code that uses = that is
 doing something unnecessary.  == for the failure test is fine.

There is nothing wrong with defensive coding.
The 'tmp |= ~0UL  size' ensures that the return value is 'correct'
when there are no bits set.
The function could have been defined so that this wasn't needed.

If you assume that the 'no zero bits' is unlikely, then checking the
return value from ffz() could well be slightly faster.
Not that anything is likely to notice.

David



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst [NEW HARDWARE]

2014-01-22 Thread David Laight

From: walt
 On 01/21/2014 01:51 AM, David Laight wrote:
  From: Sarah Sharp
  On Mon, Jan 20, 2014 at 11:21:14AM +, David Laight wrote:
  ...
  A guess...

  In queue_bulk_sg_tx() try calling xhci_v1_0_td_remainder() instead
  of xhci_td_remainder().

 David, I tried the one-liner below, which changed nothing AFAICS, but
 then I'm not sure it's the change you intended:
...
   /* Set the TRB length, TD size, and interrupter fields. */
 - if (xhci-hci_version  0x100) {
 + if (xhci-hci_version  0x100) {
   remainder = xhci_td_remainder(
   urb-transfer_buffer_length -
   running_total);

So my wild guess wasn't right.
Can't win them all.

David

RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst [NEW HARDWARE]

2014-01-21 Thread David Laight

From: Sarah Sharp
 On Mon, Jan 20, 2014 at 11:21:14AM +, David Laight wrote:
...
  A guess...
 
  In queue_bulk_sg_tx() try calling xhci_v1_0_td_remainder() instead
  of xhci_td_remainder().
 
 Why?  Walt has a 0.96 xHCI host controller, and the format for how to
 calculate the TD remainder changed between the 0.96 and the 1.0 spec.
 That's why we have xhci_v1_0_td_remainder() and xhci_td_remainder().

I just wonder how many of those differences are just differences in the
specification, rather than differences in the hardware implementation.
In some cases it might be that the old hardware just ignored the value.

I know that the xhci hardware on my ivy bridge cpu does look at that
value (at least checking for zero), since things failed in subtle ways
when I got it wrong.

In this case it was just something easy to change that might be worth
trying.  I didn't necessarily expect it to make a positive difference.

David



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst [NEW HARDWARE]

2014-01-20 Thread David Laight

From: 
 On 01/17/2014 06:34 AM, David Laight wrote:
 
  Can you try the patch I posted that stops the ownership on LINK TRBs
  being changed before that on the linked-to TRB?
 
 Sadly, the patch didn't fix the ASMedia lockup behavior, however :(
 
 I did notice that the lockup occurred only when copying *to* the usb3
 drive, and not when copying from it.  I think that may be new behavior
 but I can't swear to it.

If the behaviour has changed then the fix is on the right lines.
If reads used to lock up, and don't with the patch then that is
an improvement.
It might be that the tx issue is actually a different 'bug'.

 Just to confirm, here are the first few lines of the patch I used.
 Please let me know if it's the wrong patch:
 ...
 +
 +   field4 = (field4 amp; ~TRB_CYCLE) | ring-gt;cycle_state;
 +   if (trb == amp;ring-gt;enqueue_first-gt;generic)
 +   field4 ^= TRB_CYCLE;
 +
 trb-gt;field[0] = cpu_to_le32(field1);

That looks like my earlier 'test' patch.
The fuller patch is http://article.gmane.org/gmane.linux.usb.general/101784

There shouldn't be any difference in the way the chip is driven, the
later patch just rips out a load of code that is no longer needed.
Mostly it simplifies the way the ownership of ring entries is passed
to the hardware.

David

RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst [NEW HARDWARE]

2014-01-20 Thread David Laight

From: walt 
 On 01/17/2014 06:34 AM, David Laight wrote:
 
  Can you try the patch I posted that stops the ownership on LINK TRBs
  being changed before that on the linked-to TRB?
 
 Please disregard my earlier post about the patch not applying cleanly.
 That was the usual html corruption, so I found the original on the usb
 list and it was okay.
 
 Sadly, the patch didn't fix the ASMedia lockup behavior, however :(
 
 I did notice that the lockup occurred only when copying *to* the usb3
 drive, and not when copying from it.  I think that may be new behavior
 but I can't swear to it.

Consistent with another report that says that ethernet worked provided
that TSO was disabled (ie no sg tx).
(Without the patch to delay he ownership change on link trbs it didn't
work at all.)

A guess...

In queue_bulk_sg_tx() try calling xhci_v1_0_td_remainder() instead
of xhci_td_remainder().

You can try that on top of either of my other patches.

David

RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst [NEW HARDWARE]

2014-01-17 Thread David Laight

From: walt
 Oy, Sarah! ;)  I put the ASMedia adapter in my older amd64 machine, and, well,
 the stupid thing Just Works(TM) with kernel 3.12.7!  (Yes, with the same disk
 docking station, too.)
 
 I can't believe the adapter works perfectly in a different computer.  Have you
 seen this kind of thing before?

Could be a horrid timing race between the cpu and xchi controller.
If the cpu manages to write a NOP or LINK TRB for a following transfer
before the controller polls the next entry (after raising the IRQ)
then the controller might process the LINK and then get confused
when it can't process the linked-to TRB.
This might not sound likely, but PCIe has significant latency.

 At the moment I have two machines using your xhci driver and both work 
 perfectly,
 so I thank you again :)
 
 I'm not sure where to go with this next.  I could put the adapter back in the
 other machine again if you have more patches to test.

Can you try the patch I posted that stops the ownership on LINK TRBs
being changed before that on the linked-to TRB?

I got a private mail from someone indicating that my earlier 'minimal'
patch helped an ASMedia controller talking to the asx189_178a ethernet
hardware.

David

RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst [NEW HARDWARE]

2014-01-14 Thread David Laight

From: walt
 On 01/09/2014 03:50 PM, Sarah Sharp wrote:
 
  On Tue, Jan 07, 2014 at 03:57:00PM -0800, walt wrote:
 
  I've wondered if my xhci problems might be caused by hardware quirks, and
  wondering why I seem to be the only one who has this problem.
 
  Maybe I could take one for the team by buying new hardware toys that I
  don't really need but I could use to test the xhci driver?  (I do enjoy
  buying new toys, necessary, or, um, maybe not :)
 
  It would be appreciated if you could see if your device causes other
  host controllers to fail.  Who am I to keep a geek from new toys? ;)
 
 Sarah, I just fixed my xhci bug for US$19.99 :)
 
 #lspci | tail -1
 04:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller 
 (rev 03)

Do you know which version of the xhci spec it conforms to?
(Will probably be 0.96 or 1.00).

Of course, ASMedia will probably change the silicon they are using without
changing anything on the packaging.

David

RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst

2014-01-10 Thread David Laight

From: walt
  In the meantime, try this patch, which is something of a long shot.
 
 No difference.  But I notice the code enables the TRB quirk only if the
 xhci_version is specifically 0x95.  My debug messages claim that xHCI
 doesn't need link TRB QUIRK so I'm wondering if adding my asmedia device
 to the quirks list really doesn't change anything unless it's xhci 0.95.

I think the intention was that the quirk would be applied for your
hardware even though it claims to be version 0.96.
That was gust a hopeful guess that the same change would help.

 Does lspci provide that information?  I'm not seeing anything obvious.
I've only seen it in the diagnostic prints (that aren't normally output).

David

RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst

2014-01-09 Thread David Laight

 From: walt
...
 I'm still wondering if I'm suffering from hardware quirks.  From the
 first day I installed my usb3 adapter card and the usb3 disk docking
 station I've noticed some quirky behavior.

Ah - this isn't an 'on chip' usb3 adapter.
Some kind of PCIe card ?

 e.g. I boot the machine with the docking station powered-off, and then
 later I power it on, the usb3 disk is not detected at all -- until I
 reboot the machine with the docking station still powered on.

That will be something completely different to failures when running.

David

N�r��yb�X��ǧv�^�)޺{.n�+{{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj��!�i

RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst

2014-01-08 Thread David Laight

 From: Sarah Sharp
 On Tue, Jan 07, 2014 at 03:57:00PM -0800, walt wrote:
  On 01/07/2014 01:21 PM, Sarah Sharp wrote:
 
   Can you please try the attached patch, on top of the previous three
   patches, and send me dmesg?
 
  Hi Sarah, I just now finished running 0001-More-debugging.patch for the
  first time.  The previous dmesg didn't include that patch, but this one
  does.
 
  I read through this dmesg but I nodded off somewhere around line 500.
  I hope you can stay awake :)
 
 Well, it has all the info I need, but the results don't make me too
 happy.  Everything I've checked seems consistent, and I don't know why
 the host stopped.  The link TRBs are intact, the dequeue pointer for the
 endpoint was pointing to the transfer that timed out and it had the
 cycle bit set correctly, etc.  Perhaps the no-op TRBs are really the
 issue.
 
 I'll have to take a look at the log again tomorrow.  I posted the dmesg
 on pastebin if David wants to check it out as well:
 http://pastebin.com/a4AUpsL1

I can't see anything obvious either.
However there is no response to the 'stop endpoint' command.
Section 4.6.9 (page 107 of rev1.0) states that the controller will complete
any USB IN or OUT transaction before raising the command completion event.
Possibly it is too 'stuck' to complete the transaction?

The endpoint status is also still '1' (running).
This also means that the 'TR dequeue pointer' is undefined - so the
controller could easily be processing a later TRB.
This field might even still contain the ring base address written by
the driver much earlier.

This might mean that something 'catastrophic' has happened earlier.
Maybe the controller isn't actually seeing any doorbell writes at all.
Maybe the base addresses it has for the rings have all got corrupted.
At least this looks like amd64 - so there aren't memory coherency issues.

Some hacks that might help isolate the problem:
1) Request an interrupt from the last nop data TRB.
2) Put a command nop (decimal 23) TRB into the command ring before
   the 'stop endpoint'.
3) Comment out the code that adds the nop data TRBs.
The first two might need code adding to handle the responses.

Do we know the actual xhci device?
I think it reports version 0x96.
(Sarah - it might be useful if that version were in one of the trace
messages that is output by default.)

David


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst

2014-01-08 Thread David Laight

 From: Alan Stern
 
 This may be a foolish question, but why is xhci-hcd using no-op TRBs in
 the first place?

Because it can't write in a link TRB because other parts of the
code use link TRBs to detect the end of the ring.

The problem is that it can't put a link TRB in the middle of
a chain of data fragments unless it is at a 'suitable' offset
from the start of the data TD. Given arbitrary input fragmentation
this means that you can't put a link TRB in the middle of a TD.
(The documented alignment might be as high as 16kB.)

If the rest of the code used a 'ring end pointer' then a link TRB
could be used instead.

David



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst

2014-01-08 Thread David Laight

 From: Alan Stern
 On Wed, 8 Jan 2014, David Laight wrote:

   From: Alan Stern

   This may be a foolish question, but why is xhci-hcd using no-op TRBs in
   the first place?

  Because it can't write in a link TRB because other parts of the
  code use link TRBs to detect the end of the ring.

  The problem is that it can't put a link TRB in the middle of
  a chain of data fragments unless it is at a 'suitable' offset
  from the start of the data TD. Given arbitrary input fragmentation
  this means that you can't put a link TRB in the middle of a TD.
  (The documented alignment might be as high as 16kB.)

  If the rest of the code used a 'ring end pointer' then a link TRB
  could be used instead.

 I see.  Sounds like a poor design decision in hindsight.  Can it be
 changed?

Anything can be changed :-)
But it is a bit pervasive.
Padding out with nops isolated the change.

David

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst

2014-01-07 Thread David Laight

 From: walt
...
 Thanks Sarah.  dmesg0 is from the diagnostic patch only.  dmesg1 has all
 three patches applied.  Some of the messages in dmesg1 fell off the end of
 the kernel buffer, so I may need to make the buffer larger next time but
 I'll need a reminder of how to do it.
 
 As you suspected, the patches didn't fix the problem, sorry.

I think Sarah will want the traces of the transfer being setup (ie just
before the error). Some 'normal' completing transfers are also useful.

You might also find the full trace output in one of the files in /var/log.

David

RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst

2014-01-07 Thread David Laight

The dmesg contains:

[  538.728064] EXT4-fs warning (device dm-0): ext4_end_bio:316: I/O error 
writing to inode 23330865 (offset 0 size 8388608 starting block 812628)

An 8MB transfer will need at least 128 ring entries (TRB) even if the request
is a single contiguous memory block.

Are you using the patch that increases the ring size from 64 to 256?

David

RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst

2014-01-06 Thread David Laight

 From: walt
...
 /* Accept arbitrarily long scatter-gather lists */
 -   hcd-self.sg_tablesize = ~0;
 +   hcd-self.sg_tablesize = 31;

Even if that reduces the number of fragments passed to the xhci driver
it may not be enough to limit the actual number of fragments that
need to be placed in the ring itself.
The xhci driver has to split every fragment on any 64k address boundary.

One possibility is to scan long SG lists to see it they are actually
problematic. If all the fragments are suitably aligned let them through
(without any nops).

My gut feeling is that problems only arise when the ring end isn't at
a 1k boundary in the data.

So provided all the fragments are multiples of 1k (after splitting on 64k
boundaries) the transfer will be processed correctly.
Alternatively, if the fragments are all longer than 1k (after the 64k split),
the one that crosses the ring end can be split in two.

A quick 'fix' would be to assume that anything with too many fragments is
probably ok - maybe check the first fragment is suitably aligned.
That would recover the old behaviour for usb disk transfers with a lot
of fragments - yes it is a hack...

David

RE: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern

2013-10-04 Thread David Laight

  It seems to me that a more useful interface would take a minimum and
  maximum number of vectors from the driver.  This wouldn't allow the
  driver to specify that it could only accept, say, any even number within
  a certain range, but you could still leave the current functions
  available for any driver that needs that.
 
 Mmmm.. I am not sure I am getting it. Could you please rephrase?

One possibility is for drivers than can use a lot of interrupts to
request a minimum number initially and then request the additional
ones much later on.
That would make it less likely that none will be available for
devices/drivers that need them but are initialised later.

David



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH v2 net-next 05/22] cxgb4: Add T5 write combining support

2013-03-14 Thread David Laight

 This patch implements a low latency Write Combining (aka Write Coalescing) 
 work
 request path. PCIE maps User Space Doorbell BAR2 region writes to the new
 interface to SGE. SGE pulls a new message from PCIE new interface and if its a
 coalesced write work request then pushes it for processing. This patch copies
 coalesced work request to memory mapped BAR2 space.
...
 + } else {
 + memset(data, 0, 2 * sizeof(u64));
 + *data += 2;
 + }

Using memset is overkill (or rather a big overhead if it isn't
detected by the compiler). Nothing wrong with:
(*data)[0] = 0;
(*data)[1] = 0;
*data += 2;
Actually, typing that, I realise that you should probably have
read *data into a local variable, and then updated it when finished.
Otherwise some of the accesses might be ones which force the
compiler to reload the value.

  static inline void ring_tx_db(struct adapter *adap, struct sge_txq *q, int n)
  {
 + unsigned int *wr, index;
 +
   wmb();/* write descriptors before telling HW */
   spin_lock(q-db_lock);
   if (!q-db_disabled) {
 - t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL),
 -  QID(q-cntxt_id) | PIDX(n));
 + if (is_t4(adap-chip)) {
 + t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL),
 +  QID(q-cntxt_id) | PIDX(n));
 + } else {
 + if (n == 1) {
 + index = q-pidx ? (q-pidx - 1) : (q-size - 1);
 + wr = (unsigned int *)q-desc[index];
 + cxgb_pio_copy((u64 __iomem *)
 +   (adap-bar2 + q-udb + 64),
 +   (u64 *)wr);

Why all the casts on 'wr' ?

 + } else
 + writel(n,  adap-bar2 + q-udb + 8);
 + wmb();

Since you actually need memory barriers here on x86 you definitely
need a comment saying so, and it would (IMHO) better to use a
different define in the source (even if it is currently converted
to wmb() in a local header file).

Thinking further, for portability you might want to find some way
of abstracting the multi-word writes somehow.
For example, some of the ppc have a dma engine associated with the
PCIe master interface that can be used to generate large TLP.
The code would still want to spin waiting for the dma to complete
(since the transfer would be far faster than any interrupt path).

David



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH net-next 05/22] cxgb4: Add T5 write combining support

2013-03-13 Thread David Laight

  + writel(n,  adap-bar2 + q-udb + 8);
  +#if defined(CONFIG_X86_32) || defined(CONFIG_X86_64)
  + asm volatile(sfence : : : memory);
  +#endif
  There is absolutely no way I'm letting anyone put crap like this
  into a driver.
 
  Use a portable inteface, and if one does not exist create one.
 
  I guess you'll have to add a wc_wmb() function for all the hw platforms
  supported by the kernel.  I see libibverbs defines this for the user
  side in include/infiniband/arch.h, and that could be used as the meat of
  the hw platform-specific implementations.
 
 I see that normal wmb() code for x86_64 architecture is same as what
 above #ifdef condition is doing. To be more clear I looked at the
 assembly code for wmb and find that it is converted into 'sfence'
 instruction. So, I think above code should be replaced with wmb call
 which will also take care of portability on different architecture. I
 will submit the series again soon.

From my recollection of the x86 architecture, the memory barriers
are hardly ever needed, certainly not in the places where, for example
a ppc needs them. I'd actually suspect that the normal wmb() for
x86 should be a nop.

About the only place where any on the fence instructions are needed
are in relation to write combining accesses.
In particular I don't believe they are ever needed to synchronise
uncached accesses with each other, or with cached accesses (which
are snooped).

David




--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

39 matches

Mail list logo