Re: [PATCH V3 7/8] scsi: hpsa: improve scsi_mq performance via .host_tagset

2018-03-07 Thread Christoph Hellwig
> + /* 256 tags should be high enough to saturate device */
> + int max_queues = DIV_ROUND_UP(h->scsi_host->can_queue, 256);
> +
> + /* per NUMA node hw queue */
> + h->scsi_host->nr_hw_queues = min_t(int, nr_node_ids, max_queues);

I don't think this magic should be in a driver.  The per-node hw_queue
selection seems like something we'd better do in the core code.

Also the whole idea to use nr_hw_queues for just partitioning tag
space on hardware that doesn't really support multiple hardware queues
seems more than odd.


Re: [PATCH V3 4/8] blk-mq: introduce BLK_MQ_F_HOST_TAGS

2018-03-07 Thread Christoph Hellwig
On Tue, Feb 27, 2018 at 06:07:46PM +0800, Ming Lei wrote:
> This patch can support to partition host-wide tags to multiple hw queues,
> so each hw queue related data structures(tags, hctx) can be accessed in
> NUMA locality way, for example, the hw queue can be per NUMA node.
> 
> It is observed IOPS can be improved much in this way on null_blk test.

null_blk isn't too interesting, so some real hardware number would
be very useful here.

Also the documentation should be a lot less sparse.  When are we going
to set this flag?  What help are we going to give driver authors to
guide chosing the option?


Re: [PATCH V3 3/8] blk-mq: introduce 'start_tag' field to 'struct blk_mq_tags'

2018-03-07 Thread Christoph Hellwig
Looks fine,

Reviewed-by: Christoph Hellwig 


Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue

2018-03-07 Thread Christoph Hellwig
> +static void hpsa_setup_reply_map(struct ctlr_info *h)
> +{
> + const struct cpumask *mask;
> + unsigned int queue, cpu;
> +
> + for (queue = 0; queue < h->msix_vectors; queue++) {
> + mask = pci_irq_get_affinity(h->pdev, queue);
> + if (!mask)
> + goto fallback;
> +
> + for_each_cpu(cpu, mask)
> + h->reply_map[cpu] = queue;
> + }
> + return;
> +
> +fallback:
> + for_each_possible_cpu(cpu)
> + h->reply_map[cpu] = 0;
> +}

It seems a little annoying that we have to duplicate this in the driver.
Wouldn't this be solved by your force_blk_mq flag and relying on the
hw_ctx id?


,Your urgent confirmation

2018-03-07 Thread James Williams
Attn: Beneficiary,

We have contacted the Federal Ministry of Finance on your Behalf and
they have brought a solution to your problem by coordinating your
payment in total (10,000,000.00) Ten Million Dollars in an atm card
which you can use to withdraw money from any ATM MACHINE CENTER
anywhere in the world with a maximum of 1 Dollars daily. You now
have the lawful right to claim your fund in an atm card. Since the
Federal Bureau of Investigation is involved in this transaction, you
have to be rest assured for this is 100% risk free it is our duty to
protect the American Citizens, European Citizens, Asian Citizen. All I
want you to do is to contact the atm card CENTER Via email or call the
office telephone number one of the Consultant will assist you for
their requirements to proceed and procure your Approval Slip on your
behalf.

CONTACT INFORMATION
NAME: James  Williams
EMAIL: paymasterofficed...@gmail.com


Do contact us with your details:

Full name//
Address// Age//
 Telephone Numbers//
Occupation//
 Your Country//
Bank Details//

So your files would be updated after which the Delivery of your
atm card will be affected to your designated home Address without any
further delay and the bank will transfer your funds in total
(10,000,000.00) Ten Million Dollars to your Bank account. We
will reply you with the secret code (1600 atm card). We advice you get
back to the payment office after you have contacted the ATM SWIFT CARD
CENTER and we do await your response so we can move on with our
Investigation and make sure your ATM SWIFT CARD gets to you.


Best Regards
James Williams
Paymaster General
Federal Republic Of Nigeri


Re: [PATCH 2/2] m68k/amiga - Zorro ESP: new zorro_esp.c

2018-03-07 Thread Michael Schmitz
Tuomas,


Am 06.03.2018 um 04:31 schrieb Tuomas Vainikka:
>>> I think you are talking about esp->regs? For esp->dma_regs, the
>>> ioremap is
>>> conditional on ent->id, but the unmap is not.
>> The details of the ioremap are conditional on the ID, but the fact
>> that the ioremap happens (and hence esp->dma_regs is an ioremapped
>> address) is not. All Zorro-3 boards have to have both their regs and
>> dma_regs remapped.
>>
>> What's confusing is that there is only a single Zorro-3 board
>> currently supported by the driver. Others will be added and I"ll use a
>> switch statement to pick the appropriate address based on the ID. That
>> might make it clearer.
> 
> Fastlane might be the only Z3 SCSI board that has the chip.

Good to know - I've rewritten the probe code to check for the type of
board (Z2 or Z3) based on the ROM data, and make the ioremap/iounmap
conditional on that.

Cheers,

Michael


> 
> -Tuomas
> 
>>
>> +}
>> +
>> +static void zorro_esp_remove_one(struct zorro_dev *z)
>> +{
>> + struct Scsi_Host *host = zorro_get_drvdata(z);
>> + struct esp *esp = shost_priv(host);
>> +
>> + scsi_esp_unregister(esp);
>> +
>> + /* Disable interrupts. Perhaps use disable_irq instead ... */
>> +
>> + free_irq(host->irq, esp);
>> + dma_free_coherent(esp->dev, 16,
>> +   esp->command_block,
>> +   esp->command_block_dma);
>> +
>> + if (host->base > 0xff) {
>> + iounmap(esp->dma_regs);
> Do you need to test for ZORRO_PROD_PHASE5_BLIZZARD_1230_IV_1260 first?
 I can't - ent->id is not available here...
>>> Maybe store ent->id in the private struct to get around that?
>> Yes, that could be done. I still think it's not needed.
>>
>> Cheers,
>>
>>Michael
>>
>>
>>> -- 
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-m68k" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 


[Bug 198923] Linux 4.15.4+: Write on Ext4 causes system block

2018-03-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=198923

Theodore Tso (ty...@mit.edu) changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #10 from Theodore Tso (ty...@mit.edu) ---
Thanks for pointing out the dup bugzilla entry.

*** This bug has been marked as a duplicate of bug 198861 ***

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


[Bug 198861] Regression causes kernel OOPS and hang in SCSI error report

2018-03-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=198861

Theodore Tso (ty...@mit.edu) changed:

   What|Removed |Added

 CC||m...@elbmurf.de

--- Comment #4 from Theodore Tso (ty...@mit.edu) ---
*** Bug 198923 has been marked as a duplicate of this bug. ***

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [PATCH v3] scsi: libsas: defer ata device eh commands to libata

2018-03-07 Thread Dan Williams
On Wed, Mar 7, 2018 at 6:34 PM, Jason Yan  wrote:
> When ata device doing EH, some commands still attached with tasks are not
> passed to libata when abort failed or recover failed, so libata did not
> handle these commands. After these commands done, sas task is freed, but
> ata qc is not freed. This will cause ata qc leak and trigger a warning
> like below:
>
> WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037
> ata_eh_finish+0xb4/0xcc
> CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G W  OE 4.14.0#1
> ..
> Call trace:
> [] ata_eh_finish+0xb4/0xcc
> [] ata_do_eh+0xc4/0xd8
> [] ata_std_error_handler+0x44/0x8c
> [] ata_scsi_port_error_handler+0x480/0x694
> [] async_sas_ata_eh+0x4c/0x80
> [] async_run_entry_fn+0x4c/0x170
> [] process_one_work+0x144/0x390
> [] worker_thread+0x144/0x418
> [] kthread+0x10c/0x138
> [] ret_from_fork+0x10/0x18
>
> If ata qc leaked too many, ata tag allocation will fail and io blocked
> for ever.
>
> As suggested by Dan Williams, defer ata device commands to libata and
> merge sas_eh_finish_cmd() with sas_eh_defer_cmd(). libata will handle
> ata qcs correctly after this.
>
> Signed-off-by: Jason Yan 
> CC: Xiaofei Tan 
> CC: John Garry 
> CC: Dan Williams 

Looks good,

Reviewed-by: Dan Williams 


Re: [PATCH v4 01/10] dt-bindings: scsi: hisi_sas: add an property of signal attenuation

2018-03-07 Thread Rob Herring
On Wed, Mar 07, 2018 at 08:25:05PM +0800, John Garry wrote:
> From: Xiaofei Tan 
> 
> For some new boards with hip07 chipset we are required to
> set PHY config registers differently. The hw property which
> determines how to set these registers is in the PHY signal
> attenuation readings.
> 
> This patch add an devicetree property, "hisilicon,signal-attenuation",
> which is used to describe the signal attenuation of an board.
> 
> Cc: Rob Herring 
> Cc: Mark Rutland 
> Signed-off-by: Xiaofei Tan 
> Signed-off-by: John Garry 
> ---
>  Documentation/devicetree/bindings/scsi/hisilicon-sas.txt | 7 +++
>  1 file changed, 7 insertions(+)

Reviewed-by: Rob Herring 



[PATCH v3] scsi: libsas: defer ata device eh commands to libata

2018-03-07 Thread Jason Yan
When ata device doing EH, some commands still attached with tasks are not
passed to libata when abort failed or recover failed, so libata did not
handle these commands. After these commands done, sas task is freed, but
ata qc is not freed. This will cause ata qc leak and trigger a warning
like below:

WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037
ata_eh_finish+0xb4/0xcc
CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G W  OE 4.14.0#1
..
Call trace:
[] ata_eh_finish+0xb4/0xcc
[] ata_do_eh+0xc4/0xd8
[] ata_std_error_handler+0x44/0x8c
[] ata_scsi_port_error_handler+0x480/0x694
[] async_sas_ata_eh+0x4c/0x80
[] async_run_entry_fn+0x4c/0x170
[] process_one_work+0x144/0x390
[] worker_thread+0x144/0x418
[] kthread+0x10c/0x138
[] ret_from_fork+0x10/0x18

If ata qc leaked too many, ata tag allocation will fail and io blocked
for ever.

As suggested by Dan Williams, defer ata device commands to libata and
merge sas_eh_finish_cmd() with sas_eh_defer_cmd(). libata will handle
ata qcs correctly after this.

Signed-off-by: Jason Yan 
CC: Xiaofei Tan 
CC: John Garry 
CC: Dan Williams 
---
 drivers/scsi/libsas/sas_scsi_host.c | 33 +
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c 
b/drivers/scsi/libsas/sas_scsi_host.c
index 6de9681ace82..ceab5e5c41c2 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -223,6 +223,7 @@ int sas_queuecommand(struct Scsi_Host *host, struct 
scsi_cmnd *cmd)
 static void sas_eh_finish_cmd(struct scsi_cmnd *cmd)
 {
struct sas_ha_struct *sas_ha = SHOST_TO_SAS_HA(cmd->device->host);
+   struct domain_device *dev = cmd_to_domain_dev(cmd);
struct sas_task *task = TO_SAS_TASK(cmd);
 
/* At this point, we only get called following an actual abort
@@ -231,6 +232,14 @@ static void sas_eh_finish_cmd(struct scsi_cmnd *cmd)
 */
sas_end_task(cmd, task);
 
+   if (dev_is_sata(dev)) {
+   /* defer commands to libata so that libata EH can
+* handle ata qcs correctly
+*/
+   list_move_tail(>eh_entry, _ha->eh_ata_q);
+   return;
+   }
+
/* now finish the command and move it on to the error
 * handler done list, this also takes it off the
 * error handler pending list.
@@ -238,22 +247,6 @@ static void sas_eh_finish_cmd(struct scsi_cmnd *cmd)
scsi_eh_finish_cmd(cmd, _ha->eh_done_q);
 }
 
-static void sas_eh_defer_cmd(struct scsi_cmnd *cmd)
-{
-   struct domain_device *dev = cmd_to_domain_dev(cmd);
-   struct sas_ha_struct *ha = dev->port->ha;
-   struct sas_task *task = TO_SAS_TASK(cmd);
-
-   if (!dev_is_sata(dev)) {
-   sas_eh_finish_cmd(cmd);
-   return;
-   }
-
-   /* report the timeout to libata */
-   sas_end_task(cmd, task);
-   list_move_tail(>eh_entry, >eh_ata_q);
-}
-
 static void sas_scsi_clear_queue_lu(struct list_head *error_q, struct 
scsi_cmnd *my_cmd)
 {
struct scsi_cmnd *cmd, *n;
@@ -261,7 +254,7 @@ static void sas_scsi_clear_queue_lu(struct list_head 
*error_q, struct scsi_cmnd
list_for_each_entry_safe(cmd, n, error_q, eh_entry) {
if (cmd->device->sdev_target == my_cmd->device->sdev_target &&
cmd->device->lun == my_cmd->device->lun)
-   sas_eh_defer_cmd(cmd);
+   sas_eh_finish_cmd(cmd);
}
 }
 
@@ -631,12 +624,12 @@ static void sas_eh_handle_sas_errors(struct Scsi_Host 
*shost, struct list_head *
case TASK_IS_DONE:
SAS_DPRINTK("%s: task 0x%p is done\n", __func__,
task);
-   sas_eh_defer_cmd(cmd);
+   sas_eh_finish_cmd(cmd);
continue;
case TASK_IS_ABORTED:
SAS_DPRINTK("%s: task 0x%p is aborted\n",
__func__, task);
-   sas_eh_defer_cmd(cmd);
+   sas_eh_finish_cmd(cmd);
continue;
case TASK_IS_AT_LU:
SAS_DPRINTK("task 0x%p is at LU: lu recover\n", task);
@@ -647,7 +640,7 @@ static void sas_eh_handle_sas_errors(struct Scsi_Host 
*shost, struct list_head *
"recovered\n",
SAS_ADDR(task->dev),
cmd->device->lun);
-   sas_eh_defer_cmd(cmd);
+   sas_eh_finish_cmd(cmd);
sas_scsi_clear_queue_lu(work_q, cmd);
goto Again;
}
-- 
2.13.6



Re: [PATCH 2/2] m68k/amiga - Zorro ESP: new zorro_esp.c

2018-03-07 Thread Michael Schmitz
Hi Geert,

fine, I'll rely on the base address and the z->rom.er_Type value to
pick the correct config data.

Cheers,

  Michael


On Wed, Mar 7, 2018 at 9:06 PM, Geert Uytterhoeven  wrote:
> Hi Michael,
>
> On Wed, Mar 7, 2018 at 8:55 AM, Michael Schmitz  wrote:
>> OK, in that case I'll need to work out something similar to the test for
>> optional SCSI function on the Blizzard 1230/1260 to find out what board
>> I have when dealing with the duplicate Fastlane/Blizzard1230II ID.
>>
>> Is the board base address as returned by zorro_resource_start() reliable
>> to distinguish between Zorro II and Zorro III boards?
>
> The board base address is assigned by AmigaOS based on the values in the
> Expansion ROM (mainly ExpansionRom.er_Type) on the board.
> More specifically, AmigaOS creates struct ConfigDev from struct ExpansionRom.
> So yes, it must be reliable.
>
>> Am 06.03.2018 um 20:43 schrieb Geert Uytterhoeven:
>>> On Tue, Mar 6, 2018 at 2:11 AM, Michael Schmitz  
>>> wrote:
 Index 1 should have been ZORRO_PROD_PHASE5_CYBERSTORM_MK_II, I've
 corrected that in the meantime.

 Fastlane / Blizzard 1230_II distinction is something I an not quite
 sure about - does the probe function get called twice if the device
 table contains the same ID twice but with different driver_data
 contents?
>>>
>>> No, the probe function gets called on the first match only.
>>> Cfr. drivers/zorro/zorro-driver.c:zorro_device_probe().
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- 
> ge...@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like 
> that.
> -- Linus Torvalds


Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset

2018-03-07 Thread Ming Lei
On Wed, Mar 07, 2018 at 10:58:34PM +0530, Kashyap Desai wrote:
> > >
> > > Also one observation using V3 series patch. I am seeing below Affinity
> > > mapping whereas I have only 72 logical CPUs.  It means we are really
> > > not going to use all reply queues.
> > > e.a If I bind fio jobs on CPU 18-20, I am seeing only one reply queue
> > > is used and that may lead to performance drop as well.
> >
> > If the mapping is in such shape, I guess it should be quite difficult to
> figure out
> > one perfect way to solve this situation because one reply queue has to
> handle
> > IOs submitted from 4~5 CPUs at average.
> 
> 4.15.0-rc1 kernel has below mapping - I am not sure which commit id in "
> linux_4.16-rc-host-tags-v3.2" is changing the mapping of IRQ to CPU.  It

I guess the mapping you posted is read from /proc/irq/126/smp_affinity.

If yes, no any patch in linux_4.16-rc-host-tags-v3.2 should change IRQ
affinity code, which is done in irq_create_affinity_masks(), as you saw, no any
patch in linux_4.16-rc-host-tags-v3.2 touches that code.

Could you simply apply the patches in linux_4.16-rc-host-tags-v3.2 against
4.15-rc1 kernel and see any difference?

> will be really good if we can fall back to below mapping once again.
> Current repo linux_4.16-rc-host-tags-v3.2 is giving lots of random mapping
> of CPU - MSIx. And that will be problematic in performance run.
> 
> As I posted earlier, latest repo will only allow us to use *18* reply

Looks not see this report before, could you share us how you conclude that?
The only patch changing reply queue is the following one:

https://marc.info/?l=linux-block=151972611911593=2

But not see any issue in this patch yet, can you recover to 72 reply
queues after reverting the patch in above link?

> queue instead of *72*.  Lots of performance related issue can be pop up on
> different setup due to inconsistency in CPU - MSIx mapping. BTW, changes
> in this area is intentional @" linux_4.16-rc-host-tags-v3.2". ?

As you mentioned in the following link, you didn't see big performance drop
with linux_4.16-rc-host-tags-v3.2, right?

https://marc.info/?l=linux-block=151982993810092=2


Thanks,
Ming


[PATCH] qla2xxx: Remove FC_NO_LOOP_ID for FCP and FC-NVMe Discovery

2018-03-07 Thread Himanshu Madhani
Commit 7d64c39e64310 fixed regression of FCP discovery when
Nport Handle is in-use and relogin is triggered. However,
during FCP and FC-NVMe discovery this resulted into only
discovering NVMe LUNs.

This patch fixes issue where FCP and FC-NVMe protocol is used on
same port where assigning FC_NO_LOOP_ID will result into discovery
failure for FCP LUNs.

Fixes: a084fd68e1d26 ("scsi: qla2xxx: Fix re-login for Nport Handle in use")
Signed-off-by: Himanshu Madhani 
---
Hi Martin, 

This patch fixes the FCP discovery on same port with FC-NVMe enabled. 

Please apply to 4.16/scsi-fixes at your earliest convenience.

Thanks,
Himanshu 
---
 drivers/scsi/qla2xxx/qla_init.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index c4b04e807e12..cb182b102bfd 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -1719,7 +1719,6 @@ qla24xx_handle_plogi_done_event(struct scsi_qla_host 
*vha, struct event_arg *ea)
 
set_bit(ea->fcport->loop_id, vha->hw->loop_id_map);
spin_lock_irqsave(>hw->tgt.sess_lock, flags);
-   ea->fcport->loop_id = FC_NO_LOOP_ID;
ea->fcport->chip_reset = 
vha->hw->base_qpair->chip_reset;
ea->fcport->logout_on_delete = 1;
ea->fcport->send_els_logo = 0;
-- 
2.12.0



Re: [PATCH 6/6] lib/scatterlist: Drop order argument from sgl_free_n_order

2018-03-07 Thread James Bottomley
On Wed, 2018-03-07 at 12:47 +, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 

Firstly, I don't see any justifiable benefit to churning this API, so
why bother? but secondly this:

> We can derive the order from sg->length and so do not need to pass it
> in explicitly.

Is wrong.  I can have a length 2 scatterlist that crosses a page
boundary, but I can also have one within a single page, so the order
cannot be deduced from the length.

James



Re: [PATCH 6/6] lib/scatterlist: Drop order argument from sgl_free_n_order

2018-03-07 Thread Bart Van Assche
On Wed, 2018-03-07 at 17:23 +, Tvrtko Ursulin wrote:
> Ok I guess my main questions are the ones from the cover letter - where 
> is this API going and why did it get in a bit of a funky state? Because 
> it doesn't look fully thought through and tested to me.

Funky state? Not fully tested? Except for the error paths and upper length
limits the sgl allocation and freeing functions have been tested thoroughly.

> My motivation is that I would like to extend it to add 
> sgl_alloc_order_min_max, which takes min order and max order, and 
> allocates as large chunks as it can given those constraints. This is 
> something we have in i915 and could then drop our implementation and use 
> the library function.

That sounds useful to me.

> But I also wanted to refactor sgl_alloc_order to benefit from the 
> existing chained struct scatterlist allocator. But SGL API does not 
> embed into struct sg_table, neither it carries explicitly the number of 
> nents allocated, making it impossible to correctly free with existing 
> sg_free_table.

It is on purpose that sgl_alloc_order() returns a struct scatterlist
instead of any structure built on top of struct scatterlist. If you have
a look at the sgl_alloc*() callers then you will see that nontrivial
changes in these callers are required to make them use something else than
a struct scatterlist pointer. But if you would like to rework those callers
that's fine with me. I can help with reviewing the code I'm familiar with.

> Also I am not sure if a single gfp argument to sgl_alloc_order is the 
> right thing to do. I have a feeling you either need to ignore it for 
> kmalloc_array, or pass in two gfp_t arguments to be used for metadata 
> and backing storage respectively.

If there is a caller that needs this feel free to make this change. But
please don't make this change before there is a caller that needs it.

Thanks,

Bart.




RE: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset

2018-03-07 Thread Kashyap Desai
> >
> > Also one observation using V3 series patch. I am seeing below Affinity
> > mapping whereas I have only 72 logical CPUs.  It means we are really
> > not going to use all reply queues.
> > e.a If I bind fio jobs on CPU 18-20, I am seeing only one reply queue
> > is used and that may lead to performance drop as well.
>
> If the mapping is in such shape, I guess it should be quite difficult to
figure out
> one perfect way to solve this situation because one reply queue has to
handle
> IOs submitted from 4~5 CPUs at average.

4.15.0-rc1 kernel has below mapping - I am not sure which commit id in "
linux_4.16-rc-host-tags-v3.2" is changing the mapping of IRQ to CPU.  It
will be really good if we can fall back to below mapping once again.
Current repo linux_4.16-rc-host-tags-v3.2 is giving lots of random mapping
of CPU - MSIx. And that will be problematic in performance run.

As I posted earlier, latest repo will only allow us to use *18* reply
queue instead of *72*.  Lots of performance related issue can be pop up on
different setup due to inconsistency in CPU - MSIx mapping. BTW, changes
in this area is intentional @" linux_4.16-rc-host-tags-v3.2". ?

irq 218, cpu list 0
irq 219, cpu list 1
irq 220, cpu list 2
irq 221, cpu list 3
irq 222, cpu list 4
irq 223, cpu list 5
irq 224, cpu list 6
irq 225, cpu list 7
irq 226, cpu list 8
irq 227, cpu list 9
irq 228, cpu list 10
irq 229, cpu list 11
irq 230, cpu list 12
irq 231, cpu list 13
irq 232, cpu list 14
irq 233, cpu list 15
irq 234, cpu list 16
irq 235, cpu list 17
irq 236, cpu list 36
irq 237, cpu list 37
irq 238, cpu list 38
irq 239, cpu list 39
irq 240, cpu list 40
irq 241, cpu list 41
irq 242, cpu list 42
irq 243, cpu list 43
irq 244, cpu list 44
irq 245, cpu list 45
irq 246, cpu list 46
irq 247, cpu list 47
irq 248, cpu list 48
irq 249, cpu list 49
irq 250, cpu list 50
irq 251, cpu list 51
irq 252, cpu list 52
irq 253, cpu list 53
irq 254, cpu list 18
irq 255, cpu list 19
irq 256, cpu list 20
irq 257, cpu list 21
irq 258, cpu list 22
irq 259, cpu list 23
irq 260, cpu list 24
irq 261, cpu list 25
irq 262, cpu list 26
irq 263, cpu list 27
irq 264, cpu list 28
irq 265, cpu list 29
irq 266, cpu list 30
irq 267, cpu list 31
irq 268, cpu list 32
irq 269, cpu list 33
irq 270, cpu list 34
irq 271, cpu list 35
irq 272, cpu list 54
irq 273, cpu list 55
irq 274, cpu list 56
irq 275, cpu list 57
irq 276, cpu list 58
irq 277, cpu list 59
irq 278, cpu list 60
irq 279, cpu list 61
irq 280, cpu list 62
irq 281, cpu list 63
irq 282, cpu list 64
irq 283, cpu list 65
irq 284, cpu list 66
irq 285, cpu list 67
irq 286, cpu list 68
irq 287, cpu list 69
irq 288, cpu list 70
irq 289, cpu list 71


>
> The application should have the knowledge to avoid this kind of usage.
>
>
> Thanks,
> Ming


Re: [PATCH 6/6] lib/scatterlist: Drop order argument from sgl_free_n_order

2018-03-07 Thread Tvrtko Ursulin


On 07/03/18 16:23, Bart Van Assche wrote:

On Wed, 2018-03-07 at 12:47 +, Tvrtko Ursulin wrote:

We can derive the order from sg->length and so do not need to pass it in
explicitly. Rename the function to sgl_free_n.


Using get_order() to compute the order looks fine to me but this patch will
have to rebased in order to address the comments on the previous patches.


Ok I guess my main questions are the ones from the cover letter - where 
is this API going and why did it get in a bit of a funky state? Because 
it doesn't look fully thought through and tested to me.


My motivation is that I would like to extend it to add 
sgl_alloc_order_min_max, which takes min order and max order, and 
allocates as large chunks as it can given those constraints. This is 
something we have in i915 and could then drop our implementation and use 
the library function.


But I also wanted to refactor sgl_alloc_order to benefit from the 
existing chained struct scatterlist allocator. But SGL API does not 
embed into struct sg_table, neither it carries explicitly the number of 
nents allocated, making it impossible to correctly free with existing 
sg_free_table.


Another benefit of using the existing sg allocator would be that for 
large allocation you don't depend on the availability of contiguous 
chunks like you do with kmalloc_array.


For instance if in another reply you mentioned 4GiB allocations are a 
possibility. If you use order 0 that means you need 1M nents, which can 
be something like 32 bytes each and you need a 32MiB kmalloc for the 
nents array and thats quite big. If you would be able to reuse the 
existing sg_alloc_table infrastructure (I have patches which extract it 
if you don't want to deal with struct sg_table), you would benefit from 
PAGE_SIZE allocations.


Also I am not sure if a single gfp argument to sgl_alloc_order is the 
right thing to do. I have a feeling you either need to ignore it for 
kmalloc_array, or pass in two gfp_t arguments to be used for metadata 
and backing storage respectively.


So I have many questions regarding the current state and future 
direction, but essentially would like to make it usable for other 
drivers, like i915, as well.


Regards,

Tvrtko


Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)"

2018-03-07 Thread Douglas Gilbert

On 2018-03-07 09:02 AM, Menion wrote:

2018-03-07 14:51 GMT+01:00 Steffen Maier <ma...@linux.vnet.ibm.com>:


On 03/07/2018 09:24 AM, Menion wrote:



...

but from then on, you only get it roughly once every 300 seconds, i.e. 5
minutes

that's where I suspect user space as trigger, unless there is a kernel
feature I'm not aware of doing such sdev rescans

preventing this would be a workaround



Is it possible that it is smartd? It is the only daemon that could do
some low level access to the device (bypassing the filesystem)


If you wait about 5 hours from the time of this post then go to the
smartmontools mirror at:
  https://github.com/mirror/smartmontools

To check it is the revision (svn rev >= 4718) you need for this fix, look
at the top of the ChangeLog file and look for today's date (20180307).
Assuming it is there, clone it then try to build smartmontools
( './autogen.sh ; ./configure ; make install') and try the new smartd ***.
You should get one warning per 8 TB device (for each run of smartd) and no
more.

Currently smartmontools only has a quirks database (and it is large)
for ATA devices, not real or pseudo SCSI device, nor NVMe devices (yet).
Hopefully this fix will be sufficient.

If it does not work, please send me the details.

Doug Gilbert


*** without a --prefix=/usr/sbin or similar option to .configure I think
that smartd will be placed in /usr/local/sbin which may be different
from where your distro places it. Your PATH will determine which one
is used.





 /*
  * Many devices do not respond properly to
READ_CAPACITY_16.
  * Tell the SCSI layer to try READ_CAPACITY_10 first.
  * However some USB 3.0 drive enclosures return capacity
  * modulo 2TB. Those must use READ_CAPACITY_16
  */
 if (!(us->fflags & US_FL_NEEDS_CAP16))
 sdev->try_rc_10_first = 1;



if that's the cause, maybe an entry in drivers/usb/storage/unusual_devs.h
would help, but that's really just guessing as I'm not familiar with USB



It seems that the bridge does have an entry in unusual_devs.h:

/* Reported by Michael Büsch <m...@bues.ch> */
UNUSUAL_DEV( 0x152d, 0x0567, 0x0114, 0x0116,
"JMicron",
"USB to ATA/ATAPI Bridge",
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_BROKEN_FUA ),

VID:PID is 0x152d 0x0567, not sure what are the other two numbers, so
I went back and used another enclosure with same USB to SATA bridge.
The strange thing is that this other enclosure goes in UAS mode while
the one for which I am reporting the issue goes in usb-storage mode
because it gets somehow the quirks 0x5000
Unfortunately I cannot move these 5 HDDs in the other enclosure. So do
you think that it shall be reported to linux-usb maybe?





[Bug 198861] Regression causes kernel OOPS and hang in SCSI error report

2018-03-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=198861

Bart Van Assche (bvanass...@acm.org) changed:

   What|Removed |Added

 CC||bvanass...@acm.org

--- Comment #3 from Bart Van Assche (bvanass...@acm.org) ---
This has been fixed by the following commit, which is in James' tree but not
yet in Linus' tree nor any of the stable trees: 3be8828fc507 ("scsi: core:
Avoid that ATA error handling can trigger a kernel hang or oops"). See also
https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git/commit/?h=fixes=3be8828fc507cdafe7040a3dcf361a2bcd8e305b.
A pull request that includes that commit has been sent to Linus. See also
https://lkml.org/lkml/2018/3/6/767.

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Bug 198923] Linux 4.15.4+: Write on Ext4 causes system block

2018-03-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=198923

Bart Van Assche (bvanass...@acm.org) changed:

   What|Removed |Added

 CC||bvanass...@acm.org

--- Comment #9 from Bart Van Assche (bvanass...@acm.org) ---
Please close this bugzilla entry as a duplicate of #198861.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


[Bug 198923] Linux 4.15.4+: Write on Ext4 causes system block

2018-03-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=198923

Theodore Tso (ty...@mit.edu) changed:

   What|Removed |Added

 Status|REOPENED|ASSIGNED
  Component|ext4|Other
   Assignee|fs_e...@kernel-bugs.osdl.or |scsi_drivers-other@kernel-b
   |g   |ugs.osdl.org
Product|File System |SCSI Drivers

--- Comment #8 from Theodore Tso (ty...@mit.edu) ---
The people you need to contact are on the linux-scsi@vger.kernel.org mailing
list.  They are extremely unlikely to be following this bugzilla issue.  You'll
need to ping the mailing list thread on the linux-scsi mailing list.   Or
contact the two SCSI maintainers:

SCSI SUBSYSTEM
M:  "James E.J. Bottomley" 
T:  git git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git
M:  "Martin K. Petersen" 
T:  git git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git

Or try contacting the patch author in question, Bart Van Assche
.   There's not much we (the ext4 developers) can do
here to help.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


Re: [PATCH v2] scsi: libsas: defer ata device eh commands to libata

2018-03-07 Thread John Garry

On 07/03/2018 09:18, Jack Wang wrote:

2018-03-07 8:47 GMT+01:00 Jason Yan :

When ata device doing EH, some commands still attached with tasks are not
passed to libata when abort failed or recover failed, so libata did not
handle these commands. After these commands done, sas task is freed, but
ata qc is not freed. This will cause ata qc leak and trigger a warning
like below:

WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037
ata_eh_finish+0xb4/0xcc
CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G W  OE 4.14.0#1
..
Call trace:
[] ata_eh_finish+0xb4/0xcc
[] ata_do_eh+0xc4/0xd8
[] ata_std_error_handler+0x44/0x8c
[] ata_scsi_port_error_handler+0x480/0x694
[] async_sas_ata_eh+0x4c/0x80
[] async_run_entry_fn+0x4c/0x170
[] process_one_work+0x144/0x390
[] worker_thread+0x144/0x418
[] kthread+0x10c/0x138
[] ret_from_fork+0x10/0x18

If ata qc leaked too many, ata tag allocation will fail and io blocked
for ever.

As suggested by Dan Williams, defer ata device commands to libata and
merge sas_eh_finish_cmd() with sas_eh_defer_cmd(). libata will handle
ata qcs correctly after this.

Signed-off-by: Jason Yan 
CC: Xiaofei Tan 
CC: John Garry 
CC: Dan Williams 

Hi Jason,

Would be good if you add the Fixes tag, so distribution could pick it up easily.
And wasn't the idea to delete sas_eh_finish_cmd instead of sas_eh_defer_cmd?


The problem was that we need to defer any ATA command. So the solution 
was to just delete sas_eh_defer_cmd() and make sas_eh_finish_cmd() do 
what needs to be done for either ATA or non-ATA task.


Thanks,
John




Thanks,
Jack

---
 drivers/scsi/libsas/sas_scsi_host.c | 30 ++
 1 file changed, 10 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c 
b/drivers/scsi/libsas/sas_scsi_host.c
index 6de9681ace82..fd76436b289c 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -223,6 +223,7 @@ int sas_queuecommand(struct Scsi_Host *host, struct 
scsi_cmnd *cmd)
 static void sas_eh_finish_cmd(struct scsi_cmnd *cmd)
 {
struct sas_ha_struct *sas_ha = SHOST_TO_SAS_HA(cmd->device->host);
+   struct domain_device *dev = cmd_to_domain_dev(cmd);
struct sas_task *task = TO_SAS_TASK(cmd);

/* At this point, we only get called following an actual abort
@@ -231,6 +232,11 @@ static void sas_eh_finish_cmd(struct scsi_cmnd *cmd)
 */
sas_end_task(cmd, task);

+   if (dev_is_sata(dev)) {
+   list_move_tail(>eh_entry, _ha->eh_ata_q);
+   return;
+   }
+
/* now finish the command and move it on to the error
 * handler done list, this also takes it off the
 * error handler pending list.
@@ -238,22 +244,6 @@ static void sas_eh_finish_cmd(struct scsi_cmnd *cmd)
scsi_eh_finish_cmd(cmd, _ha->eh_done_q);
 }

-static void sas_eh_defer_cmd(struct scsi_cmnd *cmd)
-{
-   struct domain_device *dev = cmd_to_domain_dev(cmd);
-   struct sas_ha_struct *ha = dev->port->ha;
-   struct sas_task *task = TO_SAS_TASK(cmd);
-
-   if (!dev_is_sata(dev)) {
-   sas_eh_finish_cmd(cmd);
-   return;
-   }
-
-   /* report the timeout to libata */
-   sas_end_task(cmd, task);
-   list_move_tail(>eh_entry, >eh_ata_q);
-}
-
 static void sas_scsi_clear_queue_lu(struct list_head *error_q, struct 
scsi_cmnd *my_cmd)
 {
struct scsi_cmnd *cmd, *n;
@@ -261,7 +251,7 @@ static void sas_scsi_clear_queue_lu(struct list_head 
*error_q, struct scsi_cmnd
list_for_each_entry_safe(cmd, n, error_q, eh_entry) {
if (cmd->device->sdev_target == my_cmd->device->sdev_target &&
cmd->device->lun == my_cmd->device->lun)
-   sas_eh_defer_cmd(cmd);
+   sas_eh_finish_cmd(cmd);
}
 }

@@ -631,12 +621,12 @@ static void sas_eh_handle_sas_errors(struct Scsi_Host 
*shost, struct list_head *
case TASK_IS_DONE:
SAS_DPRINTK("%s: task 0x%p is done\n", __func__,
task);
-   sas_eh_defer_cmd(cmd);
+   sas_eh_finish_cmd(cmd);
continue;
case TASK_IS_ABORTED:
SAS_DPRINTK("%s: task 0x%p is aborted\n",
__func__, task);
-   sas_eh_defer_cmd(cmd);
+   sas_eh_finish_cmd(cmd);
continue;
case TASK_IS_AT_LU:
SAS_DPRINTK("task 0x%p is at LU: lu recover\n", task);
@@ -647,7 +637,7 @@ static void sas_eh_handle_sas_errors(struct Scsi_Host 
*shost, struct list_head *
"recovered\n",
SAS_ADDR(task->dev),
 

Re: [PATCH 6/6] lib/scatterlist: Drop order argument from sgl_free_n_order

2018-03-07 Thread Bart Van Assche
On Wed, 2018-03-07 at 12:47 +, Tvrtko Ursulin wrote:
> We can derive the order from sg->length and so do not need to pass it in
> explicitly. Rename the function to sgl_free_n.

Using get_order() to compute the order looks fine to me but this patch will
have to rebased in order to address the comments on the previous patches.

Thanks,

Bart.




Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset

2018-03-07 Thread Ming Lei
On Wed, Mar 07, 2018 at 08:31:31PM +0530, Kashyap Desai wrote:
> > -Original Message-
> > From: Ming Lei [mailto:ming@redhat.com]
> > Sent: Wednesday, March 7, 2018 10:58 AM
> > To: Kashyap Desai
> > Cc: Jens Axboe; linux-bl...@vger.kernel.org; Christoph Hellwig; Mike
> Snitzer;
> > linux-scsi@vger.kernel.org; Hannes Reinecke; Arun Easi; Omar Sandoval;
> > Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace;
> Peter
> > Rivera; Laurence Oberman
> > Subject: Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance
> via
> > .host_tagset
> >
> > On Wed, Feb 28, 2018 at 08:28:48PM +0530, Kashyap Desai wrote:
> > > Ming -
> > >
> > > Quick testing on my setup -  Performance slightly degraded (4-5%
> > > drop)for megaraid_sas driver with this patch. (From 1610K IOPS it goes
> > > to 1544K) I confirm that after applying this patch, we have #queue =
> #numa
> > node.
> > >
> > > ls -l
> > >
> >
> /sys/devices/pci:80/:80:02.0/:83:00.0/host10/target10:2:23/10:
> > > 2:23:0/block/sdy/mq
> > > total 0
> > > drwxr-xr-x. 18 root root 0 Feb 28 09:53 0 drwxr-xr-x. 18 root root 0
> > > Feb 28 09:53 1
> > >
> > >
> > > I would suggest to skip megaraid_sas driver changes using
> > > shared_tagset until and unless there is obvious gain. If overall
> > > interface of using shared_tagset is commit in kernel tree, we will
> > > investigate (megaraid_sas
> > > driver) in future about real benefit of using it.
> >
> > Hi Kashyap,
> >
> > Now I have put patches for removing operating on scsi_host->host_busy in
> > V4[1], especially which are done in the following 3 patches:
> >
> > 9221638b9bc9 scsi: avoid to hold host_busy for scsi_mq
> > 1ffc8c0ffbe4 scsi: read host_busy via scsi_host_busy()
> > e453d3983243 scsi: introduce scsi_host_busy()
> >
> >
> > Could you run your test on V4 and see if IOPS can be improved on
> > megaraid_sas?
> >
> >
> > [1] https://github.com/ming1/linux/commits/v4.16-rc-host-tags-v4
> 
> I will be doing testing soon.

Today I revisit your previous perf trace too, seems the following samples take
a bit more CPU:

   4.64%  [megaraid_sas]   [k] complete_cmd_fusion
   ...
   2.22%  [megaraid_sas]   [k] megasas_build_io_fusion
   ...
   1.33%  [megaraid_sas]   [k] megasas_build_and_issue_cmd_fusion

But V4 should get a bit improvement in theory.

And if some host-wide resource of megaraid_sas can be partitioned to
per-node hw queue, I guess some of improvement can be got too.

> 
> BTW - Performance impact is due below patch only -
> "[PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via
> .host_tagset"
> 
> Below patch is really needed -
> "[PATCH V3 2/8] scsi: megaraid_sas: fix selection of reply queue"
> 
> I am currently doing review on my setup.  I think above patch is fixing
> real issue of performance (for megaraid_sas) as driver may not be sending
> IO to optimal reply queue.

The ideal way is to map reply queue to blk-mq's hw queue, but seems
SCSI/driver's IO path is too slow so that high enough hw queue
depth(from device internal view, for example 256) still can't reach good
performance, as you observed.

> Having CPU to MSIx mapping will solve that. Megaraid_sas driver always
> create max MSIx as min (online CPU, # MSIx HW support).
> I will do more review and testing for that particular patch as well.

OK, thanks!

> 
> Also one observation using V3 series patch. I am seeing below Affinity
> mapping whereas I have only 72 logical CPUs.  It means we are really not
> going to use all reply queues.
> e.a If I bind fio jobs on CPU 18-20, I am seeing only one reply queue is
> used and that may lead to performance drop as well.

If the mapping is in such shape, I guess it should be quite difficult to
figure out one perfect way to solve this situation because one reply
queue has to handle IOs submitted from 4~5 CPUs at average.

The application should have the knowledge to avoid this kind of usage.


Thanks,
Ming


Re: [PATCH v2] scsi: libsas: defer ata device eh commands to libata

2018-03-07 Thread Dan Williams
On Tue, Mar 6, 2018 at 11:47 PM, Jason Yan  wrote:
> When ata device doing EH, some commands still attached with tasks are not
> passed to libata when abort failed or recover failed, so libata did not
> handle these commands. After these commands done, sas task is freed, but
> ata qc is not freed. This will cause ata qc leak and trigger a warning
> like below:
>
> WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037
> ata_eh_finish+0xb4/0xcc
> CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G W  OE 4.14.0#1
> ..
> Call trace:
> [] ata_eh_finish+0xb4/0xcc
> [] ata_do_eh+0xc4/0xd8
> [] ata_std_error_handler+0x44/0x8c
> [] ata_scsi_port_error_handler+0x480/0x694
> [] async_sas_ata_eh+0x4c/0x80
> [] async_run_entry_fn+0x4c/0x170
> [] process_one_work+0x144/0x390
> [] worker_thread+0x144/0x418
> [] kthread+0x10c/0x138
> [] ret_from_fork+0x10/0x18
>
> If ata qc leaked too many, ata tag allocation will fail and io blocked
> for ever.
>
> As suggested by Dan Williams, defer ata device commands to libata and
> merge sas_eh_finish_cmd() with sas_eh_defer_cmd(). libata will handle
> ata qcs correctly after this.
>
> Signed-off-by: Jason Yan 
> CC: Xiaofei Tan 
> CC: John Garry 
> CC: Dan Williams 
> ---
>  drivers/scsi/libsas/sas_scsi_host.c | 30 ++
>  1 file changed, 10 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/scsi/libsas/sas_scsi_host.c 
> b/drivers/scsi/libsas/sas_scsi_host.c
> index 6de9681ace82..fd76436b289c 100644
> --- a/drivers/scsi/libsas/sas_scsi_host.c
> +++ b/drivers/scsi/libsas/sas_scsi_host.c
> @@ -223,6 +223,7 @@ int sas_queuecommand(struct Scsi_Host *host, struct 
> scsi_cmnd *cmd)
>  static void sas_eh_finish_cmd(struct scsi_cmnd *cmd)
>  {
> struct sas_ha_struct *sas_ha = SHOST_TO_SAS_HA(cmd->device->host);
> +   struct domain_device *dev = cmd_to_domain_dev(cmd);
> struct sas_task *task = TO_SAS_TASK(cmd);
>
> /* At this point, we only get called following an actual abort
> @@ -231,6 +232,11 @@ static void sas_eh_finish_cmd(struct scsi_cmnd *cmd)
>  */
> sas_end_task(cmd, task);

This patch looks good, just need a comment here about why it is ok to
short circuit this routine in the dev_is_sata() case.

>
> +   if (dev_is_sata(dev)) {
> +   list_move_tail(>eh_entry, _ha->eh_ata_q);
> +   return;
> +   }
> +


RE: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset

2018-03-07 Thread Kashyap Desai
> -Original Message-
> From: Ming Lei [mailto:ming@redhat.com]
> Sent: Wednesday, March 7, 2018 10:58 AM
> To: Kashyap Desai
> Cc: Jens Axboe; linux-bl...@vger.kernel.org; Christoph Hellwig; Mike
Snitzer;
> linux-scsi@vger.kernel.org; Hannes Reinecke; Arun Easi; Omar Sandoval;
> Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace;
Peter
> Rivera; Laurence Oberman
> Subject: Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance
via
> .host_tagset
>
> On Wed, Feb 28, 2018 at 08:28:48PM +0530, Kashyap Desai wrote:
> > Ming -
> >
> > Quick testing on my setup -  Performance slightly degraded (4-5%
> > drop)for megaraid_sas driver with this patch. (From 1610K IOPS it goes
> > to 1544K) I confirm that after applying this patch, we have #queue =
#numa
> node.
> >
> > ls -l
> >
>
/sys/devices/pci:80/:80:02.0/:83:00.0/host10/target10:2:23/10:
> > 2:23:0/block/sdy/mq
> > total 0
> > drwxr-xr-x. 18 root root 0 Feb 28 09:53 0 drwxr-xr-x. 18 root root 0
> > Feb 28 09:53 1
> >
> >
> > I would suggest to skip megaraid_sas driver changes using
> > shared_tagset until and unless there is obvious gain. If overall
> > interface of using shared_tagset is commit in kernel tree, we will
> > investigate (megaraid_sas
> > driver) in future about real benefit of using it.
>
> Hi Kashyap,
>
> Now I have put patches for removing operating on scsi_host->host_busy in
> V4[1], especially which are done in the following 3 patches:
>
>   9221638b9bc9 scsi: avoid to hold host_busy for scsi_mq
>   1ffc8c0ffbe4 scsi: read host_busy via scsi_host_busy()
>   e453d3983243 scsi: introduce scsi_host_busy()
>
>
> Could you run your test on V4 and see if IOPS can be improved on
> megaraid_sas?
>
>
> [1] https://github.com/ming1/linux/commits/v4.16-rc-host-tags-v4

I will be doing testing soon.

BTW - Performance impact is due below patch only -
"[PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via
.host_tagset"

Below patch is really needed -
"[PATCH V3 2/8] scsi: megaraid_sas: fix selection of reply queue"

I am currently doing review on my setup.  I think above patch is fixing
real issue of performance (for megaraid_sas) as driver may not be sending
IO to optimal reply queue.
Having CPU to MSIx mapping will solve that. Megaraid_sas driver always
create max MSIx as min (online CPU, # MSIx HW support).
I will do more review and testing for that particular patch as well.

Also one observation using V3 series patch. I am seeing below Affinity
mapping whereas I have only 72 logical CPUs.  It means we are really not
going to use all reply queues.
e.a If I bind fio jobs on CPU 18-20, I am seeing only one reply queue is
used and that may lead to performance drop as well.

PCI name is 86:00.0, dump its irq affinity:
irq 218, cpu list 0-2,36-37
irq 219, cpu list 3-5,39-40
irq 220, cpu list 6-8,42-43
irq 221, cpu list 9-11,45-46
irq 222, cpu list 12-13,48-49
irq 223, cpu list 14-15,50-51
irq 224, cpu list 16-17,52-53
irq 225, cpu list 38,41,44,47
irq 226, cpu list 72,74,76,78
irq 227, cpu list 80,82,84,86
irq 228, cpu list 88,90,92,94
irq 229, cpu list 96,98,100,102
irq 230, cpu list 104,106,108,110
irq 231, cpu list 112,114,116,118
irq 232, cpu list 120,122,124,126
irq 233, cpu list 128,130,132,134
irq 234, cpu list 136,138,140,142
irq 235, cpu list 144,146,148,150
irq 236, cpu list 152,154,156,158
irq 237, cpu list 160,162,164,166
irq 238, cpu list 168,170,172,174
irq 239, cpu list 176,178,180,182
irq 240, cpu list 184,186,188,190
irq 241, cpu list 192,194,196,198
irq 242, cpu list 200,202,204,206
irq 243, cpu list 208,210,212,214
irq 244, cpu list 216,218,220,222
irq 245, cpu list 224,226,228,230
irq 246, cpu list 232,234,236,238
irq 247, cpu list 240,242,244,246
irq 248, cpu list 248,250,252,254
irq 249, cpu list 256,258,260,262
irq 250, cpu list 264,266,268,270
irq 251, cpu list 272,274,276,278
irq 252, cpu list 280,282,284,286
irq 253, cpu list 288,290,292,294
irq 254, cpu list 18-20,54-55
irq 255, cpu list 21-23,57-58
irq 256, cpu list 24-26,60-61
irq 257, cpu list 27-29,63-64
irq 258, cpu list 30-31,66-67
irq 259, cpu list 32-33,68-69
irq 260, cpu list 34-35,70-71
irq 261, cpu list 56,59,62,65
irq 262, cpu list 73,75,77,79
irq 263, cpu list 81,83,85,87
irq 264, cpu list 89,91,93,95
irq 265, cpu list 97,99,101,103
irq 266, cpu list 105,107,109,111
irq 267, cpu list 113,115,117,119
irq 268, cpu list 121,123,125,127
irq 269, cpu list 129,131,133,135
irq 270, cpu list 137,139,141,143
irq 271, cpu list 145,147,149,151
irq 272, cpu list 153,155,157,159
irq 273, cpu list 161,163,165,167
irq 274, cpu list 169,171,173,175
irq 275, cpu list 177,179,181,183
irq 276, cpu list 185,187,189,191
irq 277, cpu list 193,195,197,199
irq 278, cpu list 201,203,205,207
irq 279, cpu list 209,211,213,215
irq 280, cpu list 217,219,221,223
irq 281, cpu list 225,227,229,231
irq 282, cpu list 233,235,237,239
irq 283, cpu list 241,243,245,247
irq 284, cpu list 249,251,253,255
irq 285, cpu list 

Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue

2018-03-07 Thread Laurence Oberman
On Tue, 2018-03-06 at 14:24 -0500, Martin K. Petersen wrote:
> Ming,
> 
> > Given both Don and Laurence have verified that patch 1 and patch 2
> > does fix IO hang, could you consider to merge the two first?
> 
> Oh, and I would still need a formal Acked-by: from Don and Tested-by:
> from Laurence.
> 
> Also, for 4.16/scsi-fixes I would prefer verification to be done with
> just patch 1/8 and none of the subsequent changes in place. Just to
> make
> sure we're testing the right thing.
> 
> Thanks!
> 

Hello Martin

I tested just Patch 1/8 from the V3 series.
No issues running workload and no issues booting on the DL380G7.
Don can you ack this so we can at least get this one in.

Against: 4.16.0-rc4.v31of8+ on an x86_64

Tested-by: Laurence Oberman 

Thanks
Laurence


Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)"

2018-03-07 Thread Menion
2018-03-07 14:51 GMT+01:00 Steffen Maier :
>
> On 03/07/2018 09:24 AM, Menion wrote:
>>
> ...
>
> but from then on, you only get it roughly once every 300 seconds, i.e. 5
> minutes
>
> that's where I suspect user space as trigger, unless there is a kernel
> feature I'm not aware of doing such sdev rescans
>
> preventing this would be a workaround
>

Is it possible that it is smartd? It is the only daemon that could do
some low level access to the device (bypassing the filesystem)

>
>> /*
>>  * Many devices do not respond properly to
>> READ_CAPACITY_16.
>>  * Tell the SCSI layer to try READ_CAPACITY_10 first.
>>  * However some USB 3.0 drive enclosures return capacity
>>  * modulo 2TB. Those must use READ_CAPACITY_16
>>  */
>> if (!(us->fflags & US_FL_NEEDS_CAP16))
>> sdev->try_rc_10_first = 1;
>
>
> if that's the cause, maybe an entry in drivers/usb/storage/unusual_devs.h
> would help, but that's really just guessing as I'm not familiar with USB
>

It seems that the bridge does have an entry in unusual_devs.h:

/* Reported by Michael Büsch  */
UNUSUAL_DEV( 0x152d, 0x0567, 0x0114, 0x0116,
"JMicron",
"USB to ATA/ATAPI Bridge",
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_BROKEN_FUA ),

VID:PID is 0x152d 0x0567, not sure what are the other two numbers, so
I went back and used another enclosure with same USB to SATA bridge.
The strange thing is that this other enclosure goes in UAS mode while
the one for which I am reporting the issue goes in usb-storage mode
because it gets somehow the quirks 0x5000
Unfortunately I cannot move these 5 HDDs in the other enclosure. So do
you think that it shall be reported to linux-usb maybe?


Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)"

2018-03-07 Thread Steffen Maier


On 03/07/2018 09:24 AM, Menion wrote:

By flooded I mean that it continously fill the dmesg log with no
interruption, check attached a log that I have just taken from my
server
Some more details on my setup. I have these 5 HDD, WD RED 8TB in an
Orico 5 bay enclosure, running JMS567 USBtoSATA bridge and an internal
SATA multiplexer
This is connected to the USB 3.0 host port of my server, it is an Intel Atom



2018-03-07 3:45 GMT+01:00 Martin K. Petersen :

Also, what kind of controller are these disks attached to? The reason
you see these messages is that to the kernel it looks like a legacy disk
device that predates capacities in the TB range. The warnings are logged
because we're surprised to be going down this path based on what the
device has previously told us.


Of course Martin's statement regarding the occurrence holds true.

It does not look like continuously flooding, but rather like a 
repetition at some not even high frequency. Do you have some user space 
periodically performing SCSI target or SCSI device rescans?

Each repetition is per drive, i.e. a junk of 5 messages in your case.


[4.929517] sd 0:0:0:0: [sda] Very big device. Trying to use READ 
CAPACITY(16).


first occurrence after initial probing


[4.933893] sd 0:0:0:0: [sda] Very big device. Trying to use READ 
CAPACITY(16).
[4.946474] sd 0:0:0:0: [sda] Very big device. Trying to use READ 
CAPACITY(16).


looks like we go through the code path more than once during initial probing


[   99.057592] sd 0:0:0:0: [sda] Very big device. Trying to use READ 
CAPACITY(16).



[  409.335119] sd 0:0:0:0: [sda] Very big device. Trying to use READ 
CAPACITY(16).



[  719.760106] sd 0:0:0:0: [sda] Very big device. Trying to use READ 
CAPACITY(16).



[ 1018.089562] sd 0:0:0:0: [sda] Very big device. Trying to use READ 
CAPACITY(16).



[ 1328.086120] sd 0:0:0:0: [sda] Very big device. Trying to use READ 
CAPACITY(16).


...

but from then on, you only get it roughly once every 300 seconds, i.e. 5 
minutes


that's where I suspect user space as trigger, unless there is a kernel 
feature I'm not aware of doing such sdev rescans


preventing this would be a workaround

assuming the Linux check is correct, the proper fix might be that the 
device should present itself according to standards such that Linux 
silently uses READ CAPACITY(16) in the first place



static int sd_try_rc16_first(struct scsi_device *sdp)
{
if (sdp->host->max_cmd_len < 16)
return 0;


option


if (sdp->try_rc_10_first)
return 0;


option


if (sdp->scsi_level > SCSI_SPC_2)
return 1;
if (scsi_device_protection(sdp))
return 1;
return 0;


option


}


just picking one arbitrary option and not being entirely sure that's the 
code path but you mentioned USB to SATA bridge, it might be related to:



*** drivers/usb/storage/scsiglue.c:
slave_configure[239]   sdev->try_rc_10_first = 1;



/*
 * Many devices do not respond properly to READ_CAPACITY_16.
 * Tell the SCSI layer to try READ_CAPACITY_10 first.
 * However some USB 3.0 drive enclosures return capacity
 * modulo 2TB. Those must use READ_CAPACITY_16
 */
if (!(us->fflags & US_FL_NEEDS_CAP16))
sdev->try_rc_10_first = 1;


if that's the cause, maybe an entry in 
drivers/usb/storage/unusual_devs.h would help, but that's really just 
guessing as I'm not familiar with USB


--
Mit freundlichen Grüßen / Kind regards
Steffen Maier

Linux on z Systems Development

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294



[bug report] scsi: lpfc: Add WQ Full Logic for NVME Target

2018-03-07 Thread Dan Carpenter
Hello James Smart,

The patch 6e8e1c14c61e: "scsi: lpfc: Add WQ Full Logic for NVME
Target" from Jan 30, 2018, leads to the following static checker
warning:

drivers/scsi/lpfc/lpfc_nvmet.c:921 lpfc_nvmet_xmt_fcp_abort()
warn: inconsistent returns 'ctxp->ctxlock'.

drivers/scsi/lpfc/lpfc_nvmet.c
   889  atomic_inc(_nvmep->xmt_fcp_abort);
   890  
   891  spin_lock_irqsave(>ctxlock, flags);
   892  ctxp->state = LPFC_NVMET_STE_ABORT;
   893  
   894  /* Since iaab/iaar are NOT set, we need to check
   895   * if the firmware is in process of aborting IO
   896   */
   897  if (ctxp->flag & LPFC_NVMET_XBUSY) {
   898  spin_unlock_irqrestore(>ctxlock, flags);
   899  return;
   900  }
   901  ctxp->flag |= LPFC_NVMET_ABORT_OP;
   902  
   903  if (ctxp->flag & LPFC_NVMET_DEFER_WQFULL) {
   904  lpfc_nvmet_unsol_fcp_issue_abort(phba, ctxp, ctxp->sid,
   905   ctxp->oxid);
   906  wq = phba->sli4_hba.nvme_wq[ctxp->wqeq->hba_wqidx];
   907  lpfc_nvmet_wqfull_flush(phba, wq, ctxp);
   908  return;
^^^
Missing unlock before this return.

   909  }
   910  
   911  /* An state of LPFC_NVMET_STE_RCV means we have just received
   912   * the NVME command and have not started processing it.
   913   * (by issuing any IO WQEs on this exchange yet)
   914   */
   915  if (ctxp->state == LPFC_NVMET_STE_RCV)
   916  lpfc_nvmet_unsol_fcp_issue_abort(phba, ctxp, ctxp->sid,
   917   ctxp->oxid);
   918  else
   919  lpfc_nvmet_sol_fcp_issue_abort(phba, ctxp, ctxp->sid,
   920 ctxp->oxid);
   921  spin_unlock_irqrestore(>ctxlock, flags);
   922  }


regards,
dan carpenter


[PATCH 6/6] lib/scatterlist: Drop order argument from sgl_free_n_order

2018-03-07 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

We can derive the order from sg->length and so do not need to pass it in
explicitly. Rename the function to sgl_free_n.

Signed-off-by: Tvrtko Ursulin 
Cc: Bart Van Assche 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
Cc: Jens Axboe 
Cc: "Nicholas A. Bellinger" 
Cc: linux-scsi@vger.kernel.org
Cc: target-de...@vger.kernel.org

---
 drivers/target/target_core_transport.c |  2 +-
 include/linux/scatterlist.h|  5 ++---
 lib/scatterlist.c  | 16 ++--
 3 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c
index 4558f2e1fe1b..91e8f4047492 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2303,7 +2303,7 @@ static void target_complete_ok_work(struct work_struct 
*work)
 
 void target_free_sgl(struct scatterlist *sgl, int nents)
 {
-   sgl_free_n_order(sgl, nents, 0);
+   sgl_free_n(sgl, nents);
 }
 EXPORT_SYMBOL(target_free_sgl);
 
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 3ffc5f3bf181..3779d1fdd5c4 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -280,8 +280,7 @@ int sg_alloc_table_from_pages(struct sg_table *sgt, struct 
page **pages,
 struct scatterlist *sgl_alloc_order(unsigned long length, unsigned int order,
bool chainable, gfp_t gfp,
unsigned int *nent_p);
-void sgl_free_n_order(struct scatterlist *sgl, unsigned int nents,
- unsigned int order);
+void sgl_free_n(struct scatterlist *sgl, unsigned int nents);
 
 /**
  * sgl_alloc - allocate a scatterlist and its pages
@@ -303,7 +302,7 @@ sgl_alloc(unsigned long length, gfp_t gfp, unsigned int 
*nent_p)
  */
 static inline void sgl_free(struct scatterlist *sgl)
 {
-   sgl_free_n_order(sgl, UINT_MAX, 0);
+   sgl_free_n(sgl, UINT_MAX);
 }
 
 #endif /* CONFIG_SGL_ALLOC */
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index c637849482d3..76111e91a038 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -493,7 +493,7 @@ struct scatterlist *sgl_alloc_order(unsigned long length, 
unsigned int order,
 {
unsigned int chunk_len = PAGE_SIZE << order;
struct scatterlist *sgl, *sg;
-   unsigned int nent, i;
+   unsigned int nent;
 
nent = round_up(length, chunk_len) >> (PAGE_SHIFT + order);
 
@@ -517,12 +517,11 @@ struct scatterlist *sgl_alloc_order(unsigned long length, 
unsigned int order,
 
sg_init_table(sgl, nent);
sg = sgl;
-   i = 0;
while (length) {
struct page *page = alloc_pages(gfp, order);
 
if (!page) {
-   sgl_free_n_order(sgl, i, order);
+   sgl_free(sgl);
return NULL;
}
 
@@ -530,7 +529,6 @@ struct scatterlist *sgl_alloc_order(unsigned long length, 
unsigned int order,
sg_set_page(sg, page, chunk_len, 0);
length -= chunk_len;
sg = sg_next(sg);
-   i++;
}
WARN_ONCE(length, "length = %ld\n", length);
return sgl;
@@ -538,10 +536,9 @@ struct scatterlist *sgl_alloc_order(unsigned long length, 
unsigned int order,
 EXPORT_SYMBOL(sgl_alloc_order);
 
 /**
- * sgl_free_n_order - free a scatterlist and its pages
+ * sgl_free_n - free a scatterlist and its pages
  * @sgl: Scatterlist with one or more elements
  * @nents: Maximum number of elements to free
- * @order: Second argument for __free_pages()
  *
  * Notes:
  * - If several scatterlists have been chained and each chain element is
@@ -550,8 +547,7 @@ EXPORT_SYMBOL(sgl_alloc_order);
  * - All pages in a chained scatterlist can be freed at once by setting @nents
  *   to a high number.
  */
-void sgl_free_n_order(struct scatterlist *sgl, unsigned int nents,
- unsigned int order)
+void sgl_free_n(struct scatterlist *sgl, unsigned int nents)
 {
struct scatterlist *sg;
struct page *page;
@@ -562,11 +558,11 @@ void sgl_free_n_order(struct scatterlist *sgl, unsigned 
int nents,
break;
page = sg_page(sg);
if (page)
-   __free_pages(page, order);
+   __free_pages(page, get_order(sg->length));
}
kfree(sgl);
 }
-EXPORT_SYMBOL(sgl_free_n_order);
+EXPORT_SYMBOL(sgl_free_n);
 
 #endif /* CONFIG_SGL_ALLOC */
 
-- 
2.14.1



[PATCH] scsi: iscsi_tcp: set BDI_CAP_STABLE_WRITES when data digest enabled

2018-03-07 Thread Jianchao Wang
iscsi tcp will first send out data, then calculate and send data
digest. If we don't have BDI_CAP_STABLE_WRITES, the page cache will
be written in spite of the on going writeback. Consequently, wrong
digest will be got and sent to target.

To fix this, set BDI_CAP_STABLE_WRITES when data digest is enabled
in iscsi_tcp .slave_configure callback.

Signed-off-by: Jianchao Wang 
---
 drivers/scsi/iscsi_tcp.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 6198559..261c686 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -954,6 +955,12 @@ static int iscsi_sw_tcp_slave_alloc(struct scsi_device 
*sdev)
 
 static int iscsi_sw_tcp_slave_configure(struct scsi_device *sdev)
 {
+   struct iscsi_sw_tcp_host *tcp_sw_host = iscsi_host_priv(sdev->host);
+   struct iscsi_session *session = tcp_sw_host->session;
+   struct iscsi_conn *conn = session->leadconn;
+
+   if (conn->datadgst_en)
+   sdev->request_queue->backing_dev_info->capabilities |= 
BDI_CAP_STABLE_WRITES;
blk_queue_bounce_limit(sdev->request_queue, BLK_BOUNCE_ANY);
blk_queue_dma_alignment(sdev->request_queue, 0);
return 0;
-- 
2.7.4



[PATCH v4 0/10] hisi_sas: support x6000 board and some misc changes

2018-03-07 Thread John Garry
This patchset primarily adds support for the Huawei x6000 board,
which includes hip07 chipset. Unfortunately, due to some board
layout differences with our development board, we need to set
a PHY-related register differently for optimal signal quality. As
such, a signal attenuation property is added to describe the
differences in the boards and allow the PHY register to be set
appropriately.

In addition to this above features, some misc changes are added for:
- PHY linkrate sysfs interface
- linkrate set function
- internal abort timer timeout increase
- add module device id tabe for v3 hw
- register init setting changes

Differences to v3:
- address comments from Hannes on patch #8:
  https://marc.info/?l=linux-scsi=152033924615703=2
- add patch for module Id table and some register
  setting for v3 HW.

Differences to v2:
- rename dt binding property name to "hisilicon,signal-attenuation"

Differences to v1:
- rename dt binding property name to include "hisi-" prefix

Xiang Chen (3):
  scsi: hisi_sas: remove unused variable hisi_sas_devices.running_req
  scsi: hisi_sas: Code cleanup and minor bug fixes
  scsi: hisi_sas: add v3 hw MODULE_DEVICE_TABLE()

Xiaofei Tan (7):
  dt-bindings: scsi: hisi_sas: add an property of signal attenuation
  scsi: hisi_sas: support the property of signal attenuation for v2 hw
  scsi: hisi_sas: fix the issue of link rate inconsistency
  scsi: hisi_sas: fix the issue of setting linkrate register
  scsi: hisi_sas: increase timer expire of internal abort task
  scsi: hisi_sas: fix return value of hisi_sas_task_prep()
  scsi: hisi_sas: modify some register config for hip08

 .../devicetree/bindings/scsi/hisilicon-sas.txt |  7 +++
 drivers/scsi/hisi_sas/hisi_sas.h   |  1 -
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 34 +-
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 13 ++--
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 62 ++-
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 72 ++
 6 files changed, 109 insertions(+), 80 deletions(-)

-- 
1.9.1



[PATCH v4 02/10] scsi: hisi_sas: support the property of signal attenuation for v2 hw

2018-03-07 Thread John Garry
From: Xiaofei Tan 

The register SAS_PHY_CTRL is configured according to signal quality.
The signal quality is calculated by signal attenuation of hardware
physical link. It may be different for different PCB layout.

So, in order to give better support to new board, this patch add
support to reading the devicetree property, "hisilicon,signal-attenuation".
Of course, we still keep an default value in driver to adapt old
board.

Signed-off-by: Xiaofei Tan 
Signed-off-by: John Garry 
---
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 39 +-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index 4ccb61e..42b3fd6 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -406,6 +406,17 @@ struct hisi_sas_err_record_v2 {
__le32 dma_rx_err_type;
 };
 
+struct signal_attenuation_s {
+   u32 de_emphasis;
+   u32 preshoot;
+   u32 boost;
+};
+
+struct sig_atten_lu_s {
+   const struct signal_attenuation_s *att;
+   u32 sas_phy_ctrl;
+};
+
 static const struct hisi_sas_hw_error one_bit_ecc_errors[] = {
{
.irq_msk = BIT(SAS_ECC_INTR_DQE_ECC_1B_OFF),
@@ -1130,9 +1141,16 @@ static void phys_try_accept_stp_links_v2_hw(struct 
hisi_hba *hisi_hba)
}
 }
 
+static const struct signal_attenuation_s x6000 = {9200, 0, 10476};
+static const struct sig_atten_lu_s sig_atten_lu[] = {
+   { , 0x3016a68 },
+};
+
 static void init_reg_v2_hw(struct hisi_hba *hisi_hba)
 {
struct device *dev = hisi_hba->dev;
+   u32 sas_phy_ctrl = 0x30b9908;
+   u32 signal[3];
int i;
 
/* Global registers init */
@@ -1176,9 +1194,28 @@ static void init_reg_v2_hw(struct hisi_hba *hisi_hba)
hisi_sas_write32(hisi_hba, AXI_AHB_CLK_CFG, 1);
hisi_sas_write32(hisi_hba, HYPER_STREAM_ID_EN_CFG, 1);
 
+   /* Get sas_phy_ctrl value to deal with TX FFE issue. */
+   if (!device_property_read_u32_array(dev, "hisilicon,signal-attenuation",
+   signal, ARRAY_SIZE(signal))) {
+   for (i = 0; i < ARRAY_SIZE(sig_atten_lu); i++) {
+   const struct sig_atten_lu_s *lookup = _atten_lu[i];
+   const struct signal_attenuation_s *att = lookup->att;
+
+   if ((signal[0] == att->de_emphasis) &&
+   (signal[1] == att->preshoot) &&
+   (signal[2] == att->boost)) {
+   sas_phy_ctrl = lookup->sas_phy_ctrl;
+   break;
+   }
+   }
+
+   if (i == ARRAY_SIZE(sig_atten_lu))
+   dev_warn(dev, "unknown signal attenuation values, using 
default PHY ctrl config\n");
+   }
+
for (i = 0; i < hisi_hba->n_phy; i++) {
hisi_sas_phy_write32(hisi_hba, i, PROG_PHY_LINK_RATE, 0x855);
-   hisi_sas_phy_write32(hisi_hba, i, SAS_PHY_CTRL, 0x30b9908);
+   hisi_sas_phy_write32(hisi_hba, i, SAS_PHY_CTRL, sas_phy_ctrl);
hisi_sas_phy_write32(hisi_hba, i, SL_TOUT_CFG, 0x7d7d7d7d);
hisi_sas_phy_write32(hisi_hba, i, SL_CONTROL, 0x0);
hisi_sas_phy_write32(hisi_hba, i, TXID_AUTO, 0x2);
-- 
1.9.1



[PATCH v4 06/10] scsi: hisi_sas: remove unused variable hisi_sas_devices.running_req

2018-03-07 Thread John Garry
From: Xiang Chen 

The structure element hisi_sas_devices.running_req to count how
many commands are active is in effect only ever written in the
code, so remove it.

Signed-off-by: Xiang Chen 
Signed-off-by: John Garry 
---
 drivers/scsi/hisi_sas/hisi_sas.h   | 1 -
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 9 -
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 3 ---
 3 files changed, 13 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index e7fd287..d1153e8 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -175,7 +175,6 @@ struct hisi_sas_device {
struct hisi_sas_dq  *dq;
struct list_headlist;
u64 attached_phy;
-   atomic64_t running_req;
enum sas_device_typedev_type;
int device_id;
int sata_idx;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c 
b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 9ff8790..88ad8d4 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -200,8 +200,6 @@ void hisi_sas_slot_task_free(struct hisi_hba *hisi_hba, 
struct sas_task *task,
 
if (task) {
struct device *dev = hisi_hba->dev;
-   struct domain_device *device = task->dev;
-   struct hisi_sas_device *sas_dev = device->lldd_dev;
 
if (!task->lldd_task)
return;
@@ -213,9 +211,6 @@ void hisi_sas_slot_task_free(struct hisi_hba *hisi_hba, 
struct sas_task *task,
dma_unmap_sg(dev, task->scatter,
 task->num_scatter,
 task->data_dir);
-
-   if (sas_dev)
-   atomic64_dec(_dev->running_req);
}
 
if (slot->buf)
@@ -431,8 +426,6 @@ static int hisi_sas_task_prep(struct sas_task *task, struct 
hisi_sas_dq
spin_unlock_irqrestore(>task_state_lock, flags);
 
dq->slot_prep = slot;
-
-   atomic64_inc(_dev->running_req);
++(*pass);
 
return 0;
@@ -1517,8 +1510,6 @@ static int hisi_sas_query_task(struct sas_task *task)
 
dq->slot_prep = slot;
 
-   atomic64_inc(_dev->running_req);
-
/* send abort command to the chip */
hisi_hba->hw->start_delivery(dq);
spin_unlock_irqrestore(>lock, flags_dq);
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
index 2eb8980..8dd0e6a6 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
@@ -1407,9 +1407,6 @@ static int slot_complete_v1_hw(struct hisi_hba *hisi_hba,
}
 
 out:
-   if (sas_dev)
-   atomic64_dec(_dev->running_req);
-
hisi_sas_slot_task_free(hisi_hba, task, slot);
sts = ts->stat;
 
-- 
1.9.1



[PATCH v4 03/10] scsi: hisi_sas: fix the issue of link rate inconsistency

2018-03-07 Thread John Garry
From: Xiaofei Tan 

In sysfs, there are two files about minimum linkrate, and also
two files for maximum linkrate. Take maximum linkrate example,
maximum_linkrate_hw is read-only and indicated by the register
HARD_PHY_LINKRATE, and maximum_linkrate is read-write and
corresponding to the register PROG_PHY_LINK_RATE.

But in the function phy_up_v*_hw(), we get *_linkrate value from
HARD_PHY_LINKRATE. It is not right. This patch is to fix this issue.

Unreferenced PHY-interrupt enum is also removed for v3 hw.

Signed-off-by: Xiaofei Tan 
Signed-off-by: John Garry 
---
 drivers/scsi/hisi_sas/hisi_sas_main.c  |  2 ++
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c |  1 -
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c |  8 +---
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 13 +
 4 files changed, 4 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c 
b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 2d4dbed..9d16372 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -683,6 +683,8 @@ static void hisi_sas_phy_init(struct hisi_hba *hisi_hba, 
int phy_no)
 
phy->hisi_hba = hisi_hba;
phy->port = NULL;
+   phy->minimum_linkrate = SAS_LINK_RATE_1_5_GBPS;
+   phy->maximum_linkrate = hisi_hba->hw->phy_get_max_linkrate();
sas_phy->enabled = (phy_no < hisi_hba->n_phy) ? 1 : 0;
sas_phy->class = SAS;
sas_phy->iproto = SAS_PROTOCOL_ALL;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
index 679e76f..38bbda9 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
@@ -873,7 +873,6 @@ static void phy_set_linkrate_v1_hw(struct hisi_hba 
*hisi_hba, int phy_no,
sas_phy->phy->maximum_linkrate = max;
sas_phy->phy->minimum_linkrate = min;
 
-   min -= SAS_LINK_RATE_1_5_GBPS;
max -= SAS_LINK_RATE_1_5_GBPS;
 
for (i = 0; i <= max; i++)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index 42b3fd6..67be346 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -1603,7 +1603,6 @@ static void phy_set_linkrate_v2_hw(struct hisi_hba 
*hisi_hba, int phy_no,
sas_phy->phy->maximum_linkrate = max;
sas_phy->phy->minimum_linkrate = min;
 
-   min -= SAS_LINK_RATE_1_5_GBPS;
max -= SAS_LINK_RATE_1_5_GBPS;
 
for (i = 0; i <= max; i++)
@@ -2684,7 +2683,7 @@ static int prep_abort_v2_hw(struct hisi_hba *hisi_hba,
 static int phy_up_v2_hw(int phy_no, struct hisi_hba *hisi_hba)
 {
int i, res = IRQ_HANDLED;
-   u32 port_id, link_rate, hard_phy_linkrate;
+   u32 port_id, link_rate;
struct hisi_sas_phy *phy = _hba->phy[phy_no];
struct asd_sas_phy *sas_phy = >sas_phy;
struct device *dev = hisi_hba->dev;
@@ -2723,11 +2722,6 @@ static int phy_up_v2_hw(int phy_no, struct hisi_hba 
*hisi_hba)
}
 
sas_phy->linkrate = link_rate;
-   hard_phy_linkrate = hisi_sas_phy_read32(hisi_hba, phy_no,
-   HARD_PHY_LINKRATE);
-   phy->maximum_linkrate = hard_phy_linkrate & 0xf;
-   phy->minimum_linkrate = (hard_phy_linkrate >> 4) & 0xf;
-
sas_phy->oob_mode = SAS_OOB_MODE;
memcpy(sas_phy->attached_sas_addr, >sas_addr, SAS_ADDR_SIZE);
dev_info(dev, "phyup: phy%d link_rate=%d\n", phy_no, link_rate);
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index a1f1868..1ee95ab 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -340,12 +340,6 @@ struct hisi_sas_err_record_v3 {
 #define HISI_SAS_COMMAND_ENTRIES_V3_HW 4096
 #define HISI_SAS_MSI_COUNT_V3_HW 32
 
-enum {
-   HISI_SAS_PHY_PHY_UPDOWN,
-   HISI_SAS_PHY_CHNL_INT,
-   HISI_SAS_PHY_INT_NR
-};
-
 #define DIR_NO_DATA 0
 #define DIR_TO_INI 1
 #define DIR_TO_DEVICE 2
@@ -1121,7 +1115,7 @@ static int prep_abort_v3_hw(struct hisi_hba *hisi_hba,
 static int phy_up_v3_hw(int phy_no, struct hisi_hba *hisi_hba)
 {
int i, res = 0;
-   u32 context, port_id, link_rate, hard_phy_linkrate;
+   u32 context, port_id, link_rate;
struct hisi_sas_phy *phy = _hba->phy[phy_no];
struct asd_sas_phy *sas_phy = >sas_phy;
struct device *dev = hisi_hba->dev;
@@ -1139,10 +1133,6 @@ static int phy_up_v3_hw(int phy_no, struct hisi_hba 
*hisi_hba)
goto end;
}
sas_phy->linkrate = link_rate;
-   hard_phy_linkrate = hisi_sas_phy_read32(hisi_hba, phy_no,
-   HARD_PHY_LINKRATE);
-   phy->maximum_linkrate = hard_phy_linkrate & 0xf;
-   phy->minimum_linkrate = (hard_phy_linkrate >> 4) & 0xf;
phy->phy_type &= ~(PORT_TYPE_SAS | PORT_TYPE_SATA);
 
/* Check for SATA 

[PATCH v4 09/10] scsi: hisi_sas: modify some register config for hip08

2018-03-07 Thread John Garry
From: Xiaofei Tan 

Do some modifications for register configuring for hip08.

In future, to reduce kernel churn with patches to modify
registers, any registers which may change between board
models (mostly PHY/SERDES related) should be set in ACPI
reset handler.

Signed-off-by: Xiaofei Tan 
Signed-off-by: Xiang Chen 
Signed-off-by: John Garry 
---
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 4023fcb..5ce5ef2c 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -172,6 +172,7 @@
 #define CHL_INT1_MSK   (PORT_BASE + 0x1c4)
 #define CHL_INT2_MSK   (PORT_BASE + 0x1c8)
 #define CHL_INT_COAL_EN(PORT_BASE + 0x1d0)
+#define SAS_RX_TRAIN_TIMER (PORT_BASE + 0x2a4)
 #define PHY_CTRL_RDY_MSK   (PORT_BASE + 0x2b0)
 #define PHYCTRL_NOT_RDY_MSK(PORT_BASE + 0x2b4)
 #define PHYCTRL_DWS_RESET_MSK  (PORT_BASE + 0x2b8)
@@ -184,6 +185,8 @@
 #define DMA_RX_STATUS  (PORT_BASE + 0x2e8)
 #define DMA_RX_STATUS_BUSY_OFF 0
 #define DMA_RX_STATUS_BUSY_MSK (0x1 << DMA_RX_STATUS_BUSY_OFF)
+
+#define COARSETUNE_TIME(PORT_BASE + 0x304)
 #define ERR_CNT_DWS_LOST   (PORT_BASE + 0x380)
 #define ERR_CNT_RESET_PROB (PORT_BASE + 0x384)
 #define ERR_CNT_INVLD_DW   (PORT_BASE + 0x390)
@@ -417,10 +420,10 @@ static void init_reg_v3_hw(struct hisi_hba *hisi_hba)
hisi_sas_write32(hisi_hba, OQ0_INT_SRC_MSK+0x4*i, 0);
 
hisi_sas_write32(hisi_hba, HYPER_STREAM_ID_EN_CFG, 1);
-   hisi_sas_write32(hisi_hba, AXI_MASTER_CFG_BASE, 0x3);
 
for (i = 0; i < hisi_hba->n_phy; i++) {
-   hisi_sas_phy_write32(hisi_hba, i, PROG_PHY_LINK_RATE, 0x801);
+   hisi_sas_phy_write32(hisi_hba, i, PROG_PHY_LINK_RATE, 0x855);
+   hisi_sas_phy_write32(hisi_hba, i, SAS_RX_TRAIN_TIMER, 0x13e80);
hisi_sas_phy_write32(hisi_hba, i, CHL_INT0, 0x);
hisi_sas_phy_write32(hisi_hba, i, CHL_INT1, 0x);
hisi_sas_phy_write32(hisi_hba, i, CHL_INT2, 0x);
@@ -432,17 +435,13 @@ static void init_reg_v3_hw(struct hisi_hba *hisi_hba)
hisi_sas_phy_write32(hisi_hba, i, PHYCTRL_DWS_RESET_MSK, 0x0);
hisi_sas_phy_write32(hisi_hba, i, PHYCTRL_PHY_ENA_MSK, 0x0);
hisi_sas_phy_write32(hisi_hba, i, SL_RX_BCAST_CHK_MSK, 0x0);
-   hisi_sas_phy_write32(hisi_hba, i, PHYCTRL_OOB_RESTART_MSK, 0x0);
-   hisi_sas_phy_write32(hisi_hba, i, PHY_CTRL, 0x199b4fa);
-   hisi_sas_phy_write32(hisi_hba, i, SAS_SSP_CON_TIMER_CFG,
-0xa03e8);
-   hisi_sas_phy_write32(hisi_hba, i, SAS_STP_CON_TIMER_CFG,
-0xa03e8);
-   hisi_sas_phy_write32(hisi_hba, i, STP_LINK_TIMER,
-0x7f7a120);
-   hisi_sas_phy_write32(hisi_hba, i, CON_CFG_DRIVER,
-0x2a0a80);
+   hisi_sas_phy_write32(hisi_hba, i, PHYCTRL_OOB_RESTART_MSK, 0x1);
+   hisi_sas_phy_write32(hisi_hba, i, STP_LINK_TIMER, 0x7f7a120);
+
+   /* used for 12G negotiate */
+   hisi_sas_phy_write32(hisi_hba, i, COARSETUNE_TIME, 0x1e);
}
+
for (i = 0; i < hisi_hba->queue_count; i++) {
/* Delivery queue */
hisi_sas_write32(hisi_hba,
-- 
1.9.1



[PATCH v4 05/10] scsi: hisi_sas: increase timer expire of internal abort task

2018-03-07 Thread John Garry
From: Xiaofei Tan 

The current 110ms expiry time is not long enough for the internal
abort task.

The reason is that the internal abort task could be blocked in HW
if the HW is retrying to set up link. The internal abort task will
be executed only when the retry process finished.

The maximum time is 5s for the retry of setting up link. So, the timer
expire should be more than 5s. This patch increases it from 110ms to 6s.

Signed-off-by: Xiaofei Tan 
Signed-off-by: John Garry 
---
 drivers/scsi/hisi_sas/hisi_sas_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c 
b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 9d16372..9ff8790 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -871,6 +871,7 @@ static void hisi_sas_tmf_timedout(struct timer_list *t)
 
 #define TASK_TIMEOUT 20
 #define TASK_RETRY 3
+#define INTERNAL_ABORT_TIMEOUT 6
 static int hisi_sas_exec_internal_tmf_task(struct domain_device *device,
   void *parameter, u32 para_len,
   struct hisi_sas_tmf_task *tmf)
@@ -1574,7 +1575,7 @@ static int hisi_sas_query_task(struct sas_task *task)
task->task_proto = device->tproto;
task->task_done = hisi_sas_task_done;
task->slow_task->timer.function = hisi_sas_tmf_timedout;
-   task->slow_task->timer.expires = jiffies + msecs_to_jiffies(110);
+   task->slow_task->timer.expires = jiffies + INTERNAL_ABORT_TIMEOUT*HZ;
add_timer(>slow_task->timer);
 
res = hisi_sas_internal_abort_task_exec(hisi_hba, sas_dev->device_id,
-- 
1.9.1



[PATCH v4 10/10] scsi: hisi_sas: add v3 hw MODULE_DEVICE_TABLE()

2018-03-07 Thread John Garry
From: Xiang Chen 

Export device table of v3 hw to userspace, or auto probe will fail
for v3 hw.

Also change the module alias to include "pci", instead of "platform".

Signed-off-by: Xiang Chen 
Signed-off-by: John Garry 
---
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 5ce5ef2c..6f3e5ba 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -2394,6 +2394,7 @@ static int hisi_sas_v3_resume(struct pci_dev *pdev)
{ PCI_VDEVICE(HUAWEI, 0xa230), hip08 },
{}
 };
+MODULE_DEVICE_TABLE(pci, sas_v3_pci_table);
 
 static const struct pci_error_handlers hisi_sas_err_handler = {
.error_detected = hisi_sas_error_detected_v3_hw,
@@ -2416,4 +2417,4 @@ static int hisi_sas_v3_resume(struct pci_dev *pdev)
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("John Garry ");
 MODULE_DESCRIPTION("HISILICON SAS controller v3 hw driver based on pci 
device");
-MODULE_ALIAS("platform:" DRV_NAME);
+MODULE_ALIAS("pci:" DRV_NAME);
-- 
1.9.1



[PATCH v4 08/10] scsi: hisi_sas: Code cleanup and minor bug fixes

2018-03-07 Thread John Garry
From: Xiang Chen 

The patch does some code cleanup and fixes some small bugs:
- Correct return status of phy_up_v3_hw() and
  phy_bcast_v3_hw()
- Add static for function phy_get_max_linkrate_v3_hw()
- Change exception return status when no reset method
- Change magic value to ts->stat in slot_complete_vx_hw()
- Remove unnecessary check for dev_is_sata()
- Fix some issues of alignment and indents (Authored by
  Xiaofei Tan in another patch, but added here to be
  practical)

Signed-off-by: Xiaofei Tan 
Signed-off-by: Xiang Chen 
Signed-off-by: John Garry 
---
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 14 +++---
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c |  4 +++-
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 10 ++
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 28 +---
 4 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c 
b/drivers/scsi/hisi_sas/hisi_sas_main.c
index dff9723..49c1fa6 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -33,7 +33,7 @@ u8 hisi_sas_get_ata_protocol(struct host_to_dev_fis *fis, int 
direction)
case ATA_CMD_FPDMA_RECV:
case ATA_CMD_FPDMA_SEND:
case ATA_CMD_NCQ_NON_DATA:
-   return HISI_SAS_SATA_PROTOCOL_FPDMA;
+   return HISI_SAS_SATA_PROTOCOL_FPDMA;
 
case ATA_CMD_DOWNLOAD_MICRO:
case ATA_CMD_ID_ATA:
@@ -45,7 +45,7 @@ u8 hisi_sas_get_ata_protocol(struct host_to_dev_fis *fis, int 
direction)
case ATA_CMD_WRITE_LOG_EXT:
case ATA_CMD_PIO_WRITE:
case ATA_CMD_PIO_WRITE_EXT:
-   return HISI_SAS_SATA_PROTOCOL_PIO;
+   return HISI_SAS_SATA_PROTOCOL_PIO;
 
case ATA_CMD_DSM:
case ATA_CMD_DOWNLOAD_MICRO_DMA:
@@ -64,7 +64,7 @@ u8 hisi_sas_get_ata_protocol(struct host_to_dev_fis *fis, int 
direction)
case ATA_CMD_WRITE_LOG_DMA_EXT:
case ATA_CMD_WRITE_STREAM_DMA_EXT:
case ATA_CMD_ZAC_MGMT_IN:
-   return HISI_SAS_SATA_PROTOCOL_DMA;
+   return HISI_SAS_SATA_PROTOCOL_DMA;
 
case ATA_CMD_CHK_POWER:
case ATA_CMD_DEV_RESET:
@@ -77,21 +77,21 @@ u8 hisi_sas_get_ata_protocol(struct host_to_dev_fis *fis, 
int direction)
case ATA_CMD_STANDBY:
case ATA_CMD_STANDBYNOW1:
case ATA_CMD_ZAC_MGMT_OUT:
-   return HISI_SAS_SATA_PROTOCOL_NONDATA;
+   return HISI_SAS_SATA_PROTOCOL_NONDATA;
default:
{
if (fis->command == ATA_CMD_SET_MAX) {
switch (fis->features) {
case ATA_SET_MAX_PASSWD:
case ATA_SET_MAX_LOCK:
-   return HISI_SAS_SATA_PROTOCOL_PIO;
+   return HISI_SAS_SATA_PROTOCOL_PIO;
 
case ATA_SET_MAX_PASSWD_DMA:
case ATA_SET_MAX_UNLOCK_DMA:
-   return HISI_SAS_SATA_PROTOCOL_DMA;
+   return HISI_SAS_SATA_PROTOCOL_DMA;
 
default:
-   return HISI_SAS_SATA_PROTOCOL_NONDATA;
+   return HISI_SAS_SATA_PROTOCOL_NONDATA;
}
}
if (direction == DMA_NONE)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
index 8dd0e6a6..84a0ccc 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
@@ -651,8 +651,10 @@ static int reset_hw_v1_hw(struct hisi_hba *hisi_hba)
dev_err(dev, "De-reset failed\n");
return -EIO;
}
-   } else
+   } else {
dev_warn(dev, "no reset method\n");
+   return -EINVAL;
+   }
 
return 0;
 }
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index bd1a48a..f89fb9a 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -1095,8 +1095,10 @@ static int reset_hw_v2_hw(struct hisi_hba *hisi_hba)
dev_err(dev, "SAS de-reset fail.\n");
return -EIO;
}
-   } else
-   dev_warn(dev, "no reset method\n");
+   } else {
+   dev_err(dev, "no reset method\n");
+   return -EINVAL;
+   }
 
return 0;
 }
@@ -2408,7 +2410,7 @@ static void slot_err_v2_hw(struct hisi_hba *hisi_hba,
spin_lock_irqsave(_hba->lock, flags);
hisi_sas_slot_task_free(hisi_hba, task, slot);
spin_unlock_irqrestore(_hba->lock, flags);
-   return -1;
+   return ts->stat;
}
 
if (unlikely(!sas_dev)) {
@@ -2667,7 +2669,7 @@ static int prep_abort_v2_hw(struct hisi_hba *hisi_hba,
/* 

[PATCH v4 01/10] dt-bindings: scsi: hisi_sas: add an property of signal attenuation

2018-03-07 Thread John Garry
From: Xiaofei Tan 

For some new boards with hip07 chipset we are required to
set PHY config registers differently. The hw property which
determines how to set these registers is in the PHY signal
attenuation readings.

This patch add an devicetree property, "hisilicon,signal-attenuation",
which is used to describe the signal attenuation of an board.

Cc: Rob Herring 
Cc: Mark Rutland 
Signed-off-by: Xiaofei Tan 
Signed-off-by: John Garry 
---
 Documentation/devicetree/bindings/scsi/hisilicon-sas.txt | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/devicetree/bindings/scsi/hisilicon-sas.txt 
b/Documentation/devicetree/bindings/scsi/hisilicon-sas.txt
index df3bef7..8c6659e 100644
--- a/Documentation/devicetree/bindings/scsi/hisilicon-sas.txt
+++ b/Documentation/devicetree/bindings/scsi/hisilicon-sas.txt
@@ -53,6 +53,13 @@ Main node required properties:
 Optional main node properties:
  - hip06-sas-v2-quirk-amt : when set, indicates that the v2 controller has the
"am-max-transmissions" limitation.
+ - hisilicon,signal-attenuation : array of 3 32-bit values, containing 
de-emphasis,
+   preshoot, and boost attenuation readings for the board. They
+   are used to describe the signal attenuation of the board. These
+   values' range is 7600 to 12400, and used to represent -24dB to
+   24dB.
+   The formula is "y = (x-1)/1". For example, 10478
+   means 4.78dB.
 
 Example:
sas0: sas@c100 {
-- 
1.9.1



[PATCH v4 04/10] scsi: hisi_sas: fix the issue of setting linkrate register

2018-03-07 Thread John Garry
From: Xiaofei Tan 

It is not right to set the register PROG_PHY_LINK_RATE while PHY
is still enabled. So if we want to change PHY linkrate, we need to
disable PHY before setting the register PROG_PHY_LINK_RATE, and then
start-up PHY. This patch is to fix this issue.

Signed-off-by: Xiaofei Tan 
Signed-off-by: John Garry 
---
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 5 +++--
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 5 +++--
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 5 +++--
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
index 38bbda9..2eb8980 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
@@ -881,10 +881,11 @@ static void phy_set_linkrate_v1_hw(struct hisi_hba 
*hisi_hba, int phy_no,
prog_phy_link_rate &= ~0xff;
prog_phy_link_rate |= rate_mask;
 
+   disable_phy_v1_hw(hisi_hba, phy_no);
+   msleep(100);
hisi_sas_phy_write32(hisi_hba, phy_no, PROG_PHY_LINK_RATE,
prog_phy_link_rate);
-
-   phy_hard_reset_v1_hw(hisi_hba, phy_no);
+   start_phy_v1_hw(hisi_hba, phy_no);
 }
 
 static int get_wideport_bitmap_v1_hw(struct hisi_hba *hisi_hba, int port_id)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index 67be346..bd1a48a 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -1611,10 +1611,11 @@ static void phy_set_linkrate_v2_hw(struct hisi_hba 
*hisi_hba, int phy_no,
prog_phy_link_rate &= ~0xff;
prog_phy_link_rate |= rate_mask;
 
+   disable_phy_v2_hw(hisi_hba, phy_no);
+   msleep(100);
hisi_sas_phy_write32(hisi_hba, phy_no, PROG_PHY_LINK_RATE,
prog_phy_link_rate);
-
-   phy_hard_reset_v2_hw(hisi_hba, phy_no);
+   start_phy_v2_hw(hisi_hba, phy_no);
 }
 
 static int get_wideport_bitmap_v2_hw(struct hisi_hba *hisi_hba, int port_id)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 1ee95ab..8da9de7 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -1862,10 +1862,11 @@ static void phy_set_linkrate_v3_hw(struct hisi_hba 
*hisi_hba, int phy_no,
prog_phy_link_rate &= ~0xff;
prog_phy_link_rate |= rate_mask;
 
+   disable_phy_v3_hw(hisi_hba, phy_no);
+   msleep(100);
hisi_sas_phy_write32(hisi_hba, phy_no, PROG_PHY_LINK_RATE,
prog_phy_link_rate);
-
-   phy_hard_reset_v3_hw(hisi_hba, phy_no);
+   start_phy_v3_hw(hisi_hba, phy_no);
 }
 
 static void interrupt_disable_v3_hw(struct hisi_hba *hisi_hba)
-- 
1.9.1



[PATCH v4 07/10] scsi: hisi_sas: fix return value of hisi_sas_task_prep()

2018-03-07 Thread John Garry
From: Xiaofei Tan 

It is an implicit regulation that error code that function returned
should be negative. But hisi_sas_task_prep() doesn't follow this.
This may cause problems in the upper layer code.

For example, in sas_expander.c of libsas, smp_execute_task_sg() may
return the number of bytes of underrun. It will be conflicted with
the scenaio lldd_execute_task() return an positive error code.

This patch change the return value from SAS_PHY_DOWN to -ECOMM in
hisi_sas_task_prep().

Signed-off-by: Xiaofei Tan 
Signed-off-by: John Garry 
---
 drivers/scsi/hisi_sas/hisi_sas_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c 
b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 88ad8d4..dff9723 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -316,7 +316,7 @@ static int hisi_sas_task_prep(struct sas_task *task, struct 
hisi_sas_dq
 */
if (device->dev_type != SAS_SATA_DEV)
task->task_done(task);
-   return SAS_PHY_DOWN;
+   return -ECOMM;
}
 
if (DEV_IS_GONE(sas_dev)) {
@@ -327,7 +327,7 @@ static int hisi_sas_task_prep(struct sas_task *task, struct 
hisi_sas_dq
dev_info(dev, "task prep: device %016llx not ready\n",
 SAS_ADDR(device->sas_addr));
 
-   return SAS_PHY_DOWN;
+   return -ECOMM;
}
 
port = to_hisi_sas_port(sas_port);
@@ -337,7 +337,7 @@ static int hisi_sas_task_prep(struct sas_task *task, struct 
hisi_sas_dq
 "SATA/STP" : "SAS",
 device->port->id);
 
-   return SAS_PHY_DOWN;
+   return -ECOMM;
}
 
if (!sas_protocol_ata(task->task_proto)) {
-- 
1.9.1



[PATCH] scsi: scsi_transport_iscsi: use put_device() instead of kfree()

2018-03-07 Thread Arvind Yadav
Never directly free @dev after calling device_register(), even
if it returned an error! Always use put_device() to give up the
reference initialized.

Signed-off-by: Arvind Yadav 
---
 drivers/scsi/scsi_transport_iscsi.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/scsi_transport_iscsi.c 
b/drivers/scsi/scsi_transport_iscsi.c
index f4b52b4..aacb7ab 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -221,8 +221,10 @@ struct iscsi_endpoint *
ep->dev.class = _endpoint_class;
dev_set_name(>dev, "ep-%llu", (unsigned long long) id);
err = device_register(>dev);
-if (err)
-goto free_ep;
+   if (err) {
+   put_device(>dev);
+   return NULL;
+   }
 
err = sysfs_create_group(>dev.kobj, _endpoint_group);
if (err)
@@ -235,10 +237,6 @@ struct iscsi_endpoint *
 unregister_dev:
device_unregister(>dev);
return NULL;
-
-free_ep:
-   kfree(ep);
-   return NULL;
 }
 EXPORT_SYMBOL_GPL(iscsi_create_endpoint);
 
@@ -783,7 +781,7 @@ struct iscsi_iface *
 
 free_iface:
put_device(iface->dev.parent);
-   kfree(iface);
+   put_device(>dev);
return NULL;
 }
 EXPORT_SYMBOL_GPL(iscsi_create_iface);
@@ -1260,7 +1258,7 @@ struct iscsi_bus_flash_session *
return fnode_sess;
 
 free_fnode_sess:
-   kfree(fnode_sess);
+   put_device(_sess->dev);
return NULL;
 }
 EXPORT_SYMBOL_GPL(iscsi_create_flashnode_sess);
@@ -1308,7 +1306,7 @@ struct iscsi_bus_flash_conn *
return fnode_conn;
 
 free_fnode_conn:
-   kfree(fnode_conn);
+   put_device(_conn->dev);
return NULL;
 }
 EXPORT_SYMBOL_GPL(iscsi_create_flashnode_conn);
@@ -2268,6 +2266,8 @@ struct iscsi_cls_conn *
 
 release_parent_ref:
put_device(>dev);
+   put_device(>dev);
+   conn = NULL;
 free_conn:
kfree(conn);
return NULL;
@@ -4420,8 +4420,10 @@ struct scsi_transport_template *
priv->dev.class = _transport_class;
dev_set_name(>dev, "%s", tt->name);
err = device_register(>dev);
-   if (err)
-   goto free_priv;
+   if (err) {
+   put_device(>dev);
+   return NULL;
+   }
 
err = sysfs_create_group(>dev.kobj, _transport_group);
if (err)
@@ -4456,9 +4458,6 @@ struct scsi_transport_template *
 unregister_dev:
device_unregister(>dev);
return NULL;
-free_priv:
-   kfree(priv);
-   return NULL;
 }
 EXPORT_SYMBOL_GPL(iscsi_register_transport);
 
-- 
1.9.1



[PATCH] target: tcm_loop: use put_device() if device_register fail

2018-03-07 Thread Arvind Yadav
if device_register() returned an error! Always use put_device()
to give up the reference initialized.

Signed-off-by: Arvind Yadav 
---
 drivers/target/loopback/tcm_loop.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/target/loopback/tcm_loop.c 
b/drivers/target/loopback/tcm_loop.c
index 9cd4ffe..5dffc55 100644
--- a/drivers/target/loopback/tcm_loop.c
+++ b/drivers/target/loopback/tcm_loop.c
@@ -412,6 +412,7 @@ static int tcm_loop_setup_hba_bus(struct tcm_loop_hba 
*tl_hba, int tcm_loop_host
ret = device_register(_hba->dev);
if (ret) {
pr_err("device_register() failed for tl_hba->dev: %d\n", ret);
+   put_device(_hba->dev);
return -ENODEV;
}
 
-- 
1.9.1



Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)"

2018-03-07 Thread Menion
Hello Martin
Thanks for your answer.
By flooded I mean that it continously fill the dmesg log with no
interruption, check attached a log that I have just taken from my
server
Some more details on my setup. I have these 5 HDD, WD RED 8TB in an
Orico 5 bay enclosure, running JMS567 USBtoSATA bridge and an internal
SATA multiplexer
This is connected to the USB 3.0 host port of my server, it is an Intel Atom
So in total the array is 5x8TB and it is configured in BTRFS RAID5 mode.
I have already reported this issue to the linux-btrfs mailing list, I
got a feedback that the filesystem itself has nothing to do with this
capacity check, and I should have reported this here.
Bye


2018-03-07 3:45 GMT+01:00 Martin K. Petersen :
>
> Menion,
>
>> Operating big capacity HDD such 8TB with complex filesystems like
>> BTRFS in RAID mode endup in dmesg get flooded by this log, due too
>> many capacity checks (opaque to the filesystem itself)
>
> What's your definition of flooded? How many do you see?
>
> Also, what kind of controller are these disks attached to? The reason
> you see these messages is that to the kernel it looks like a legacy disk
> device that predates capacities in the TB range. The warnings are logged
> because we're surprised to be going down this path based on what the
> device has previously told us.
>
> --
> Martin K. Petersen  Oracle Linux Engineering

 * Documentation:  https://help.ubuntu.com
 * Management: https://landscape.canonical.com
 * Support:https://ubuntu.com/advantage

8 packages can be updated.
0 updates are security updates.


Last login: Mon Mar  5 15:32:57 2018 from 10.8.0.10
menion@Menionubuntu:~$ dmesg
[0.00] Linux version 4.15.5-041505-generic (kernel@gloin) (gcc version 
7.2.0 (Ubuntu 7.2.0-8ubuntu3.2)) #201802221031 SMP Thu Feb 22 15:32:28 UTC 2018
[0.00] Command line: BOOT_IMAGE=/@/boot/vmlinuz-4.15.5-041505-generic 
root=UUID=6db4baf7-fda8-41ac-a6ad-1ca7b083430f ro rootflags=subvol=@
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Centaur CentaurHauls
[0.00] x86/fpu: x87 FPU will use FXSAVE
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0008efff] usable
[0.00] BIOS-e820: [mem 0x0008f000-0x0008] ACPI NVS
[0.00] BIOS-e820: [mem 0x0009-0x0009dfff] usable
[0.00] BIOS-e820: [mem 0x0009e000-0x0009] reserved
[0.00] BIOS-e820: [mem 0x0010-0x1fff] usable
[0.00] BIOS-e820: [mem 0x2000-0x201f] reserved
[0.00] BIOS-e820: [mem 0x2020-0x7b631fff] usable
[0.00] BIOS-e820: [mem 0x7b632000-0x7b661fff] reserved
[0.00] BIOS-e820: [mem 0x7b662000-0x7b685fff] usable
[0.00] BIOS-e820: [mem 0x7b686000-0x7b76bfff] ACPI NVS
[0.00] BIOS-e820: [mem 0x7b76c000-0x7ba20fff] reserved
[0.00] BIOS-e820: [mem 0x7ba21000-0x7ba71fff] type 20
[0.00] BIOS-e820: [mem 0x7ba72000-0x7ba76fff] usable
[0.00] BIOS-e820: [mem 0x7ba77000-0x7ba77fff] reserved
[0.00] BIOS-e820: [mem 0x7ba78000-0x7ba7afff] usable
[0.00] BIOS-e820: [mem 0x7ba7b000-0x7ba7bfff] reserved
[0.00] BIOS-e820: [mem 0x7ba7c000-0x7bff] usable
[0.00] BIOS-e820: [mem 0xe000-0xe3ff] reserved
[0.00] BIOS-e820: [mem 0xfea0-0xfeaf] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfed01000-0xfed01fff] reserved
[0.00] BIOS-e820: [mem 0xfed03000-0xfed03fff] reserved
[0.00] BIOS-e820: [mem 0xfed06000-0xfed06fff] reserved
[0.00] BIOS-e820: [mem 0xfed08000-0xfed09fff] reserved
[0.00] BIOS-e820: [mem 0xfed1c000-0xfed1cfff] reserved
[0.00] BIOS-e820: [mem 0xfed8-0xfedb] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xffc0-0x] reserved
[0.00] NX (Execute Disable) protection: active
[0.00] efi: EFI v2.40 by American Megatrends
[0.00] efi:  ESRT=0x7b66  ACPI=0x7b6ca000  ACPI 2.0=0x7b6ca000  
SMBIOS=0x7ba1f198
[0.00] random: fast init done
[0.00] SMBIOS 2.8 present.
[0.00] DMI: AZW Z83 II/Cherry Trail CR, BIOS YB1007 08/17/2017
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0x7c000 max_arch_pfn = 

Re: [PATCH] scsi: ipr: Use dma_pool_zalloc()

2018-03-07 Thread Souptick Joarder
On Wed, Mar 7, 2018 at 8:45 AM, Martin K. Petersen
 wrote:
>
> Brian,
>
>> Thanks!
>>
>> Acked-by: Brian King 
>
> Not sure where this patch was sent but it's not in my mailbox, nor the
> list archives.
>
> Souptick: Please resubmit to linux-scsi with Brian's Acked-by.

Ok, I will resubmit.

>
> --
> Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 2/2] m68k/amiga - Zorro ESP: new zorro_esp.c

2018-03-07 Thread Geert Uytterhoeven
Hi Michael,

On Wed, Mar 7, 2018 at 8:55 AM, Michael Schmitz  wrote:
> OK, in that case I'll need to work out something similar to the test for
> optional SCSI function on the Blizzard 1230/1260 to find out what board
> I have when dealing with the duplicate Fastlane/Blizzard1230II ID.
>
> Is the board base address as returned by zorro_resource_start() reliable
> to distinguish between Zorro II and Zorro III boards?

The board base address is assigned by AmigaOS based on the values in the
Expansion ROM (mainly ExpansionRom.er_Type) on the board.
More specifically, AmigaOS creates struct ConfigDev from struct ExpansionRom.
So yes, it must be reliable.

> Am 06.03.2018 um 20:43 schrieb Geert Uytterhoeven:
>> On Tue, Mar 6, 2018 at 2:11 AM, Michael Schmitz  wrote:
>>> Index 1 should have been ZORRO_PROD_PHASE5_CYBERSTORM_MK_II, I've
>>> corrected that in the meantime.
>>>
>>> Fastlane / Blizzard 1230_II distinction is something I an not quite
>>> sure about - does the probe function get called twice if the device
>>> table contains the same ID twice but with different driver_data
>>> contents?
>>
>> No, the probe function gets called on the first match only.
>> Cfr. drivers/zorro/zorro-driver.c:zorro_device_probe().

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds