[iscsiadm] iscsiadm creates multiple identical sessions when run with the --login option in parallel.

2017-09-28 Thread Tangchen (UVP)
Hi guys,

If we run the iscsiadm -m node --login command against the same IP address 4 times
sequentially, only one session is created.
But if we run the 4 commands in parallel, 4 identical sessions can be created.
(Here, xxx.xxx.xxx.xxx is the IP address of the IP SAN. I'm using the same IP
in all 4 commands.)

# iscsiadm -m node -p xxx.xxx.xxx.xxx  --login &
# iscsiadm -m node -p xxx.xxx.xxx.xxx  --login &
# iscsiadm -m node -p xxx.xxx.xxx.xxx  --login &
# iscsiadm -m node -p xxx.xxx.xxx.xxx  --login &
Logging in to [iface: default, target: iqn. xxx.xxx.xxx.xxx, portal: 
xxx.xxx.xxx.xxx] (multiple)
Logging in to [iface: default, target: iqn. xxx.xxx.xxx.xxx, portal: 
xxx.xxx.xxx.xxx] (multiple)
Logging in to [iface: default, target: iqn. xxx.xxx.xxx.xxx, portal: 
xxx.xxx.xxx.xxx] (multiple)
Logging in to [iface: default, target: iqn. xxx.xxx.xxx.xxx, portal: 
xxx.xxx.xxx.xxx] (multiple)
Login to [iface: default, target: xxx.xxx.xxx.xxx, portal: xxx.xxx.xxx.xxx] 
successful.
Login to [iface: default, target: xxx.xxx.xxx.xxx, portal: xxx.xxx.xxx.xxx] 
successful.
Login to [iface: default, target: xxx.xxx.xxx.xxx, portal: xxx.xxx.xxx.xxx] 
successful.
Login to [iface: default, target: xxx.xxx.xxx.xxx, portal: xxx.xxx.xxx.xxx] 
successful.

# iscsiadm -m session
tcp: [1] xxx.xxx.xxx.xxx (non-flash)
tcp: [2] xxx.xxx.xxx.xxx (non-flash)
tcp: [3] xxx.xxx.xxx.xxx (non-flash)
tcp: [4] xxx.xxx.xxx.xxx (non-flash)

If we check the network connections in /proc/net/nf_conntrack, there are 4 TCP
connections with different source ports.
And if we run the logout command just once, all 4 sessions are destroyed.

Unfortunately, services like multipathd cannot tell the difference between them.
With 4 identical sessions, 4 paths to the dm device are created, even though they
are really the same path.

Looking at the code, the iscsiadm command does try to prevent duplicate sessions by
checking the /sys/class/iscsi_session/ directory before logging in.
But there is no protection against concurrent processes doing that check at the same
time, so the check-then-login sequence races.
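
As a stopgap, callers can serialize the logins themselves. Below is a minimal
sketch of such a wrapper; the lock file path is my own choice and nothing here
is an iscsiadm API:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

int main(void)
{
	/* Hypothetical lock file; any path writable by all callers works. */
	int fd = open("/run/lock/iscsi-login.lock", O_RDWR | O_CREAT, 0600);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Block until we hold the lock, so the sysfs check inside iscsiadm
	 * cannot race with a login started by another process. */
	if (flock(fd, LOCK_EX) < 0) {
		perror("flock");
		return 1;
	}
	int rc = system("iscsiadm -m node -p xxx.xxx.xxx.xxx --login");
	flock(fd, LOCK_UN);
	close(fd);
	return rc == 0 ? 0 : 1;
}

The same effect can be had with flock(1) in a shell wrapper; the real fix would
presumably be to take such a lock inside iscsiadm itself.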

Any idea how to solve this problem properly?

Thanks.


RE: Re: [iscsi] Deadlock occurred when network is in error

2017-08-15 Thread Tangchen (UVP)
> On Tue, 2017-08-15 at 02:16 +0000, Tangchen (UVP) wrote:
> > But I'm not using mq, and I ran into these two problems on a non-mq system.
> > The patch you pointed out is a fix for mq, so I don't think it can resolve
> > this problem.
> >
> > IIUC, mq is for SSDs? I'm not using SSDs, so mq is disabled.
> 
> Hello Tangchen,
> 
> Please post replies below the original e-mail instead of above - that is the 
> reply
> style used on all Linux-related mailing lists I know of. From
> https://en.wikipedia.org/wiki/Posting_style:
> 
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
> A: Top-posting.
> Q: What is the most annoying thing in e-mail?

Hi Bart,

Thanks for the reply. I will post my replies below the original e-mail. :)

> 
> Regarding your question: sorry but I quoted the wrong commit in my previous
> e-mail. The commit I should have referred to is 255ee9320e5d ("scsi: Make
> __scsi_remove_device go straight from BLOCKED to DEL"). That patch not only
> affects scsi-mq but also the single-queue code in the SCSI core.

OK, I'll try this one. Thx.

> 
> blk-mq/scsi-mq was introduced for SSDs but is not only intended for SSDs.
> The plan is to remove the blk-sq/scsi-sq code once the blk-mq/scsi-mq code
> works at least as fast as the single queue code for all supported devices.
> That includes hard disks.

OK, thanks for telling me this.

> 
> Bart.


Re: [iscsi] Deadlock occurred when network is in error

2017-08-14 Thread Tangchen (UVP)
Hi, Bart,

Thank you very much for the quick response. 

But I'm not using mq, and I ran into these two problems on a non-mq system.
The patch you pointed out is a fix for mq, so I don't think it can resolve this
problem.

IIUC, mq is for SSDs? I'm not using SSDs, so mq is disabled.


On Mon, 2017-08-14 at 11:23 +0000, Tangchen (UVP) wrote:
> Problem 2:
> 
> ***
> [What it looks like]
> ***
> When removing a SCSI device while a network error happens, __blk_drain_queue()
> can hang forever.
> 
> # cat /proc/19160/stack
> [] msleep+0x1d/0x30
> [] __blk_drain_queue+0xe4/0x160
> [] blk_cleanup_queue+0x106/0x2e0
> [] __scsi_remove_device+0x52/0xc0 [scsi_mod]
> [] scsi_remove_device+0x2b/0x40 [scsi_mod]
> [] sdev_store_delete_callback+0x10/0x20 [scsi_mod]
> [] sysfs_schedule_callback_work+0x15/0x80
> [] process_one_work+0x169/0x340
> [] worker_thread+0x183/0x490
> [] kthread+0x96/0xa0
> [] kernel_thread_helper+0x4/0x10
> [] 0x
> 
> The request queue of this device was stopped, so the following check will be
> true forever:
>
> __blk_run_queue()
> {
> 	if (unlikely(blk_queue_stopped(q)))
> 		return;
>
> 	__blk_run_queue_uncond(q);
> }
>
> So __blk_run_queue_uncond() will never be called, and the process hangs.
> 
> [ ... ]
>
> 
> [How to reproduce]
> 
> Unfortunately I cannot reproduce it on the latest kernel.
> The script below helps to reproduce it, but not very often.
> 
> # create network error
> tc qdisc add dev eth1 root netem loss 60%
> 
> # restart iscsid and rescan the scsi bus again and again
> while [ 1 ]
> do
> 	systemctl restart iscsid
> 	rescan-scsi-bus
> done
>
> (rescan-scsi-bus: http://manpages.ubuntu.com/manpages/trusty/man8/rescan-scsi-bus.8.html)

This should have been fixed by commit 36e3cf273977 ("scsi: Avoid that SCSI 
queues get stuck"). The first mainline kernel that includes this commit is 
kernel v4.11.

> void __blk_run_queue(struct request_queue *q)
> {
> -	if (unlikely(blk_queue_stopped(q)))
> +	if (unlikely(blk_queue_stopped(q)) &&
> +	    unlikely(!blk_queue_dying(q)))
> 		return;
>
> 	__blk_run_queue_uncond(q);

Are you aware that the single queue block layer is on its way out and will be 
removed sooner or later? Please focus your testing on scsi-mq. 

Regarding the above patch: it is wrong because it will cause lockups during 
path removal for other block drivers. Please drop this patch.

Bart.


[iscsi] Deadlock occurred when network is in error

2017-08-14 Thread Tangchen (UVP)
Hi,

I found two hang problems between the iscsid service and the iscsi kernel module,
and I can always reproduce one of them on the latest kernel, so I think the
problems really exist.

It took me a long time to find out why, due to my lack of knowledge of iscsi,
and I cannot find a good way to solve them both.

Please do help to take a look at them. Thanks.

=
Problem 1:

***
[What it looks like]
***
First, we connect to 10 remote LUNs through the iscsid service, using at least two
different sessions.
When a network error occurs, a session can go into the error state. If we then do a
login and logout, the iscsid service can end up in D state.

My colleague posted an email reporting this problem before, with a long call trace,
but got barely any feedback.
(https://lkml.org/lkml/2017/6/19/330)


**
[Why it happens]
**
In the latest kernel, the asynchronous part of sd_probe() is executed
in scsi_sd_probe_domain, and sd_remove() waits until all the
work in scsi_sd_probe_domain has finished. When we use iscsi-based
remote storage and the network is broken, the following deadlock
can happen (the two code paths are sketched below the list).

1. An iscsi session login is in progress and calls sd_probe() to
   probe a remote lun. The synchronous part has finished, and the
   asynchronous part is scheduled in scsi_sd_probe_domain, where it
   submits IO to execute the SCSI commands that obtain device info.
   When the network is broken, the session goes into the
   ISCSI_SESSION_FAILED state, and the IO is retried until the
   session becomes ISCSI_SESSION_FREE. As a result, the work in
   scsi_sd_probe_domain hangs.

2. On the other hand, the iscsi kernel module detects the network
   ping timeout and triggers an ISCSI_KEVENT_CONN_ERROR event. iscsid
   in user space handles this event by triggering an
   ISCSI_UEVENT_DESTROY_SESSION event. Destroying a session is
   synchronous, and when it calls sd_remove() to remove the lun,
   it waits until all the work in scsi_sd_probe_domain has finished.
   As a result, it hangs, and iscsid in user space goes into D state,
   which is not killable and can no longer handle any other
   events.
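
In sd.c terms, the two paths look roughly like this (a condensed sketch of the
code involved, not a verbatim copy):

/* sd_probe(): defers the slow part of probing to the shared domain. */
static int sd_probe(struct device *dev)
{
	/* ... synchronous setup of sdkp ... */

	/* The async part issues SCSI commands (READ CAPACITY etc.).
	 * On a failed iscsi session these are retried until the session
	 * is freed, so this domain work can block indefinitely. */
	async_schedule_domain(sd_probe_async, sdkp, &scsi_sd_probe_domain);
	return 0;
}

/* sd_remove(): called from the synchronous destroy-session path. */
static int sd_remove(struct device *dev)
{
	/* Waits for ALL outstanding work in the shared domain -
	 * including the probe that is stuck on this very session. */
	async_synchronize_full_domain(&scsi_sd_probe_domain);
	/* ... */
	return 0;
}

Because scsi_sd_probe_domain is shared by all SCSI disks, the wait in sd_remove()
covers every in-flight probe, not just the device being removed.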



[How to reproduce]

With the script below, I can reproduce it in the latest kernel always.

# create network errors
tc qdisc add dev eth1 root netem loss 60%

while [ 1 ]
do
	iscsiadm -m node -T xx --login
	sleep 5
	iscsiadm -m node -T xx --logout &
	iscsiadm -m node -T yy --login &
done

xx and yy are two different target names.

Connecting to about 10 remote LUNs and running the script for about half an hour
will reproduce the problem.


***
[How I avoid it for now]
***
To avoid this problem, I simply bypass scsi_sd_probe_domain and call
sd_probe_async() synchronously in sd_probe(),
so sd_remove() no longer needs to wait for the domain:

@@ -2986,7 +2986,40 @@ static int sd_probe(struct device *dev)
 	get_device(&sdkp->dev);	/* prevent release before async_schedule */
-	async_schedule_domain(sd_probe_async, sdkp, &scsi_sd_probe_domain);
+	sd_probe_async((void *)sdkp, 0);

I know this is not a good approach, so would you please give some advice about it?



=
Problem 2:

***
[What it looks like]
***
When removing a SCSI device while a network error happens, __blk_drain_queue()
can hang forever.

# cat /proc/19160/stack 
[] msleep+0x1d/0x30
[] __blk_drain_queue+0xe4/0x160
[] blk_cleanup_queue+0x106/0x2e0
[] __scsi_remove_device+0x52/0xc0 [scsi_mod]
[] scsi_remove_device+0x2b/0x40 [scsi_mod]
[] sdev_store_delete_callback+0x10/0x20 [scsi_mod]
[] sysfs_schedule_callback_work+0x15/0x80
[] process_one_work+0x169/0x340
[] worker_thread+0x183/0x490
[] kthread+0x96/0xa0
[] kernel_thread_helper+0x4/0x10
[] 0x

The request queue of this device was stopped, so the following check will be
true forever:

__blk_run_queue()
{
	if (unlikely(blk_queue_stopped(q)))
		return;

	__blk_run_queue_uncond(q);
}

So __blk_run_queue_uncond() will never be called, and the process hangs.
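
For reference, the drain loop it is stuck in looks roughly like this (a condensed
sketch based on block/blk-core.c of that era, with details omitted):

static void __blk_drain_queue(struct request_queue *q, bool drain_all)
{
	while (true) {
		bool drain = false;

		spin_lock_irq(q->queue_lock);
		/* A no-op while QUEUE_FLAG_STOPPED is set (see above),
		 * so queued requests are never dispatched or completed. */
		__blk_run_queue(q);
		/* Pending requests therefore never go away. */
		drain |= !list_empty(&q->queue_head);
		spin_unlock_irq(q->queue_lock);

		if (!drain)
			break;
		msleep(10);	/* the msleep() in the stack trace above */
	}
}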


**
[Why it happens]
**
When the network error happens, the iscsi kernel module detects the ping timeout and
tries to recover the session. At this point the queue is stopped; put differently,
the session is blocked:

iscsi_start_session_recovery(session, conn, flag);
|-> iscsi_block_session(session->cls_session);
   |-> blk_stop_queue(q)
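
blk_stop_queue() itself is tiny (this is how it looks in block/blk-core.c of that
era); it just marks the queue stopped:

void blk_stop_queue(struct request_queue *q)
{
	cancel_delayed_work(&q->delay_work);
	queue_flag_set(QUEUE_FLAG_STOPPED, q);
}

Once QUEUE_FLAG_STOPPED is set, __blk_run_queue() returns immediately, which is
exactly the check shown above.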

The session should be unblocked when it recovers or when the recovery times out.
But the queue was not unblocked properly, because scsi_remove_device() deleted the
device first and only then called __blk_drain_queue():

__scsi_remove_device()
|-> device_del(dev)
|-> blk_cleanup_queue()
  |-> scsi_request_fn()
|-> __blk_drain_queue()

At this time, the device was no longer on the children list of its parent device. So
when __iscsi_unblock_session() tried to unblock the parent device and its children,
it could not find the deleted device, the stopped queue was never restarted, and
__blk_drain_queue() hung forever.