Re: BUG in slab_free after iSCSI login timeout
On Sun, 12 Aug 2018 03:51:40 +, Vincent Pelletier wrote: > This said, I was surprised by "free" output listing only a bit > above 3GB of ram total: > $ free -m > totalusedfree shared buff/cache > available > Mem: 3310 2502867 5 192 > 2847 > Swap: 5015 05015 > Would it be an effect of KASAN ? > I also found the following line in dmesg: > [0.00] Memory: 3099784K/4088348K available (14348K kernel code, > 4532K rwdata, 5400K rodata, 1840K init, 9112K bss, 988564K reserved, 0K > cma-reserved) > Checking pre-KASAN boots it was: > [0.00] Memory: 3657884K/4088348K available (10252K kernel code, > 1210K rwdata, 3216K rodata, 1548K init, 656K bss, 430464K reserved, 0K > cma-reserved) Answering my own question after a bit of RTFM: KASAN uses 1 byte to track 8 bytes, so monitoring 4GB takes 512MB, so it indeed explains the vast majority of the difference in reserved memory. -- Vincent Pelletier
Re: BUG in slab_free after iSCSI login timeout
On Sun, 12 Aug 2018 02:55:31 +, Vincent Pelletier wrote: > Aug 12 04:44:53 boke kernel: [ 64.737069] BUG: KASAN: use-after-free in > iscsi_target_login_sess_out.cold.11+0x58/0x123 [iscsi_target_mod] > Aug 12 04:44:53 boke kernel: [ 64.771148] BUG: KASAN: double-free or > invalid-free in iscsi_target_login_sess_out.cold.11+0x103/0x123 > [iscsi_target_mod] If I'm reading the code correctly, the double-free would be iscsi_login_init_conn and iscsi_target_login_sess_out both calling kfree(conn->conn_ops), with the latter called by __iscsi_target_login_thread precisely when the former fails (returns NULL after freeing). I'm not spotting the use-after-free so far, and do not yet understand why iscsi_login_init_conn would fail: - allocation-related failures allocate a fixed amount of ram, the target machine has 4GB and very few userland processes This said, I was surprised by "free" output listing only a bit above 3GB of ram total: $ free -m totalusedfree shared buff/cache available Mem: 3310 2502867 5 192 2847 Swap: 5015 05015 Would it be an effect of KASAN ? I also found the following line in dmesg: [0.00] Memory: 3099784K/4088348K available (14348K kernel code, 4532K rwdata, 5400K rodata, 1840K init, 9112K bss, 988564K reserved, 0K cma-reserved) Checking pre-KASAN boots it was: [0.00] Memory: 3657884K/4088348K available (10252K kernel code, 1210K rwdata, 3216K rodata, 1548K init, 656K bss, 430464K reserved, 0K cma-reserved) - $ grep CONFIG_CPUMASK_OFFSTACK .config $ so zalloc_cpumask_var should have no way to fail. Regards, -- Vincent Pelletier
Re: BUG in slab_free after iSCSI login timeout
On Sat, 11 Aug 2018 22:50:12 +, Bart Van Assche wrote: > On Sat, 2018-08-11 at 09:36 +, Vincent Pelletier wrote: > > What can I try to help debug this further ? > > Can you try to reproduce this with KASAN enabled in the kernel config? Here is the syslog with KASAN enabled: Aug 12 04:44:53 boke kernel: [ 64.736033] iSCSI Login timeout on Network Portal [::]:3260 Aug 12 04:44:53 boke kernel: [ 64.736449] iSCSI Login negotiation failed. Aug 12 04:44:53 boke kernel: [ 64.736653] == Aug 12 04:44:53 boke kernel: [ 64.737069] BUG: KASAN: use-after-free in iscsi_target_login_sess_out.cold.11+0x58/0x123 [iscsi_target_mod] Aug 12 04:44:53 boke kernel: [ 64.737515] Read of size 8 at addr 880113ca6bc8 by task iscsi_np/992 Aug 12 04:44:53 boke kernel: [ 64.737814] Aug 12 04:44:53 boke kernel: [ 64.737914] CPU: 0 PID: 992 Comm: iscsi_np Tainted: G O 4.17.8kasan #1 Aug 12 04:44:53 boke kernel: [ 64.737920] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 05/19/2014 Aug 12 04:44:53 boke kernel: [ 64.737924] Call Trace: Aug 12 04:44:53 boke kernel: [ 64.737945] dump_stack+0x71/0xac Aug 12 04:44:53 boke kernel: [ 64.737961] print_address_description+0x65/0x22e Aug 12 04:44:53 boke kernel: [ 64.738054] ? iscsi_target_login_sess_out.cold.11+0x58/0x123 [iscsi_target_mod] Aug 12 04:44:53 boke kernel: [ 64.738066] kasan_report.cold.6+0x241/0x2fd Aug 12 04:44:53 boke kernel: [ 64.738157] iscsi_target_login_sess_out.cold.11+0x58/0x123 [iscsi_target_mod] Aug 12 04:44:53 boke kernel: [ 64.738246] iscsi_target_login_thread+0x10c4/0x1720 [iscsi_target_mod] Aug 12 04:44:53 boke kernel: [ 64.738264] ? __sched_text_start+0x8/0x8 Aug 12 04:44:53 boke kernel: [ 64.738349] ? iscsi_target_login_sess_out+0x280/0x280 [iscsi_target_mod] Aug 12 04:44:53 boke kernel: [ 64.738361] ? __kthread_parkme+0xcc/0x100 Aug 12 04:44:53 boke kernel: [ 64.738374] ? parse_args.cold.14+0xd3/0xd3 Aug 12 04:44:53 boke kernel: [ 64.738460] ? iscsi_target_login_sess_out+0x280/0x280 [iscsi_target_mod] Aug 12 04:44:53 boke kernel: [ 64.738478] kthread+0x1a0/0x1c0 Aug 12 04:44:53 boke kernel: [ 64.738491] ? kthread_bind+0x30/0x30 Aug 12 04:44:53 boke kernel: [ 64.738502] ret_from_fork+0x35/0x40 Aug 12 04:44:53 boke kernel: [ 64.738510] Aug 12 04:44:53 boke kernel: [ 64.738600] Allocated by task 992: Aug 12 04:44:53 boke kernel: [ 64.738772] kasan_kmalloc+0xbf/0xe0 Aug 12 04:44:53 boke kernel: [ 64.738782] kmem_cache_alloc_trace+0x112/0x210 Aug 12 04:44:53 boke kernel: [ 64.738865] iscsi_target_login_thread+0x844/0x1720 [iscsi_target_mod] Aug 12 04:44:53 boke kernel: [ 64.738875] kthread+0x1a0/0x1c0 Aug 12 04:44:53 boke kernel: [ 64.738884] ret_from_fork+0x35/0x40 Aug 12 04:44:53 boke kernel: [ 64.738887] Aug 12 04:44:53 boke kernel: [ 64.738973] Freed by task 992: Aug 12 04:44:53 boke kernel: [ 64.739129] __kasan_slab_free+0x125/0x170 Aug 12 04:44:53 boke kernel: [ 64.739137] kfree+0x90/0x1d0 Aug 12 04:44:53 boke kernel: [ 64.739220] iscsi_target_login_thread+0x15c7/0x1720 [iscsi_target_mod] Aug 12 04:44:53 boke kernel: [ 64.739230] kthread+0x1a0/0x1c0 Aug 12 04:44:53 boke kernel: [ 64.739239] ret_from_fork+0x35/0x40 Aug 12 04:44:53 boke kernel: [ 64.739241] Aug 12 04:44:53 boke kernel: [ 64.739330] The buggy address belongs to the object at 880113ca6a00 Aug 12 04:44:53 boke kernel: [ 64.739330] which belongs to the cache kmalloc-512 of size 512 Aug 12 04:44:53 boke kernel: [ 64.739877] The buggy address is located 456 bytes inside of Aug 12 04:44:53 boke kernel: [ 64.739877] 512-byte region [880113ca6a00, 880113ca6c00) Aug 12 04:44:53 boke kernel: [ 64.740385] The buggy address belongs to the page: Aug 12 04:44:53 boke kernel: [ 64.740611] page:ea00044f2980 count:1 mapcount:0 mapping: index:0x0 compound_mapcount: 0 Aug 12 04:44:53 boke kernel: [ 64.741053] flags: 0x17fffc08100(slab|head) Aug 12 04:44:53 boke kernel: [ 64.741273] raw: 017fffc08100 0001000c000c Aug 12 04:44:53 boke kernel: [ 64.741626] raw: dead0100 dead0200 88011b002e00 Aug 12 04:44:53 boke kernel: [ 64.741971] page dumped because: kasan: bad access detected Aug 12 04:44:53 boke kernel: [ 64.74] Aug 12 04:44:53 boke kernel: [ 64.742304] Memory state around the buggy address: Aug 12 04:44:53 boke kernel: [ 64.742531] 880113ca6a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Aug 12 04:44:53 boke kernel: [ 64.742858] 880113ca6b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Aug 12 04:44:53 boke kernel: [ 64.747255] >880113ca6b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Aug 12 04:44:53 boke kernel: [ 64.751156] ^ Aug 12
Re: BUG in slab_free after iSCSI login timeout
On Sat, 2018-08-11 at 09:36 +, Vincent Pelletier wrote: > What can I try to help debug this further ? Can you try to reproduce this with KASAN enabled in the kernel config? Thanks, Bart.
[PATCH] scsi: csiostor: fix incorrect port capabilities
- use be32_to_cpu() instead of ntohs() for 32 bit port capabilities. - add a new function fwcaps32_to_caps16() to convert 32 bit port capabilities to 16 bit port capabilities. Signed-off-by: Varun Prakash --- drivers/scsi/csiostor/csio_hw.c | 55 - drivers/scsi/csiostor/csio_hw.h | 1 + drivers/scsi/csiostor/csio_mb.c | 6 ++--- 3 files changed, 48 insertions(+), 14 deletions(-) diff --git a/drivers/scsi/csiostor/csio_hw.c b/drivers/scsi/csiostor/csio_hw.c index 1ee255a..e4baf04 100644 --- a/drivers/scsi/csiostor/csio_hw.c +++ b/drivers/scsi/csiostor/csio_hw.c @@ -1513,6 +1513,46 @@ fw_port_cap32_t fwcaps16_to_caps32(fw_port_cap16_t caps16) } /** + * fwcaps32_to_caps16 - convert 32-bit Port Capabilities to 16-bits + * @caps32: a 32-bit Port Capabilities value + * + * Returns the equivalent 16-bit Port Capabilities value. Note that + * not all 32-bit Port Capabilities can be represented in the 16-bit + * Port Capabilities and some fields/values may not make it. + */ +fw_port_cap16_t fwcaps32_to_caps16(fw_port_cap32_t caps32) +{ + fw_port_cap16_t caps16 = 0; + + #define CAP32_TO_CAP16(__cap) \ + do { \ + if (caps32 & FW_PORT_CAP32_##__cap) \ + caps16 |= FW_PORT_CAP_##__cap; \ + } while (0) + + CAP32_TO_CAP16(SPEED_100M); + CAP32_TO_CAP16(SPEED_1G); + CAP32_TO_CAP16(SPEED_10G); + CAP32_TO_CAP16(SPEED_25G); + CAP32_TO_CAP16(SPEED_40G); + CAP32_TO_CAP16(SPEED_100G); + CAP32_TO_CAP16(FC_RX); + CAP32_TO_CAP16(FC_TX); + CAP32_TO_CAP16(802_3_PAUSE); + CAP32_TO_CAP16(802_3_ASM_DIR); + CAP32_TO_CAP16(ANEG); + CAP32_TO_CAP16(FORCE_PAUSE); + CAP32_TO_CAP16(MDIAUTO); + CAP32_TO_CAP16(MDISTRAIGHT); + CAP32_TO_CAP16(FEC_RS); + CAP32_TO_CAP16(FEC_BASER_RS); + + #undef CAP32_TO_CAP16 + + return caps16; +} + +/** * lstatus_to_fwcap - translate old lstatus to 32-bit Port Capabilities * @lstatus: old FW_PORT_ACTION_GET_PORT_INFO lstatus value * @@ -1670,7 +1710,7 @@ csio_enable_ports(struct csio_hw *hw) val = 1; csio_mb_params(hw, mbp, CSIO_MB_DEFAULT_TMO, - hw->pfn, 0, 1, , , false, + hw->pfn, 0, 1, , , true, NULL); if (csio_mb_issue(hw, mbp)) { @@ -1680,16 +1720,9 @@ csio_enable_ports(struct csio_hw *hw) return -EINVAL; } - csio_mb_process_read_params_rsp(hw, mbp, , 1, - ); - if (retval != FW_SUCCESS) { - csio_err(hw, "FW_PARAMS_CMD(r) port:%d failed: 0x%x\n", -portid, retval); - mempool_free(mbp, hw->mb_mempool); - return -EINVAL; - } - - fw_caps = val; + csio_mb_process_read_params_rsp(hw, mbp, , + 0, NULL); + fw_caps = retval ? FW_CAPS16 : FW_CAPS32; } /* Read PORT information */ diff --git a/drivers/scsi/csiostor/csio_hw.h b/drivers/scsi/csiostor/csio_hw.h index 9e73ef7..e351af6 100644 --- a/drivers/scsi/csiostor/csio_hw.h +++ b/drivers/scsi/csiostor/csio_hw.h @@ -639,6 +639,7 @@ int csio_handle_intr_status(struct csio_hw *, unsigned int, fw_port_cap32_t fwcap_to_fwspeed(fw_port_cap32_t acaps); fw_port_cap32_t fwcaps16_to_caps32(fw_port_cap16_t caps16); +fw_port_cap16_t fwcaps32_to_caps16(fw_port_cap32_t caps32); fw_port_cap32_t lstatus_to_fwcap(u32 lstatus); int csio_hw_start(struct csio_hw *); diff --git a/drivers/scsi/csiostor/csio_mb.c b/drivers/scsi/csiostor/csio_mb.c index c026417..6f13673 100644 --- a/drivers/scsi/csiostor/csio_mb.c +++ b/drivers/scsi/csiostor/csio_mb.c @@ -368,7 +368,7 @@ csio_mb_port(struct csio_hw *hw, struct csio_mb *mbp, uint32_t tmo, FW_CMD_LEN16_V(sizeof(*cmdp) / 16)); if (fw_caps == FW_CAPS16) - cmdp->u.l1cfg.rcap = cpu_to_be32(fc); + cmdp->u.l1cfg.rcap = cpu_to_be32(fwcaps32_to_caps16(fc)); else cmdp->u.l1cfg32.rcap32 = cpu_to_be32(fc); } @@ -395,8 +395,8 @@ csio_mb_process_read_port_rsp(struct csio_hw *hw, struct csio_mb *mbp, *pcaps = fwcaps16_to_caps32(ntohs(rsp->u.info.pcap)); *acaps = fwcaps16_to_caps32(ntohs(rsp->u.info.acap)); } else { - *pcaps = ntohs(rsp->u.info32.pcaps32); - *acaps = ntohs(rsp->u.info32.acaps32); + *pcaps =