Re: [RFC PATCH v1 2/2] vsock/test: SO_RCVLOWAT + deferred credit update test

2023-11-16 Thread Arseniy Krasnov



On 15.11.2023 14:11, Stefano Garzarella wrote:
> On Wed, Nov 08, 2023 at 10:20:04AM +0300, Arseniy Krasnov wrote:
>> This adds test which checks, that updating SO_RCVLOWAT value also sends
> 
> You can avoid "This adds", and write just "Add test ...".
> 
> See 
> https://docs.kernel.org/process/submitting-patches.html#describe-your-changes
> 
>     Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
>     instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
>     to do frotz", as if you are giving orders to the codebase to change
>     its behaviour.
> 
> Also in the other patch.
> 
>> credit update message. Otherwise mutual hungup may happen when receiver
>> didn't send credit update and then calls 'poll()' with non default
>> SO_RCVLOWAT value (e.g. waiting enough bytes to read), while sender
>> waits for free space at receiver's side.
>>
>> Signed-off-by: Arseniy Krasnov 
>> ---
>> tools/testing/vsock/vsock_test.c | 131 +++
>> 1 file changed, 131 insertions(+)
>>
>> diff --git a/tools/testing/vsock/vsock_test.c 
>> b/tools/testing/vsock/vsock_test.c
>> index c1f7bc9abd22..c71b3875fd16 100644
>> --- a/tools/testing/vsock/vsock_test.c
>> +++ b/tools/testing/vsock/vsock_test.c
>> @@ -1180,6 +1180,132 @@ static void test_stream_shutrd_server(const struct 
>> test_opts *opts)
>> close(fd);
>> }
>>
>> +#define RCVLOWAT_CREDIT_UPD_BUF_SIZE    (1024 * 128)
>> +#define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE    (1024 * 64)
> 
> What about adding a comment like the one in the cover letter about
> dependency with kernel values?
> 
> Please add it also in the commit description.
> 
> I'm thinking if we should move all the defines that depends on the
> kernel in some special header.

IIUC it will be new header file in tools/testing/vsock, which includes such 
defines. At
this moment in will contain only VIRTIO_VSOCK_MAX_PKT_BUF_SIZE. Idea is that 
such defines
are not supposed to use by user (so do not move it to uapi headers), but needed 
by tests
to check kernel behaviour. Please correct me if i'm wrong.

Thanks, Arseniy

> 
>> +
>> +static void test_stream_rcvlowat_def_cred_upd_client(const struct test_opts 
>> *opts)
>> +{
>> +    size_t buf_size;
>> +    void *buf;
>> +    int fd;
>> +
>> +    fd = vsock_stream_connect(opts->peer_cid, 1234);
>> +    if (fd < 0) {
>> +    perror("connect");
>> +    exit(EXIT_FAILURE);
>> +    }
>> +
>> +    /* Send 1 byte more than peer's buffer size. */
>> +    buf_size = RCVLOWAT_CREDIT_UPD_BUF_SIZE + 1;
>> +
>> +    buf = malloc(buf_size);
>> +    if (!buf) {
>> +    perror("malloc");
>> +    exit(EXIT_FAILURE);
>> +    }
>> +
>> +    /* Wait until peer sets needed buffer size. */
>> +    control_expectln("SRVREADY");
>> +
>> +    if (send(fd, buf, buf_size, 0) != buf_size) {
>> +    perror("send failed");
>> +    exit(EXIT_FAILURE);
>> +    }
>> +
>> +    free(buf);
>> +    close(fd);
>> +}
>> +
>> +static void test_stream_rcvlowat_def_cred_upd_server(const struct test_opts 
>> *opts)
>> +{
>> +    size_t recv_buf_size;
>> +    struct pollfd fds;
>> +    size_t buf_size;
>> +    void *buf;
>> +    int fd;
>> +
>> +    fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
>> +    if (fd < 0) {
>> +    perror("accept");
>> +    exit(EXIT_FAILURE);
>> +    }
>> +
>> +    buf_size = RCVLOWAT_CREDIT_UPD_BUF_SIZE;
>> +
>> +    if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE,
>> +   _size, sizeof(buf_size))) {
>> +    perror("setsockopt(SO_VM_SOCKETS_BUFFER_SIZE)");
>> +    exit(EXIT_FAILURE);
>> +    }
>> +
>> +    buf = malloc(buf_size);
>> +    if (!buf) {
>> +    perror("malloc");
>> +    exit(EXIT_FAILURE);
>> +    }
>> +
>> +    control_writeln("SRVREADY");
>> +
>> +    /* Wait until there will be 128KB of data in rx queue. */
>> +    while (1) {
>> +    ssize_t res;
>> +
>> +    res = recv(fd, buf, buf_size, MSG_PEEK);
>> +    if (res == buf_size)
>> +    break;
>> +
>> +    if (res <= 0) {
>> +    fprintf(stderr, "unexpected 'recv()' return: %zi\n", res);
>> +    exit(EXIT_FAILURE);
>> +    }
>> +    }
>> +
>> +    /* There is 128KB of data in the socket's rx queue,
>> + * dequeue first 64KB, credit update is not sent.
>> + */
>> +    recv_buf_size = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
>> +    recv_buf(fd, buf, recv_buf_size, 0, recv_buf_size);
>> +    recv_buf_size++;
>> +
>> +    /* Updating SO_RCVLOWAT will send credit update. */
>> +    if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT,
>> +   _buf_size, sizeof(recv_buf_size))) {
>> +    perror("setsockopt(SO_RCVLOWAT)");
>> +    exit(EXIT_FAILURE);
>> +    }
>> +
>> +    memset(, 0, sizeof(fds));
>> +    fds.fd = fd;
>> +    fds.events = POLLIN | POLLRDNORM | POLLERR |
>> + POLLRDHUP | POLLHUP;
>> +
>> +    /* This 'poll()' will return once we receive last byte
>> + * sent by client.
>> + */
>> +    if (poll(, 1, -1) 

Re: [RFC PATCH v1 1/2] virtio/vsock: send credit update during setting SO_RCVLOWAT

2023-11-16 Thread Arseniy Krasnov



On 15.11.2023 14:08, Stefano Garzarella wrote:
> On Wed, Nov 08, 2023 at 10:20:03AM +0300, Arseniy Krasnov wrote:
>> This adds sending credit update message when SO_RCVLOWAT is updated and
>> it is bigger than number of bytes in rx queue. It is needed, because
>> 'poll()' will wait until number of bytes in rx queue will be not smaller
>> than SO_RCVLOWAT, so kick sender to send more data. Otherwise mutual
>> hungup for tx/rx is possible: sender waits for free space and receiver
>> is waiting data in 'poll()'.
>>
>> Signed-off-by: Arseniy Krasnov 
>> ---
>> drivers/vhost/vsock.c   |  2 ++
>> include/linux/virtio_vsock.h    |  1 +
>> net/vmw_vsock/virtio_transport.c    |  2 ++
>> net/vmw_vsock/virtio_transport_common.c | 31 +
>> net/vmw_vsock/vsock_loopback.c  |  2 ++
>> 5 files changed, 38 insertions(+)
>>
>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> index f75731396b7e..ecfa5c11f5ee 100644
>> --- a/drivers/vhost/vsock.c
>> +++ b/drivers/vhost/vsock.c
>> @@ -451,6 +451,8 @@ static struct virtio_transport vhost_transport = {
>>     .notify_buffer_size   = virtio_transport_notify_buffer_size,
>>
>>     .read_skb = virtio_transport_read_skb,
>> +
>> +    .set_rcvlowat = virtio_transport_set_rcvlowat
>> },
>>
>> .send_pkt = vhost_transport_send_pkt,
>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> index ebb3ce63d64d..97dc1bebc69c 100644
>> --- a/include/linux/virtio_vsock.h
>> +++ b/include/linux/virtio_vsock.h
>> @@ -256,4 +256,5 @@ void virtio_transport_put_credit(struct 
>> virtio_vsock_sock *vvs, u32 credit);
>> void virtio_transport_deliver_tap_pkt(struct sk_buff *skb);
>> int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head *list);
>> int virtio_transport_read_skb(struct vsock_sock *vsk, skb_read_actor_t 
>> read_actor);
>> +int virtio_transport_set_rcvlowat(struct vsock_sock *vsk, int val);
>> #endif /* _LINUX_VIRTIO_VSOCK_H */
>> diff --git a/net/vmw_vsock/virtio_transport.c 
>> b/net/vmw_vsock/virtio_transport.c
>> index af5bab1acee1..cf3431189d0c 100644
>> --- a/net/vmw_vsock/virtio_transport.c
>> +++ b/net/vmw_vsock/virtio_transport.c
>> @@ -539,6 +539,8 @@ static struct virtio_transport virtio_transport = {
>>     .notify_buffer_size   = virtio_transport_notify_buffer_size,
>>
>>     .read_skb = virtio_transport_read_skb,
>> +
>> +    .set_rcvlowat = virtio_transport_set_rcvlowat
>> },
>>
>> .send_pkt = virtio_transport_send_pkt,
>> diff --git a/net/vmw_vsock/virtio_transport_common.c 
>> b/net/vmw_vsock/virtio_transport_common.c
>> index e22c81435ef7..88a58163046e 100644
>> --- a/net/vmw_vsock/virtio_transport_common.c
>> +++ b/net/vmw_vsock/virtio_transport_common.c
>> @@ -1676,6 +1676,37 @@ int virtio_transport_read_skb(struct vsock_sock *vsk, 
>> skb_read_actor_t recv_acto
>> }
>> EXPORT_SYMBOL_GPL(virtio_transport_read_skb);
>>
>> +int virtio_transport_set_rcvlowat(struct vsock_sock *vsk, int val)
>> +{
>> +    struct virtio_vsock_sock *vvs = vsk->trans;
>> +    bool send_update = false;
> 
> I'd declare this not initialized.
> 
>> +
>> +    spin_lock_bh(>rx_lock);
>> +
>> +    /* If number of available bytes is less than new
>> + * SO_RCVLOWAT value, kick sender to send more
>> + * data, because sender may sleep in its 'send()'
>> + * syscall waiting for enough space at our side.
>> + */
>> +    if (vvs->rx_bytes < val)
>> +    send_update = true;
> 
> Then here just:
> send_update = vvs->rx_bytes < val;
> 
>> +
>> +    spin_unlock_bh(>rx_lock);
>> +
>> +    if (send_update) {
>> +    int err;
>> +
>> +    err = virtio_transport_send_credit_update(vsk);
>> +    if (err < 0)
>> +    return err;
>> +    }
>> +
>> +    WRITE_ONCE(sk_vsock(vsk)->sk_rcvlowat, val ? : 1);
> 
> Not in this patch, but what about doing this in vsock_set_rcvlowat() in 
> af_vsock.c?
> 
> I mean avoid to return if `transport->set_rcvlowat(vsk, val)` is
> successfully, so set sk_rcvlowat in a single point.

Yes, we can do it, I'll include new patch as 0001 in v2, don't remember why it 
wasn't implemented in this
way before.

Thanks, Arseniy

> 
> The rest LGTM!
> 
> Stefano
> 
>> +
>> +    return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(virtio_transport_set_rcvlowat);
>> +
>> MODULE_LICENSE("GPL v2");
>> MODULE_AUTHOR("Asias He");
>> MODULE_DESCRIPTION("common code for virtio vsock");
>> diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
>> index 048640167411..388c157f6633 100644
>> --- a/net/vmw_vsock/vsock_loopback.c
>> +++ b/net/vmw_vsock/vsock_loopback.c
>> @@ -98,6 +98,8 @@ static struct virtio_transport loopback_transport = {
>>     .notify_buffer_size   = virtio_transport_notify_buffer_size,
>>
>>     .read_skb = virtio_transport_read_skb,
>> +
>> +    .set_rcvlowat = virtio_transport_set_rcvlowat
>> },
>>
>> .send_pkt = 

Re: [PATCH 27/32] s390/string: Add KMSAN support

2023-11-16 Thread kernel test robot
Hi Ilya,

kernel test robot noticed the following build errors:

[auto build test ERROR on s390/features]
[also build test ERROR on akpm-mm/mm-everything linus/master 
vbabka-slab/for-next v6.7-rc1 next-20231116]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Ilya-Leoshkevich/ftrace-Unpoison-ftrace_regs-in-ftrace_ops_list_func/20231116-045608
base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
patch link:
https://lore.kernel.org/r/20231115203401.2495875-28-iii%40linux.ibm.com
patch subject: [PATCH 27/32] s390/string: Add KMSAN support
config: s390-debug_defconfig 
(https://download.01.org/0day-ci/archive/20231117/202311170550.bsbo44ix-...@intel.com/config)
compiler: s390-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20231117/202311170550.bsbo44ix-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202311170550.bsbo44ix-...@intel.com/

All errors (new ones prefixed by >>):

   s390-linux-ld: arch/s390/mm/vmem.o: in function `crst_table_init':
>> arch/s390/include/asm/pgalloc.h:33:(.text+0x1ba): undefined reference to 
>> `memset64'
   s390-linux-ld: arch/s390/mm/vmem.o: in function `vmem_pte_alloc':
>> arch/s390/mm/vmem.c:68:(.ref.text+0x1ec): undefined reference to `memset64'
   s390-linux-ld: arch/s390/mm/pgalloc.o: in function `base_pgt_alloc':
>> arch/s390/mm/pgalloc.c:241:(.text+0x184): undefined reference to `memset64'
   s390-linux-ld: arch/s390/mm/pgalloc.o: in function `crst_table_init':
   arch/s390/include/asm/pgalloc.h:33:(.text+0x3e8): undefined reference to 
`memset64'
>> s390-linux-ld: arch/s390/include/asm/pgalloc.h:33:(.text+0x568): undefined 
>> reference to `memset64'
   s390-linux-ld: arch/s390/mm/pgalloc.o:arch/s390/include/asm/pgalloc.h:33: 
more undefined references to `memset64' follow
   s390-linux-ld: lib/test_string.o: in function `memset16_selftest':
>> lib/test_string.c:19:(.init.text+0x94): undefined reference to `memset16'
   s390-linux-ld: lib/test_string.o: in function `memset32_selftest':
>> lib/test_string.c:55:(.init.text+0x234): undefined reference to `memset32'
   s390-linux-ld: lib/test_string.o: in function `memset64_selftest':
>> lib/test_string.c:91:(.init.text+0x3c2): undefined reference to `memset64'
   s390-linux-ld: drivers/video/fbdev/core/fbcon.o: in function `scr_memsetw':
>> include/linux/vt_buffer.h:36:(.text+0x30f6): undefined reference to 
>> `memset16'
>> s390-linux-ld: include/linux/vt_buffer.h:36:(.text+0x320a): undefined 
>> reference to `memset16'
   s390-linux-ld: include/linux/vt_buffer.h:36:(.text+0x32c4): undefined 
reference to `memset16'
   s390-linux-ld: include/linux/vt_buffer.h:36:(.text+0x33b8): undefined 
reference to `memset16'
   s390-linux-ld: include/linux/vt_buffer.h:36:(.text+0x4f60): undefined 
reference to `memset16'
   s390-linux-ld: 
drivers/video/fbdev/core/fbcon.o:include/linux/vt_buffer.h:36: more undefined 
references to `memset16' follow
   s390-linux-ld: drivers/tty/vt/vt.o: in function `vc_uniscr_copy_area':
>> drivers/tty/vt/vt.c:464:(.text+0x107e): undefined reference to `memset32'
>> s390-linux-ld: drivers/tty/vt/vt.c:471:(.text+0x1104): undefined reference 
>> to `memset32'
   s390-linux-ld: drivers/tty/vt/vt.c:471:(.text+0x1118): undefined reference 
to `memset32'
   s390-linux-ld: drivers/tty/vt/vt.c:471:(.text+0x1140): undefined reference 
to `memset32'
   s390-linux-ld: drivers/tty/vt/vt.o: in function `vc_uniscr_insert':
   drivers/tty/vt/vt.c:374:(.text+0x13a4): undefined reference to `memset32'
   s390-linux-ld: drivers/tty/vt/vt.o:drivers/tty/vt/vt.c:385: more undefined 
references to `memset32' follow
   s390-linux-ld: drivers/tty/vt/vt.o: in function `scr_memsetw':
   include/linux/vt_buffer.h:36:(.text+0x2844): undefined reference to 
`memset16'
   s390-linux-ld: include/linux/vt_buffer.h:36:(.text+0x2932): undefined 
reference to `memset16'
   s390-linux-ld: include/linux/vt_buffer.h:36:(.text+0x2fe8): undefined 
reference to `memset16'
   s390-linux-ld: include/linux/vt_buffer.h:36:(.text+0x319c): undefined 
reference to `memset16'
   s390-linux-ld: drivers/tty/vt/vt.o: in function `vc_uniscr_clear_line':
   drivers/tty/vt/vt.c:393:(.text+0x3f78): undefined reference to `memset32'
   s390-linux-ld: drivers/tty/vt/vt.o: in function `vc_uniscr_clear_lines':
   drivers/tty/vt/vt.c:401:(.text+0x3fb8): undefined reference to `memset32'
   s390-linux-ld: drivers/tty/vt/vt.c:401:(.text+0x3fe2): undefined reference 
to `memset32'
  

EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)

2023-11-16 Thread Tobias Huschle

Hi,

when testing the EEVDF scheduler we stumbled upon a performance 
regression in a uperf scenario and would like to
kindly ask for feedback on whether we are going into the right direction 
with our analysis so far.


The base scenario are two KVM guests running on an s390 LPAR. One guest 
hosts the uperf server, one the uperf client.

With EEVDF we observe a regression of ~50% for a strburst test.
For a more detailed description of the setup see the section TEST 
SUMMARY at the bottom.


Bisecting led us to the following commit which appears to introduce the 
regression:

86bfbb7ce4f6 sched/fair: Add lag based placement

We then compared the last good commit we identified with a recent level 
of the devel branch.
The issue still persists on 6.7 rc1 although there is some improvement 
(down from 62% regression to 49%)


All analysis described further are based on a 6.6 rc7 kernel.

We sampled perf data to get an idea on what is going wrong and ended up 
seeing an dramatic increase in the maximum
wait times from 3ms up to 366ms. See section WAIT DELAYS below for more 
details.


We then collected tracing data to get a better insight into what is 
going on.
The trace excerpt in section TRACE EXCERPT shows one example (of 
multiple per test run) of the problematic scenario where

a kworker(pid=6525) has to wait for 39,718 ms.

Short summary:
The mentioned kworker has been scheduled to CPU 14 before the tracing 
was enabled.

A vhost process is migrated onto CPU 14.
The vruntimes of kworker and vhost differ significantly (86642125805 vs 
4242563284 -> factor 20)
The vhost process wants to wake up the kworker, therefore the kworker is 
placed onto the runqueue again and set to runnable.
The vhost process continues to execute, waking up other vhost processes 
on other CPUs.


So far this behavior is not different to what we see on pre-EEVDF 
kernels.


On timestamp 576.162767, the vhost process triggers the last wake up of 
another vhost on another CPU.
Until timestamp 576.171155, we see no other activity. Now, the vhost 
process ends its time slice.
Then, vhost gets re-assigned new time slices 4 times and gets then 
migrated off to CPU 15.

This does not occur with older kernels.
The kworker has to wait for the migration to happen in order to be able 
to execute again.
This is due to the fact, that the vruntime of the kworker is 
significantly larger than the one of vhost.



We observed the large difference in vruntime between kworker and vhost 
in the same magnitude on

a kernel built based on the parent of the commit mentioned above.
With EEVDF, the kworker is doomed to wait until the vhost either catches 
up on vruntime (which would take 86 seconds)

or the vhost is migrated off of the CPU.

We found some options which sound plausible but we are not sure if they 
are valid or not:


1. The wake up path has a dependency on the vruntime metrics that now 
delays the execution of the kworker.
2. The previous commit af4cf40470c2 (sched/fair: Add 
cfs_rq::avg_vruntime) which updates the way cfs_rq->min_vruntime and
cfs_rq->avg_runtime are set might have introduced an issue which is 
uncovered with the commit mentioned above.
3. An assumption in the vhost code which causes vhost to rely on being 
scheduled off in time to allow the kworker to proceed.


We also stumbled upon the following mailing thread: 
https://lore.kernel.org/lkml/ZORaUsd+So+tnyMV@chenyu5-mobl2/
That conversation, and the patches derived from it lead to the 
assumption that the wake up path might be adjustable in a way

that this case in particular can be addressed.
At the same time, the vast difference in vruntimes is concerning since, 
at least for some time frame, both processes are on the runqueue.


We would be glad to hear some feedback on which paths to pursue and 
which might just be a dead end in the first place.



 TRACE EXCERPT 
The sched_place trace event was added to the end of the place_entity 
function and outputs:

sev -> sched_entity vruntime
sed -> sched_entity deadline
sel -> sched_entity vlag
avg -> cfs_rq avg_vruntime
min -> cfs_rq min_vruntime
cpu -> cpu of cfs_rq
nr  -> cfs_rq nr_running
---
CPU 3/KVM-2950[014] d   576.161432: sched_migrate_task: 
comm=vhost-2920 pid=2941 prio=120 orig_cpu=15 dest_cpu=14

--> migrates task from cpu 15 to 14
CPU 3/KVM-2950[014] d   576.161433: sched_place: 
comm=vhost-2920 pid=2941 sev=4242563284 sed=4245563284 sel=0 
avg=4242563284 min=4242563284 cpu=14 nr=0

--> places vhost 2920 on CPU 14 with vruntime 4242563284
CPU 3/KVM-2950[014] d   576.161433: sched_place: comm= pid=0 
sev=16329848593 sed=16334604010 sel=0 avg=16329848593 min=16329848593 
cpu=14 nr=0
CPU 3/KVM-2950[014] d   576.161433: sched_place: comm= pid=0 
sev=42560661157 sed=42627443765 sel=0 avg=42560661157 min=42560661157 
cpu=14 nr=0
CPU 3/KVM-2950[014] d   576.161434: sched_place: comm= pid=0 
sev=53846627372 

Re: [PATCH 2/3] modpost: Extended modversion support

2023-11-16 Thread Luis Chamberlain
On Wed, Nov 15, 2023 at 06:50:10PM +, Matthew Maurer wrote:
> Adds a new format for modversions which stores each field in a separate
> elf section.

The "why" is critical and not mentioned. And I'd like to also see
documented this with foresight, if Rust needed could this be used
in the future for other things?

Also please include folks CC'd in *one* patch to *all* patches as
otherwise we have no context.

> This initially adds support for variable length names, but
> could later be used to add additional fields to modversions in a
> backwards compatible way if needed.
> 
> Adding support for variable length names makes it possible to enable
> MODVERSIONS and RUST at the same time.
> 
> Signed-off-by: Matthew Maurer 
> ---
>  arch/powerpc/kernel/module_64.c | 24 +-

Why was only powerpc modified? If the commit log explained this it would
make it easier for review.

> diff --git a/kernel/module/internal.h b/kernel/module/internal.h
> index c8b7b4dcf782..0c188c96a045 100644
> --- a/kernel/module/internal.h
> +++ b/kernel/module/internal.h
> @@ -80,7 +80,7 @@ struct load_info {
>   unsigned int used_pages;
>  #endif
>   struct {
> - unsigned int sym, str, mod, vers, info, pcpu;
> + unsigned int sym, str, mod, vers, info, pcpu, vers_ext_crc, 
> vers_ext_name;

We might as well modify this in a preliminary patch to add each new
unsinged int in a new line, so that it is easier to blame when each new
entry gets added. It should not grow the size of the struct at all but
it would make futur extensions easier to review what is new and git
blame easier to spot when something was added.

Although we don't use this extensively today this can easily grow for
convenience and making code easier to read.

> diff --git a/kernel/module/version.c b/kernel/module/version.c
> index 53f43ac5a73e..93d97dad8c77 100644
> --- a/kernel/module/version.c
> +++ b/kernel/module/version.c
> @@ -19,11 +19,28 @@ int check_version(const struct load_info *info,
>   unsigned int versindex = info->index.vers;
>   unsigned int i, num_versions;
>   struct modversion_info *versions;
> + struct modversion_info_ext version_ext;
>  
>   /* Exporting module didn't supply crcs?  OK, we're already tainted. */
>   if (!crc)
>   return 1;
>  
> + /* If we have extended version info, rely on it */
> + if (modversion_ext_start(info, _ext) >= 0) {

There are two things we need to do to make processing modules easier:

  1) ELF validation
  2) Once checked then process the information

We used to have this split up but also had a few places which did both
1) and 2) together. This was wrong and so I want to keep things tidy
and ensure we do things which validate the ELF separate. To that
end please put the checks to validate the ELF first so that we report
to users with a proper error/debug check in case the ELF is wrong,
this enables futher debug checks for that to be done instead of
confusing users who end up scratching their heads why something
failed.

So please split up the ELF validation check and put that into
elf_validity_cache_copy() which runs *earlier* than this.

Then *if* if has this, you just process it. Please take care to be
very pedantic in the elf_validity_cache_copy() and extend the checks
you have for validation in modversion_ext_start() and bring them to
elf_validity_cache_copy() with perhaps *more* stuff which does any
insane checks to verify it is 100% correct.

> + do {
> + if (strncmp(version_ext.name.value, symname,
> + version_ext.name.end - 
> version_ext.name.value) != 0)
> + continue;
> +
> + if (*version_ext.crc.value == *crc)
> + return 1;
> + pr_debug("Found checksum %X vs module %X\n",
> +  *crc, *version_ext.crc.value);
> + goto bad_version;
> + } while (modversion_ext_advance(_ext) == 0);

Can you do a for_each_foo()) type loop here instead after validation?
Because the validation would ensure your loop is bounded then. Look at
for_each_mod_mem_type() for inspiration.

> + goto broken_toolchain;

The broken toolchain thing would then be an issue reported in the
ELF validation.

> @@ -87,6 +105,65 @@ int same_magic(const char *amagic, const char *bmagic,
>   return strcmp(amagic, bmagic) == 0;
>  }
>  
> +#define MODVERSION_FIELD_START(sec, field) \
> + field.value = (typeof(field.value))sec.sh_addr; \
> + field.end = field.value + sec.sh_size
> +
> +ssize_t modversion_ext_start(const struct load_info *info,
> +  struct modversion_info_ext *start)
> +{
> + unsigned int crc_idx = info->index.vers_ext_crc;
> + unsigned int name_idx = info->index.vers_ext_name;
> + Elf_Shdr *sechdrs = info->sechdrs;
> +
> + // Both of these fields are needed for this to be 

Re: [PATCH 28/32] s390/traps: Unpoison the kernel_stack_overflow()'s pt_regs

2023-11-16 Thread Alexander Potapenko
On Wed, Nov 15, 2023 at 9:35 PM Ilya Leoshkevich  wrote:
>
> This is normally done by the generic entry code, but the
> kernel_stack_overflow() flow bypasses it.
>
> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko 

> ---
>  arch/s390/kernel/traps.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/s390/kernel/traps.c b/arch/s390/kernel/traps.c
> index 1d2aa448d103..dd7362806dbb 100644
> --- a/arch/s390/kernel/traps.c
> +++ b/arch/s390/kernel/traps.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -260,6 +261,7 @@ static void monitor_event_exception(struct pt_regs *regs)
>
>  void kernel_stack_overflow(struct pt_regs *regs)
>  {
> +   kmsan_unpoison_entry_regs(regs);

I suggest adding a comment here.



Re: [PATCH 13/32] kmsan: Support SLAB_POISON

2023-11-16 Thread Ilya Leoshkevich
On Thu, 2023-11-16 at 15:55 +0100, Alexander Potapenko wrote:
> On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich 
> wrote:
> > 
> > Avoid false KMSAN negatives with SLUB_DEBUG by allowing
> > kmsan_slab_free() to poison the freed memory, and by preventing
> > init_object() from unpoisoning new allocations.
> > 
> > Signed-off-by: Ilya Leoshkevich 
> > ---
> >  mm/kmsan/hooks.c | 2 +-
> >  mm/slub.c    | 3 ++-
> >  2 files changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
> > index 7b5814412e9f..7a30274b893c 100644
> > --- a/mm/kmsan/hooks.c
> > +++ b/mm/kmsan/hooks.c
> > @@ -76,7 +76,7 @@ void kmsan_slab_free(struct kmem_cache *s, void
> > *object)
> >     return;
> > 
> >     /* RCU slabs could be legally used after free within the
> > RCU period */
> > -   if (unlikely(s->flags & (SLAB_TYPESAFE_BY_RCU |
> > SLAB_POISON)))
> > +   if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU))
> >     return;
> >     /*
> >  * If there's a constructor, freed memory must remain in
> > the same state
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 63d281dfacdb..8d9aa4d7cb7e 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1024,7 +1024,8 @@ static __printf(3, 4) void slab_err(struct
> > kmem_cache *s, struct slab *slab,
> >     add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
> >  }
> > 
> > -static void init_object(struct kmem_cache *s, void *object, u8
> > val)
> > +__no_sanitize_memory static void
> 
> __no_sanitize_memory should be used with great care, because it drops
> all instrumentation from the function, and any shadow writes will be
> lost.
> Won't it be better to add kmsan_poison() to init_object() if you want
> it to stay uninitialized?

I wanted to avoid a ping-pong here, in which we already have properly
poisoned memory, then memset() incorrectly unpoisons it, and then we
undo the damage. My first attempt involved using __memset() instead,
but this resulted in worse assembly code. I wish there were something
like memset_noinstr().

Right now init_object() doesn't seem to be doing anything besides these
memset()s, but this can of course change in the future. So I don't mind
using kmsan_poison() instead of __no_sanitize_memory here too much,
since it results in better maintainability.



Re: [PATCH 26/32] s390/mm: Define KMSAN metadata for vmalloc and modules

2023-11-16 Thread Alexander Potapenko
On Wed, Nov 15, 2023 at 9:35 PM Ilya Leoshkevich  wrote:
>
> The pages for the KMSAN metadata associated with most kernel mappings
> are taken from memblock by the common code. However, vmalloc and module
> metadata needs to be defined by the architectures.
>
> Be a little bit more careful than x86: allocate exactly MODULES_LEN
> for the module shadow and origins, and then take 2/3 of vmalloc for
> the vmalloc shadow and origins. This ensures that users passing small
> vmalloc= values on the command line do not cause module metadata
> collisions.
>
> Signed-off-by: Ilya Leoshkevich 
> ---
>  arch/s390/boot/startup.c|  8 
>  arch/s390/include/asm/pgtable.h | 10 ++
>  2 files changed, 18 insertions(+)
>
> diff --git a/arch/s390/boot/startup.c b/arch/s390/boot/startup.c
> index 8104e0e3d188..297c1062372a 100644
> --- a/arch/s390/boot/startup.c
> +++ b/arch/s390/boot/startup.c
> @@ -253,9 +253,17 @@ static unsigned long setup_kernel_memory_layout(void)
> MODULES_END = round_down(__abs_lowcore, _SEGMENT_SIZE);
> MODULES_VADDR = MODULES_END - MODULES_LEN;
> VMALLOC_END = MODULES_VADDR;
> +#ifdef CONFIG_KMSAN
> +   VMALLOC_END -= MODULES_LEN * 2;
> +#endif
>
> /* allow vmalloc area to occupy up to about 1/2 of the rest virtual 
> space left */
> vmalloc_size = min(vmalloc_size, round_down(VMALLOC_END / 2, 
> _REGION3_SIZE));
> +#ifdef CONFIG_KMSAN
> +   /* take 2/3 of vmalloc area for KMSAN shadow and origins */
> +   vmalloc_size = round_down(vmalloc_size / 3, PAGE_SIZE);
Is it okay that vmalloc_size is only aligned on PAGE_SIZE?
E.g. above the alignment is _REGION3_SIZE.



Re: [PATCH v2 11/11] arm64: dts: qcom: qcm6490-fairphone-fp5: Enable WiFi

2023-11-16 Thread Konrad Dybcio




On 11/13/23 09:56, Luca Weiss wrote:

Now that the WPSS remoteproc is enabled, enable wifi so we can use it.

Signed-off-by: Luca Weiss 
---

Reviewed-by: Konrad Dybcio 

Konrad


Re: [PATCH v2 10/11] arm64: dts: qcom: qcm6490-fairphone-fp5: Enable various remoteprocs

2023-11-16 Thread Konrad Dybcio




On 11/13/23 09:56, Luca Weiss wrote:

Enable the ADSP, CDSP, MPSS and WPSS that are found on the SoC.

Signed-off-by: Luca Weiss 
---

Reviewed-by: Konrad Dybcio 

Konrad


Re: [PATCH v2 09/11] arm64: dts: qcom: sc7280: Add CDSP node

2023-11-16 Thread Konrad Dybcio




On 11/13/23 09:56, Luca Weiss wrote:

Add the node for the ADSP found on the SC7280 SoC, using standard
Qualcomm firmware.

Remove the reserved-memory node from sc7280-chrome-common since CDSP is
currently not used there.

Signed-off-by: Luca Weiss 
---

Acked-by: Konrad Dybcio 


[...]


+   interconnects = <_noc MASTER_CDSP_PROC 0 _virt 
SLAVE_EBI1 0>;

QCOM_ICC_TAG_ALWAYS

Konrad


Re: [PATCH v2 08/11] arm64: dts: qcom: sc7280: Add ADSP node

2023-11-16 Thread Konrad Dybcio




On 11/13/23 09:56, Luca Weiss wrote:

Add the node for the ADSP found on the SC7280 SoC, using standard
Qualcomm firmware.

Signed-off-by: Luca Weiss 
---

Acked-by: Konrad Dybcio 

Konrad


Re: [PATCH 13/32] kmsan: Support SLAB_POISON

2023-11-16 Thread Alexander Potapenko
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich  wrote:
>
> Avoid false KMSAN negatives with SLUB_DEBUG by allowing
> kmsan_slab_free() to poison the freed memory, and by preventing
> init_object() from unpoisoning new allocations.
>
> Signed-off-by: Ilya Leoshkevich 
> ---
>  mm/kmsan/hooks.c | 2 +-
>  mm/slub.c| 3 ++-
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
> index 7b5814412e9f..7a30274b893c 100644
> --- a/mm/kmsan/hooks.c
> +++ b/mm/kmsan/hooks.c
> @@ -76,7 +76,7 @@ void kmsan_slab_free(struct kmem_cache *s, void *object)
> return;
>
> /* RCU slabs could be legally used after free within the RCU period */
> -   if (unlikely(s->flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)))
> +   if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU))
> return;
> /*
>  * If there's a constructor, freed memory must remain in the same 
> state
> diff --git a/mm/slub.c b/mm/slub.c
> index 63d281dfacdb..8d9aa4d7cb7e 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1024,7 +1024,8 @@ static __printf(3, 4) void slab_err(struct kmem_cache 
> *s, struct slab *slab,
> add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
>  }
>
> -static void init_object(struct kmem_cache *s, void *object, u8 val)
> +__no_sanitize_memory static void

__no_sanitize_memory should be used with great care, because it drops
all instrumentation from the function, and any shadow writes will be
lost.
Won't it be better to add kmsan_poison() to init_object() if you want
it to stay uninitialized?



Re: [PATCH v2 06/11] remoteproc: qcom_q6v5_pas: Add SC7280 ADSP, CDSP & WPSS

2023-11-16 Thread Konrad Dybcio




On 11/13/23 09:56, Luca Weiss wrote:

Add support for the ADSP, CDSP and WPSS remoteprocs found on the SC7280
SoC using the q6v5-pas driver.

This driver can be used on regular LA ("Linux Android") based releases,
however the SC7280 ChromeOS devices need different driver support due to
firmware differences.

Signed-off-by: Luca Weiss 
---

Reviewed-by: Konrad Dybcio 

Konrad


Re: [PATCH v2 03/11] arm64: dts: qcom: sc7280: Rename reserved-memory nodes

2023-11-16 Thread Konrad Dybcio




On 11/13/23 09:56, Luca Weiss wrote:

It was clarified a while ago that reserved-memory nodes shouldn't be
called memory@ but should have a descriptive name. Update sc7280.dtsi to
follow that.

Signed-off-by: Luca Weiss 
---

Reviewed-by: Konrad Dybcio 

Konrad


Re: [GIT PULL] vhost,virtio,vdpa,firmware: bugfixes

2023-11-16 Thread pr-tracker-bot
The pull request you sent on Tue, 14 Nov 2023 15:24:36 -0500:

> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/372bed5fbb87314abf410c3916e51578cd382cd1

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html



selftests: ftrace: WARNING: __list_del_entry_valid_or_report (lib/list_debug.c:62 (discriminator 1))

2023-11-16 Thread Naresh Kamboju
Following kernel crash noticed while running selftests: ftrace on arm64 Juno-r2
device running stable-rc linux-6.6.y kernel.

This kernel crash is hard to reproduce.

Reported-by: Linux Kernel Functional Testing 

kselftest: Running tests in ftrace
TAP version 13
1..1
# timeout set to 0
# selftests: ftrace: ftracetest-ktap
# unlink: cannot unlink '/opt/kselftests/default-in-kernel/ftrace/logs/latest': 
No such file or directory
# TAP version 13
# 1..130
# ok 1 Basic trace file check
...
# ok 44 ftrace - test for function traceon/off triggers
# ok 45 ftrace - test tracing error log support
[  617.613901] [ cut here ]
[  617.618870] list_del corruption. prev->next should be 000837184758, but 
was 000837184638. (prev=0008359f82e8)
[  617.629894] WARNING: CPU: 1 PID: 13512 at lib/list_debug.c:62 
__list_del_entry_valid_or_report (lib/list_debug.c:62 (discriminator 1)) 
[  617.639496] Modules linked in: onboard_usb_hub hdlcd tda998x crct10dif_ce 
drm_dma_helper cec drm_kms_helper fuse drm backlight dm_mod ip_tables x_tables 
[last unloaded: trace_printk]
[  617.655832] CPU: 1 PID: 13512 Comm: ls Not tainted 6.6.2-rc1 #1
[  617.661765] Hardware name: ARM Juno development board (r2) (DT)
[  617.667692] pstate: 6005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  617.674668] pc : __list_del_entry_valid_or_report (lib/list_debug.c:62 
(discriminator 1)) 
[  617.680339] lr : __list_del_entry_valid_or_report (lib/list_debug.c:62 
(discriminator 1)) 
[  617.686009] sp : 80008507b910
[  617.689325] x29: 80008507b910 x28: 0003 x27: 0001
[  617.696482] x26: 000834694788 x25: 000835a39c88 x24: 
[  617.703637] x23:  x22: 0008359f8248 x21: 000837184720
[  617.710792] x20: 0008359f8248 x19: 000837184758 x18: 0010
[  617.717948] x17: 20747562202c3835 x16: 3734383137333830 x15: 30302065
[  617.725102] x14: 6220646c756f6873 x13: 2938653238663935 x12: 333830303066
[  617.732257] x11: 663d766572702820 x10: 2e38333634383137 x9 : 800080145634
[  617.739411] x8 : 80008507b638 x7 : 00017fe8 x6 : 00057fa8
[  617.746566] x5 : 0fff x4 : 00097ed27d50 x3 : 8008fc5d3000
[  617.753720] x2 :  x1 :  x0 : 0008010722c0
[  617.760874] Call trace:
[  617.763320] __list_del_entry_valid_or_report (lib/list_debug.c:62 
(discriminator 1)) 
[  617.768643] __dentry_kill (include/linux/list.h:215 (discriminator 1) 
fs/dcache.c:550 (discriminator 1) fs/dcache.c:603 (discriminator 1)) 
[  617.772311] dput (fs/dcache.c:896) 
[  617.775281] create_dentry (fs/tracefs/event_inode.c:404) 
[  617.779040] dcache_dir_open_wrapper (fs/tracefs/event_inode.c:536) 
[  617.783580] do_dentry_open (fs/open.c:929) 
[  617.787424] vfs_open (fs/open.c:1064) 
[  617.790572] path_openat (fs/namei.c:3640 fs/namei.c:3797) 
[  617.794153] do_filp_open (fs/namei.c:3824) 
[  617.797733] do_sys_openat2 (fs/open.c:1422) 
[  617.801490] __arm64_sys_openat (fs/open.c:1448) 
[  617.805508] invoke_syscall (arch/arm64/include/asm/current.h:19 
arch/arm64/kernel/syscall.c:56) 
[  617.809267] el0_svc_common.constprop.0 (include/linux/thread_info.h:127 
(discriminator 2) arch/arm64/kernel/syscall.c:144 (discriminator 2)) 
[  617.814069] do_el0_svc (arch/arm64/kernel/syscall.c:156) 
[  617.817391] el0_svc (arch/arm64/include/asm/daifflags.h:28 
arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:144 
arch/arm64/kernel/entry-common.c:679) 
[  617.820454] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:697) 
[  617.824820] el0t_64_sync (arch/arm64/kernel/entry.S:595) 
[  617.828488] ---[ end trace  ]---
[  617.833214] [ cut here ]
[  617.837840] WARNING: CPU: 2 PID: 13523 at fs/dcache.c:365 dentry_free 
(fs/dcache.c:365 (discriminator 1)) 
[  617.845173] Modules linked in: onboard_usb_hub hdlcd tda998x crct10dif_ce 
drm_dma_helper cec drm_kms_helper fuse drm backlight dm_mod ip_tables x_tables 
[last unloaded: trace_printk]
[  617.861503] CPU: 2 PID: 13523 Comm: rmdir Tainted: GW  
6.6.2-rc1 #1
[  617.869175] Hardware name: ARM Juno development board (r2) (DT)
[  617.875102] pstate: 4005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  617.882077] pc : dentry_free (fs/dcache.c:365 (discriminator 1)) 
[  617.885835] lr : __dentry_kill (fs/dcache.c:623) 
[  617.889937] sp : 800085123a30
[  617.893252] x29: 800085123a30 x28: 0008371845a8 x27: 0008371845a8
[  617.900409] x26: 000837184600 x25: 000837184638 x24: 0008359f82a0
[  617.907565] x23: 000835a39d28 x22: 0008359f8248 x21: 000837184720
[  617.914719] x20: 0008359f8248 x19: 0008371846c8 x18: 
[  617.921874] x17: 00040044 x16: 00500074b5503510 x15: 
[  617.929029] x14:  x13: 0030 x12: 0101010101010101
[  

Re: [PATCH 07/32] kmsan: Remove a useless assignment from kmsan_vmap_pages_range_noflush()

2023-11-16 Thread Alexander Potapenko
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich  wrote:
>
> The value assigned to prot is immediately overwritten on the next line
> with PAGE_KERNEL. The right hand side of the assignment has no
> side-effects.
>
> Fixes: b073d7f8aee4 ("mm: kmsan: maintain KMSAN metadata for page operations")
> Suggested-by: Alexander Gordeev 
> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko 

> ---
>  mm/kmsan/shadow.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c
> index b9d05aff313e..2d57408c78ae 100644
> --- a/mm/kmsan/shadow.c
> +++ b/mm/kmsan/shadow.c
> @@ -243,7 +243,6 @@ int kmsan_vmap_pages_range_noflush(unsigned long start, 
> unsigned long end,
> s_pages[i] = shadow_page_for(pages[i]);
> o_pages[i] = origin_page_for(pages[i]);
> }
> -   prot = __pgprot(pgprot_val(prot) | _PAGE_NX);
> prot = PAGE_KERNEL;

This bug dates back to 5.1-rc2, when KMSAN didn't exist upstream.
The commit introducing vmap support already had it:
https://github.com/google/kmsan/commit/3ff9d7c640d378485286e1a99d85984ae6901f23
I don't remember what exactly required the more relaxed PAGE_KERNEL
mask though :)



[syzbot] [bpf?] [trace?] possible deadlock in sctp_err_lookup

2023-11-16 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:f31817cbcf48 Add linux-next specific files for 20231116
git tree:   linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=11a32f9768
kernel config:  https://syzkaller.appspot.com/x/.config?x=f59345f1d0a928c
dashboard link: https://syzkaller.appspot.com/bug?extid=422ecd5adb35122711b7
compiler:   gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 
2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: 
https://storage.googleapis.com/syzbot-assets/987488cb251e/disk-f31817cb.raw.xz
vmlinux: 
https://storage.googleapis.com/syzbot-assets/6d4a82d8bd4b/vmlinux-f31817cb.xz
kernel image: 
https://storage.googleapis.com/syzbot-assets/fc43dee9cb86/bzImage-f31817cb.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+422ecd5adb3512271...@syzkaller.appspotmail.com

=
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
6.7.0-rc1-next-20231116-syzkaller #0 Not tainted
-
syz-executor.2/5088 [HC0[0]:SC0[2]:HE0:SE0] is trying to acquire:
888025d21bd8 (>siglock){+.+.}-{2:2}, at: 
__lock_task_sighand+0xc2/0x340 kernel/signal.c:1422

and this task is already holding:
88802dd927b0 (slock-AF_INET6){+.-.}-{2:2}, at: spin_lock 
include/linux/spinlock.h:351 [inline]
88802dd927b0 (slock-AF_INET6){+.-.}-{2:2}, at: __tcp_close+0x4e6/0xfd0 
net/ipv4/tcp.c:2843
which would create a new lock dependency:
 (slock-AF_INET6){+.-.}-{2:2} -> (>siglock){+.+.}-{2:2}

but this new dependency connects a SOFTIRQ-irq-safe lock:
 (slock-AF_INET6){+.-.}-{2:2}

... which became SOFTIRQ-irq-safe at:
  lock_acquire kernel/locking/lockdep.c:5753 [inline]
  lock_acquire+0x1b1/0x530 kernel/locking/lockdep.c:5718
  __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
  _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
  spin_lock include/linux/spinlock.h:351 [inline]
  sctp_err_lookup+0x488/0xb50 net/sctp/input.c:523
  sctp_v6_err+0x201/0x540 net/sctp/ipv6.c:175
  icmpv6_notify+0x337/0x750 net/ipv6/icmp.c:867
  icmpv6_rcv+0x882/0x19c0 net/ipv6/icmp.c:1013
  ip6_protocol_deliver_rcu+0x170/0x13e0 net/ipv6/ip6_input.c:438
  ip6_input_finish+0x14f/0x2f0 net/ipv6/ip6_input.c:483
  NF_HOOK include/linux/netfilter.h:314 [inline]
  NF_HOOK include/linux/netfilter.h:308 [inline]
  ip6_input+0xa1/0xc0 net/ipv6/ip6_input.c:492
  dst_input include/net/dst.h:461 [inline]
  ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
  NF_HOOK include/linux/netfilter.h:314 [inline]
  NF_HOOK include/linux/netfilter.h:308 [inline]
  ipv6_rcv+0x24e/0x380 net/ipv6/ip6_input.c:310
  __netif_receive_skb_one_core+0x115/0x180 net/core/dev.c:5529
  __netif_receive_skb+0x1f/0x1b0 net/core/dev.c:5643
  process_backlog+0x101/0x6b0 net/core/dev.c:5971
  __napi_poll.constprop.0+0xb4/0x540 net/core/dev.c:6533
  napi_poll net/core/dev.c:6602 [inline]
  net_rx_action+0x956/0xe90 net/core/dev.c:6735
  __do_softirq+0x216/0x8d5 kernel/softirq.c:553
  do_softirq kernel/softirq.c:454 [inline]
  do_softirq+0xaa/0xe0 kernel/softirq.c:441
  __local_bh_enable_ip+0xfc/0x120 kernel/softirq.c:381
  local_bh_enable include/linux/bottom_half.h:33 [inline]
  icmp6_send+0x7d5/0x2b10 net/ipv6/icmp.c:633
  __icmpv6_send include/linux/icmpv6.h:28 [inline]
  icmpv6_send include/linux/icmpv6.h:49 [inline]
  ip6_pkt_drop+0x1f3/0x860 net/ipv6/route.c:4515
  dst_output include/net/dst.h:451 [inline]
  NF_HOOK include/linux/netfilter.h:314 [inline]
  NF_HOOK include/linux/netfilter.h:308 [inline]
  ip6_xmit+0x1234/0x1cc0 net/ipv6/ip6_output.c:358
  sctp_v6_xmit+0xc1b/0x1110 net/sctp/ipv6.c:248
  sctp_packet_transmit+0x22e1/0x3020 net/sctp/output.c:653
  sctp_packet_singleton+0x19f/0x370 net/sctp/outqueue.c:783
  sctp_outq_flush_ctrl net/sctp/outqueue.c:914 [inline]
  sctp_outq_flush+0x54d/0x3340 net/sctp/outqueue.c:1212
  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1818 [inline]
  sctp_side_effects net/sctp/sm_sideeffect.c:1198 [inline]
  sctp_do_sm+0x178f/0x5c50 net/sctp/sm_sideeffect.c:1169
  sctp_primitive_ASSOCIATE+0x9c/0xc0 net/sctp/primitive.c:73
  __sctp_connect+0x9e9/0xc30 net/sctp/socket.c:1233
  sctp_connect net/sctp/socket.c:4811 [inline]
  sctp_inet_connect+0x15f/0x1f0 net/sctp/socket.c:4826
  __sys_connect_file+0x15b/0x1a0 net/socket.c:2046
  __sys_connect+0x145/0x170 net/socket.c:2063
  __do_sys_connect net/socket.c:2073 [inline]
  __se_sys_connect net/socket.c:2070 [inline]
  __x64_sys_connect+0x72/0xb0 net/socket.c:2070
  do_syscall_x64 arch/x86/entry/common.c:51 [inline]
  do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
  entry_SYSCALL_64_after_hwframe+0x62/0x6a

to a SOFTIRQ-irq-unsafe lock:
 (>siglock){+.+.}-{2:2}

... which became SOFTIRQ-irq-unsafe at:
...
  lock_acquire kernel/locking/lockdep.c:5753 [inline]
  lock_acquire+0x1b1/0x530 kernel/locking

Re: [PATCH 19/32] kmsan: Accept ranges starting with 0 on s390

2023-11-16 Thread Alexander Potapenko
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich  wrote:
>
> On s390 the virtual address 0 is valid (current CPU's lowcore is mapped
> there), therefore KMSAN should not complain about it.
>
> Disable the respective check on s390. There doesn't seem to be a
> Kconfig option to describe this situation, so explicitly check for
> s390.
>
> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko 
(see the nit below)

> ---
>  mm/kmsan/init.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/kmsan/init.c b/mm/kmsan/init.c
> index ffedf4dbc49d..14f4a432fddd 100644
> --- a/mm/kmsan/init.c
> +++ b/mm/kmsan/init.c
> @@ -33,7 +33,9 @@ static void __init kmsan_record_future_shadow_range(void 
> *start, void *end)
> bool merged = false;
>
> KMSAN_WARN_ON(future_index == NUM_FUTURE_RANGES);
> -   KMSAN_WARN_ON((nstart >= nend) || !nstart || !nend);
> +   KMSAN_WARN_ON((nstart >= nend) ||
> + (!IS_ENABLED(CONFIG_S390) && !nstart) ||
Please add a comment explaining this bit.

> + !nend);
> nstart = ALIGN_DOWN(nstart, PAGE_SIZE);
> nend = ALIGN(nend, PAGE_SIZE);
>
> --
> 2.41.0
>



Re: [PATCH 00/32] kmsan: Enable on s390

2023-11-16 Thread Christian Borntraeger




Am 16.11.23 um 11:13 schrieb Ilya Leoshkevich:

It's also possible to get a free s390 machine at [1].

[1] https://linuxone.cloud.marist.edu/oss


I think the URL for registration is this one
https://linuxone.cloud.marist.edu/#/register?flag=VM



Re: [PATCH 00/32] kmsan: Enable on s390

2023-11-16 Thread Ilya Leoshkevich
On Thu, 2023-11-16 at 09:42 +0100, Alexander Potapenko wrote:
> On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich 
> wrote:
> > 
> > Hi,
> > 
> > This series provides the minimal support for Kernel Memory
> > Sanitizer on
> > s390. Kernel Memory Sanitizer is clang-only instrumentation for
> > finding
> > accesses to uninitialized memory. The clang support for s390 has
> > already
> > been merged [1].
> > 
> > With this series, I can successfully boot s390 defconfig and
> > debug_defconfig with kmsan.panic=1. The tool found one real
> > s390-specific bug (fixed in master).
> > 
> > Best regards,
> > Ilya
> 
> Hi Ilya,
> 
> This is really impressive!
> Can you please share some instructions on how to run KMSAN in QEMU?
> I've never touched s390, but I'm assuming it should be possible?

I developed this natively (without cross-compilation or emulation,
just KVM), but I just gave the following a try on x86_64 and had some
success:

$ make LLVM=1 ARCH=s390 O=../linux-build-s390x-cross CC=clang-18
LD=s390x-linux-gnu-ld OBJCOPY=s390x-linux-gnu-objcopy debug_defconfig

$ make LLVM=1 ARCH=s390 O=../linux-build-s390x-cross CC=clang-18
LD=s390x-linux-gnu-ld OBJCOPY=s390x-linux-gnu-objcopy menuconfig

$ make LLVM=1 ARCH=s390 O=../linux-build-s390x-cross CC=clang-18
LD=s390x-linux-gnu-ld OBJCOPY=s390x-linux-gnu-objcopy -j24

$ qemu-system-s390x -M accel=tcg -smp 2 -m 4G -kernel ../linux-build-
s390x-cross/arch/s390/boot/bzImage -nographic -append 'root=/dev/vda1
rw console=ttyS1 nokaslr earlyprintk cio_ignore=all kmsan.panic=1' -
object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-
ccw,rng=rng0

It's also possible to get a free s390 machine at [1].

[1] https://linuxone.cloud.marist.edu/oss



Re: [PATCH 06/32] kmsan: Fix kmsan_copy_to_user() on arches with overlapping address spaces

2023-11-16 Thread Alexander Potapenko
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich  wrote:
>
> Comparing pointers with TASK_SIZE does not make sense when kernel and
> userspace overlap. Assume that we are handling user memory access in
> this case.
>
> Reported-by: Alexander Gordeev 
> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko 



Re: [PATCH 14/32] kmsan: Use ALIGN_DOWN() in kmsan_get_metadata()

2023-11-16 Thread Alexander Potapenko
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich  wrote:
>
> Improve the readability by replacing the custom aligning logic with
> ALIGN_DOWN(). Unlike other places where a similar sequence is used,
> there is no size parameter that needs to be adjusted, so the standard
> macro fits.

Good catch, thank you!

> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko 



Re: [PATCH 21/32] s390: Use a larger stack for KMSAN

2023-11-16 Thread Alexander Potapenko
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich  wrote:
>
> Adjust the stack size for the KMSAN-enabled kernel like it was done
> for the KASAN-enabled one in commit 7fef92ccadd7 ("s390/kasan: double
> the stack size"). Both tools have similar requirements.
>
> Reviewed-by: Alexander Gordeev 
> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko 



Re: [PATCH 08/32] kmsan: Remove an x86-specific #include from kmsan.h

2023-11-16 Thread Alexander Potapenko
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich  wrote:
>
> Replace the x86-specific asm/pgtable_64_types.h #include with the
> linux/pgtable.h one, which all architectures have.
>
> Fixes: f80be4571b19 ("kmsan: add KMSAN runtime core")
> Suggested-by: Heiko Carstens 
> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko 
(see the comment below)

>
> -#include 
> +#include 

For the sake of consistency with other KMSAN code, please keep the
headers sorted alphabetically.



Re: [PATCH 03/32] kmsan: Disable KMSAN when DEFERRED_STRUCT_PAGE_INIT is enabled

2023-11-16 Thread Alexander Potapenko
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich  wrote:
>
> KMSAN relies on memblock returning all available pages to it
> (see kmsan_memblock_free_pages()). It partitions these pages into 3
> categories: pages available to the buddy allocator, shadow pages and
> origin pages. This partitioning is static.
>
> If new pages appear after kmsan_init_runtime(), it is considered
> an error. DEFERRED_STRUCT_PAGE_INIT causes this, so mark it as
> incompatible with KMSAN.

In the future we could probably collect the deferred pages as well,
but it's okay to disable KMSAN for now.

> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko 



Re: [PATCH 02/32] kmsan: Make the tests compatible with kmsan.panic=1

2023-11-16 Thread Alexander Potapenko
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich  wrote:
>
> It's useful to have both tests and kmsan.panic=1 during development,
> but right now the warnings, that the tests cause, lead to kernel
> panics.
>
> Temporarily set kmsan.panic=0 for the duration of the KMSAN testing.

Nice!

> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko 



Re: [PATCH 20/32] s390: Turn off KMSAN for boot, vdso and purgatory

2023-11-16 Thread Alexander Potapenko
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich  wrote:
>
> All other sanitizers are disabled for these components as well.
>
> Reviewed-by: Alexander Gordeev 
> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko 
(see a nit below)

> ---
>  arch/s390/boot/Makefile  | 1 +
>  arch/s390/kernel/vdso32/Makefile | 1 +
>  arch/s390/kernel/vdso64/Makefile | 1 +
>  arch/s390/purgatory/Makefile | 1 +
>  4 files changed, 4 insertions(+)
>
> diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
> index c7c81e5f9218..5a05c927f703 100644
> --- a/arch/s390/boot/Makefile
> +++ b/arch/s390/boot/Makefile
> @@ -8,6 +8,7 @@ GCOV_PROFILE := n
>  UBSAN_SANITIZE := n
>  KASAN_SANITIZE := n
>  KCSAN_SANITIZE := n
> +KMSAN_SANITIZE := n

Nit: I think having even a one-line comment before this block
(something similar to
https://elixir.bootlin.com/linux/latest/source/arch/x86/boot/Makefile#L12)
will make it more clear.

But given that the comment wasn't there before, leaving this up to you.



Re: [PATCH 12/32] kmsan: Allow disabling KMSAN checks for the current task

2023-11-16 Thread Ilya Leoshkevich
On Thu, 2023-11-16 at 09:56 +0100, Alexander Potapenko wrote:
> On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich 
> wrote:
> > 
> > Like for KASAN, it's useful to temporarily disable KMSAN checks
> > around,
> > e.g., redzone accesses.
> 
> This example is incorrect, because KMSAN does not have redzones.
> You are calling these functions from "mm: slub: Let KMSAN access
> metadata", which mentiones redzones in kfree(), but the description
> is
> still somewhat unclear.
> Can you provide more insight about what is going on? Maybe we can fix
> those accesses instead of disabling KMSAN?

It's about SLUB redzones, which appear when compiling with
CONFIG_DEBUG_SLAB.

I think that from KMSAN's point of view they should be considered
poisoned, but then the question is what to do with functions that check
them. I noticed that there was special handling for KASAN there
already, so I figured that the best solution would be to do the same
thing for KMSAN.



Re: [PATCH 11/32] kmsan: Export panic_on_kmsan

2023-11-16 Thread Alexander Potapenko
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich  wrote:
>
> When building the kmsan test as a module, modpost fails with the
> following error message:
>
> ERROR: modpost: "panic_on_kmsan" [mm/kmsan/kmsan_test.ko] undefined!
>
> Export panic_on_kmsan in order to improve the KMSAN usability for
> modules.
>
> Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Alexander Potapenko 



Re: [PATCH 30/32] s390/unwind: Disable KMSAN checks

2023-11-16 Thread Alexander Potapenko
On Thu, Nov 16, 2023 at 10:04 AM Alexander Potapenko  wrote:
>
> On Wed, Nov 15, 2023 at 9:35 PM Ilya Leoshkevich  wrote:
> >
> > The unwind code can read uninitialized frames. Furthermore, even in
> > the good case, KMSAN does not emit shadow for backchains. Therefore
> > disable it for the unwinding functions.
> >
> > Signed-off-by: Ilya Leoshkevich 
> > ---
> >  arch/s390/kernel/unwind_bc.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/arch/s390/kernel/unwind_bc.c b/arch/s390/kernel/unwind_bc.c
> > index 0ece156fdd7c..7ecaab24783f 100644
> > --- a/arch/s390/kernel/unwind_bc.c
> > +++ b/arch/s390/kernel/unwind_bc.c
> > @@ -49,6 +49,7 @@ static inline bool is_final_pt_regs(struct unwind_state 
> > *state,
> >READ_ONCE_NOCHECK(regs->psw.mask) & PSW_MASK_PSTATE;
> >  }
> >
> > +__no_kmsan_checks
>
> Please add some comments to the source file to back this annotation,
> so that the intent is not lost in git history.

Apart from that,

Reviewed-by: Alexander Potapenko 


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg



Re: [PATCH 30/32] s390/unwind: Disable KMSAN checks

2023-11-16 Thread Alexander Potapenko
On Wed, Nov 15, 2023 at 9:35 PM Ilya Leoshkevich  wrote:
>
> The unwind code can read uninitialized frames. Furthermore, even in
> the good case, KMSAN does not emit shadow for backchains. Therefore
> disable it for the unwinding functions.
>
> Signed-off-by: Ilya Leoshkevich 
> ---
>  arch/s390/kernel/unwind_bc.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/s390/kernel/unwind_bc.c b/arch/s390/kernel/unwind_bc.c
> index 0ece156fdd7c..7ecaab24783f 100644
> --- a/arch/s390/kernel/unwind_bc.c
> +++ b/arch/s390/kernel/unwind_bc.c
> @@ -49,6 +49,7 @@ static inline bool is_final_pt_regs(struct unwind_state 
> *state,
>READ_ONCE_NOCHECK(regs->psw.mask) & PSW_MASK_PSTATE;
>  }
>
> +__no_kmsan_checks

Please add some comments to the source file to back this annotation,
so that the intent is not lost in git history.



Re: [PATCH 12/32] kmsan: Allow disabling KMSAN checks for the current task

2023-11-16 Thread Alexander Potapenko
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich  wrote:
>
> Like for KASAN, it's useful to temporarily disable KMSAN checks around,
> e.g., redzone accesses.

This example is incorrect, because KMSAN does not have redzones.
You are calling these functions from "mm: slub: Let KMSAN access
metadata", which mentiones redzones in kfree(), but the description is
still somewhat unclear.
Can you provide more insight about what is going on? Maybe we can fix
those accesses instead of disabling KMSAN?



Re: [PATCH 00/32] kmsan: Enable on s390

2023-11-16 Thread Alexander Potapenko
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich  wrote:
>
> Hi,
>
> This series provides the minimal support for Kernel Memory Sanitizer on
> s390. Kernel Memory Sanitizer is clang-only instrumentation for finding
> accesses to uninitialized memory. The clang support for s390 has already
> been merged [1].
>
> With this series, I can successfully boot s390 defconfig and
> debug_defconfig with kmsan.panic=1. The tool found one real
> s390-specific bug (fixed in master).
>
> Best regards,
> Ilya

Hi Ilya,

This is really impressive!
Can you please share some instructions on how to run KMSAN in QEMU?
I've never touched s390, but I'm assuming it should be possible?