Re: [RFC PATCH v1 2/2] vsock/test: SO_RCVLOWAT + deferred credit update test
On 15.11.2023 14:11, Stefano Garzarella wrote: > On Wed, Nov 08, 2023 at 10:20:04AM +0300, Arseniy Krasnov wrote: >> This adds test which checks, that updating SO_RCVLOWAT value also sends > > You can avoid "This adds", and write just "Add test ...". > > See > https://docs.kernel.org/process/submitting-patches.html#describe-your-changes > > Describe your changes in imperative mood, e.g. "make xyzzy do frotz" > instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy > to do frotz", as if you are giving orders to the codebase to change > its behaviour. > > Also in the other patch. > >> credit update message. Otherwise mutual hungup may happen when receiver >> didn't send credit update and then calls 'poll()' with non default >> SO_RCVLOWAT value (e.g. waiting enough bytes to read), while sender >> waits for free space at receiver's side. >> >> Signed-off-by: Arseniy Krasnov >> --- >> tools/testing/vsock/vsock_test.c | 131 +++ >> 1 file changed, 131 insertions(+) >> >> diff --git a/tools/testing/vsock/vsock_test.c >> b/tools/testing/vsock/vsock_test.c >> index c1f7bc9abd22..c71b3875fd16 100644 >> --- a/tools/testing/vsock/vsock_test.c >> +++ b/tools/testing/vsock/vsock_test.c >> @@ -1180,6 +1180,132 @@ static void test_stream_shutrd_server(const struct >> test_opts *opts) >> close(fd); >> } >> >> +#define RCVLOWAT_CREDIT_UPD_BUF_SIZE (1024 * 128) >> +#define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE (1024 * 64) > > What about adding a comment like the one in the cover letter about > dependency with kernel values? > > Please add it also in the commit description. > > I'm thinking if we should move all the defines that depends on the > kernel in some special header. IIUC it will be new header file in tools/testing/vsock, which includes such defines. At this moment in will contain only VIRTIO_VSOCK_MAX_PKT_BUF_SIZE. Idea is that such defines are not supposed to use by user (so do not move it to uapi headers), but needed by tests to check kernel behaviour. Please correct me if i'm wrong. Thanks, Arseniy > >> + >> +static void test_stream_rcvlowat_def_cred_upd_client(const struct test_opts >> *opts) >> +{ >> + size_t buf_size; >> + void *buf; >> + int fd; >> + >> + fd = vsock_stream_connect(opts->peer_cid, 1234); >> + if (fd < 0) { >> + perror("connect"); >> + exit(EXIT_FAILURE); >> + } >> + >> + /* Send 1 byte more than peer's buffer size. */ >> + buf_size = RCVLOWAT_CREDIT_UPD_BUF_SIZE + 1; >> + >> + buf = malloc(buf_size); >> + if (!buf) { >> + perror("malloc"); >> + exit(EXIT_FAILURE); >> + } >> + >> + /* Wait until peer sets needed buffer size. */ >> + control_expectln("SRVREADY"); >> + >> + if (send(fd, buf, buf_size, 0) != buf_size) { >> + perror("send failed"); >> + exit(EXIT_FAILURE); >> + } >> + >> + free(buf); >> + close(fd); >> +} >> + >> +static void test_stream_rcvlowat_def_cred_upd_server(const struct test_opts >> *opts) >> +{ >> + size_t recv_buf_size; >> + struct pollfd fds; >> + size_t buf_size; >> + void *buf; >> + int fd; >> + >> + fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL); >> + if (fd < 0) { >> + perror("accept"); >> + exit(EXIT_FAILURE); >> + } >> + >> + buf_size = RCVLOWAT_CREDIT_UPD_BUF_SIZE; >> + >> + if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE, >> + _size, sizeof(buf_size))) { >> + perror("setsockopt(SO_VM_SOCKETS_BUFFER_SIZE)"); >> + exit(EXIT_FAILURE); >> + } >> + >> + buf = malloc(buf_size); >> + if (!buf) { >> + perror("malloc"); >> + exit(EXIT_FAILURE); >> + } >> + >> + control_writeln("SRVREADY"); >> + >> + /* Wait until there will be 128KB of data in rx queue. */ >> + while (1) { >> + ssize_t res; >> + >> + res = recv(fd, buf, buf_size, MSG_PEEK); >> + if (res == buf_size) >> + break; >> + >> + if (res <= 0) { >> + fprintf(stderr, "unexpected 'recv()' return: %zi\n", res); >> + exit(EXIT_FAILURE); >> + } >> + } >> + >> + /* There is 128KB of data in the socket's rx queue, >> + * dequeue first 64KB, credit update is not sent. >> + */ >> + recv_buf_size = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE; >> + recv_buf(fd, buf, recv_buf_size, 0, recv_buf_size); >> + recv_buf_size++; >> + >> + /* Updating SO_RCVLOWAT will send credit update. */ >> + if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, >> + _buf_size, sizeof(recv_buf_size))) { >> + perror("setsockopt(SO_RCVLOWAT)"); >> + exit(EXIT_FAILURE); >> + } >> + >> + memset(, 0, sizeof(fds)); >> + fds.fd = fd; >> + fds.events = POLLIN | POLLRDNORM | POLLERR | >> + POLLRDHUP | POLLHUP; >> + >> + /* This 'poll()' will return once we receive last byte >> + * sent by client. >> + */ >> + if (poll(, 1, -1)
Re: [RFC PATCH v1 1/2] virtio/vsock: send credit update during setting SO_RCVLOWAT
On 15.11.2023 14:08, Stefano Garzarella wrote: > On Wed, Nov 08, 2023 at 10:20:03AM +0300, Arseniy Krasnov wrote: >> This adds sending credit update message when SO_RCVLOWAT is updated and >> it is bigger than number of bytes in rx queue. It is needed, because >> 'poll()' will wait until number of bytes in rx queue will be not smaller >> than SO_RCVLOWAT, so kick sender to send more data. Otherwise mutual >> hungup for tx/rx is possible: sender waits for free space and receiver >> is waiting data in 'poll()'. >> >> Signed-off-by: Arseniy Krasnov >> --- >> drivers/vhost/vsock.c | 2 ++ >> include/linux/virtio_vsock.h | 1 + >> net/vmw_vsock/virtio_transport.c | 2 ++ >> net/vmw_vsock/virtio_transport_common.c | 31 + >> net/vmw_vsock/vsock_loopback.c | 2 ++ >> 5 files changed, 38 insertions(+) >> >> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c >> index f75731396b7e..ecfa5c11f5ee 100644 >> --- a/drivers/vhost/vsock.c >> +++ b/drivers/vhost/vsock.c >> @@ -451,6 +451,8 @@ static struct virtio_transport vhost_transport = { >> .notify_buffer_size = virtio_transport_notify_buffer_size, >> >> .read_skb = virtio_transport_read_skb, >> + >> + .set_rcvlowat = virtio_transport_set_rcvlowat >> }, >> >> .send_pkt = vhost_transport_send_pkt, >> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h >> index ebb3ce63d64d..97dc1bebc69c 100644 >> --- a/include/linux/virtio_vsock.h >> +++ b/include/linux/virtio_vsock.h >> @@ -256,4 +256,5 @@ void virtio_transport_put_credit(struct >> virtio_vsock_sock *vvs, u32 credit); >> void virtio_transport_deliver_tap_pkt(struct sk_buff *skb); >> int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head *list); >> int virtio_transport_read_skb(struct vsock_sock *vsk, skb_read_actor_t >> read_actor); >> +int virtio_transport_set_rcvlowat(struct vsock_sock *vsk, int val); >> #endif /* _LINUX_VIRTIO_VSOCK_H */ >> diff --git a/net/vmw_vsock/virtio_transport.c >> b/net/vmw_vsock/virtio_transport.c >> index af5bab1acee1..cf3431189d0c 100644 >> --- a/net/vmw_vsock/virtio_transport.c >> +++ b/net/vmw_vsock/virtio_transport.c >> @@ -539,6 +539,8 @@ static struct virtio_transport virtio_transport = { >> .notify_buffer_size = virtio_transport_notify_buffer_size, >> >> .read_skb = virtio_transport_read_skb, >> + >> + .set_rcvlowat = virtio_transport_set_rcvlowat >> }, >> >> .send_pkt = virtio_transport_send_pkt, >> diff --git a/net/vmw_vsock/virtio_transport_common.c >> b/net/vmw_vsock/virtio_transport_common.c >> index e22c81435ef7..88a58163046e 100644 >> --- a/net/vmw_vsock/virtio_transport_common.c >> +++ b/net/vmw_vsock/virtio_transport_common.c >> @@ -1676,6 +1676,37 @@ int virtio_transport_read_skb(struct vsock_sock *vsk, >> skb_read_actor_t recv_acto >> } >> EXPORT_SYMBOL_GPL(virtio_transport_read_skb); >> >> +int virtio_transport_set_rcvlowat(struct vsock_sock *vsk, int val) >> +{ >> + struct virtio_vsock_sock *vvs = vsk->trans; >> + bool send_update = false; > > I'd declare this not initialized. > >> + >> + spin_lock_bh(>rx_lock); >> + >> + /* If number of available bytes is less than new >> + * SO_RCVLOWAT value, kick sender to send more >> + * data, because sender may sleep in its 'send()' >> + * syscall waiting for enough space at our side. >> + */ >> + if (vvs->rx_bytes < val) >> + send_update = true; > > Then here just: > send_update = vvs->rx_bytes < val; > >> + >> + spin_unlock_bh(>rx_lock); >> + >> + if (send_update) { >> + int err; >> + >> + err = virtio_transport_send_credit_update(vsk); >> + if (err < 0) >> + return err; >> + } >> + >> + WRITE_ONCE(sk_vsock(vsk)->sk_rcvlowat, val ? : 1); > > Not in this patch, but what about doing this in vsock_set_rcvlowat() in > af_vsock.c? > > I mean avoid to return if `transport->set_rcvlowat(vsk, val)` is > successfully, so set sk_rcvlowat in a single point. Yes, we can do it, I'll include new patch as 0001 in v2, don't remember why it wasn't implemented in this way before. Thanks, Arseniy > > The rest LGTM! > > Stefano > >> + >> + return 0; >> +} >> +EXPORT_SYMBOL_GPL(virtio_transport_set_rcvlowat); >> + >> MODULE_LICENSE("GPL v2"); >> MODULE_AUTHOR("Asias He"); >> MODULE_DESCRIPTION("common code for virtio vsock"); >> diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c >> index 048640167411..388c157f6633 100644 >> --- a/net/vmw_vsock/vsock_loopback.c >> +++ b/net/vmw_vsock/vsock_loopback.c >> @@ -98,6 +98,8 @@ static struct virtio_transport loopback_transport = { >> .notify_buffer_size = virtio_transport_notify_buffer_size, >> >> .read_skb = virtio_transport_read_skb, >> + >> + .set_rcvlowat = virtio_transport_set_rcvlowat >> }, >> >> .send_pkt =
Re: [PATCH 27/32] s390/string: Add KMSAN support
Hi Ilya, kernel test robot noticed the following build errors: [auto build test ERROR on s390/features] [also build test ERROR on akpm-mm/mm-everything linus/master vbabka-slab/for-next v6.7-rc1 next-20231116] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Ilya-Leoshkevich/ftrace-Unpoison-ftrace_regs-in-ftrace_ops_list_func/20231116-045608 base: https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features patch link: https://lore.kernel.org/r/20231115203401.2495875-28-iii%40linux.ibm.com patch subject: [PATCH 27/32] s390/string: Add KMSAN support config: s390-debug_defconfig (https://download.01.org/0day-ci/archive/20231117/202311170550.bsbo44ix-...@intel.com/config) compiler: s390-linux-gcc (GCC) 13.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231117/202311170550.bsbo44ix-...@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202311170550.bsbo44ix-...@intel.com/ All errors (new ones prefixed by >>): s390-linux-ld: arch/s390/mm/vmem.o: in function `crst_table_init': >> arch/s390/include/asm/pgalloc.h:33:(.text+0x1ba): undefined reference to >> `memset64' s390-linux-ld: arch/s390/mm/vmem.o: in function `vmem_pte_alloc': >> arch/s390/mm/vmem.c:68:(.ref.text+0x1ec): undefined reference to `memset64' s390-linux-ld: arch/s390/mm/pgalloc.o: in function `base_pgt_alloc': >> arch/s390/mm/pgalloc.c:241:(.text+0x184): undefined reference to `memset64' s390-linux-ld: arch/s390/mm/pgalloc.o: in function `crst_table_init': arch/s390/include/asm/pgalloc.h:33:(.text+0x3e8): undefined reference to `memset64' >> s390-linux-ld: arch/s390/include/asm/pgalloc.h:33:(.text+0x568): undefined >> reference to `memset64' s390-linux-ld: arch/s390/mm/pgalloc.o:arch/s390/include/asm/pgalloc.h:33: more undefined references to `memset64' follow s390-linux-ld: lib/test_string.o: in function `memset16_selftest': >> lib/test_string.c:19:(.init.text+0x94): undefined reference to `memset16' s390-linux-ld: lib/test_string.o: in function `memset32_selftest': >> lib/test_string.c:55:(.init.text+0x234): undefined reference to `memset32' s390-linux-ld: lib/test_string.o: in function `memset64_selftest': >> lib/test_string.c:91:(.init.text+0x3c2): undefined reference to `memset64' s390-linux-ld: drivers/video/fbdev/core/fbcon.o: in function `scr_memsetw': >> include/linux/vt_buffer.h:36:(.text+0x30f6): undefined reference to >> `memset16' >> s390-linux-ld: include/linux/vt_buffer.h:36:(.text+0x320a): undefined >> reference to `memset16' s390-linux-ld: include/linux/vt_buffer.h:36:(.text+0x32c4): undefined reference to `memset16' s390-linux-ld: include/linux/vt_buffer.h:36:(.text+0x33b8): undefined reference to `memset16' s390-linux-ld: include/linux/vt_buffer.h:36:(.text+0x4f60): undefined reference to `memset16' s390-linux-ld: drivers/video/fbdev/core/fbcon.o:include/linux/vt_buffer.h:36: more undefined references to `memset16' follow s390-linux-ld: drivers/tty/vt/vt.o: in function `vc_uniscr_copy_area': >> drivers/tty/vt/vt.c:464:(.text+0x107e): undefined reference to `memset32' >> s390-linux-ld: drivers/tty/vt/vt.c:471:(.text+0x1104): undefined reference >> to `memset32' s390-linux-ld: drivers/tty/vt/vt.c:471:(.text+0x1118): undefined reference to `memset32' s390-linux-ld: drivers/tty/vt/vt.c:471:(.text+0x1140): undefined reference to `memset32' s390-linux-ld: drivers/tty/vt/vt.o: in function `vc_uniscr_insert': drivers/tty/vt/vt.c:374:(.text+0x13a4): undefined reference to `memset32' s390-linux-ld: drivers/tty/vt/vt.o:drivers/tty/vt/vt.c:385: more undefined references to `memset32' follow s390-linux-ld: drivers/tty/vt/vt.o: in function `scr_memsetw': include/linux/vt_buffer.h:36:(.text+0x2844): undefined reference to `memset16' s390-linux-ld: include/linux/vt_buffer.h:36:(.text+0x2932): undefined reference to `memset16' s390-linux-ld: include/linux/vt_buffer.h:36:(.text+0x2fe8): undefined reference to `memset16' s390-linux-ld: include/linux/vt_buffer.h:36:(.text+0x319c): undefined reference to `memset16' s390-linux-ld: drivers/tty/vt/vt.o: in function `vc_uniscr_clear_line': drivers/tty/vt/vt.c:393:(.text+0x3f78): undefined reference to `memset32' s390-linux-ld: drivers/tty/vt/vt.o: in function `vc_uniscr_clear_lines': drivers/tty/vt/vt.c:401:(.text+0x3fb8): undefined reference to `memset32' s390-linux-ld: drivers/tty/vt/vt.c:401:(.text+0x3fe2): undefined reference to `memset32'
EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)
Hi, when testing the EEVDF scheduler we stumbled upon a performance regression in a uperf scenario and would like to kindly ask for feedback on whether we are going into the right direction with our analysis so far. The base scenario are two KVM guests running on an s390 LPAR. One guest hosts the uperf server, one the uperf client. With EEVDF we observe a regression of ~50% for a strburst test. For a more detailed description of the setup see the section TEST SUMMARY at the bottom. Bisecting led us to the following commit which appears to introduce the regression: 86bfbb7ce4f6 sched/fair: Add lag based placement We then compared the last good commit we identified with a recent level of the devel branch. The issue still persists on 6.7 rc1 although there is some improvement (down from 62% regression to 49%) All analysis described further are based on a 6.6 rc7 kernel. We sampled perf data to get an idea on what is going wrong and ended up seeing an dramatic increase in the maximum wait times from 3ms up to 366ms. See section WAIT DELAYS below for more details. We then collected tracing data to get a better insight into what is going on. The trace excerpt in section TRACE EXCERPT shows one example (of multiple per test run) of the problematic scenario where a kworker(pid=6525) has to wait for 39,718 ms. Short summary: The mentioned kworker has been scheduled to CPU 14 before the tracing was enabled. A vhost process is migrated onto CPU 14. The vruntimes of kworker and vhost differ significantly (86642125805 vs 4242563284 -> factor 20) The vhost process wants to wake up the kworker, therefore the kworker is placed onto the runqueue again and set to runnable. The vhost process continues to execute, waking up other vhost processes on other CPUs. So far this behavior is not different to what we see on pre-EEVDF kernels. On timestamp 576.162767, the vhost process triggers the last wake up of another vhost on another CPU. Until timestamp 576.171155, we see no other activity. Now, the vhost process ends its time slice. Then, vhost gets re-assigned new time slices 4 times and gets then migrated off to CPU 15. This does not occur with older kernels. The kworker has to wait for the migration to happen in order to be able to execute again. This is due to the fact, that the vruntime of the kworker is significantly larger than the one of vhost. We observed the large difference in vruntime between kworker and vhost in the same magnitude on a kernel built based on the parent of the commit mentioned above. With EEVDF, the kworker is doomed to wait until the vhost either catches up on vruntime (which would take 86 seconds) or the vhost is migrated off of the CPU. We found some options which sound plausible but we are not sure if they are valid or not: 1. The wake up path has a dependency on the vruntime metrics that now delays the execution of the kworker. 2. The previous commit af4cf40470c2 (sched/fair: Add cfs_rq::avg_vruntime) which updates the way cfs_rq->min_vruntime and cfs_rq->avg_runtime are set might have introduced an issue which is uncovered with the commit mentioned above. 3. An assumption in the vhost code which causes vhost to rely on being scheduled off in time to allow the kworker to proceed. We also stumbled upon the following mailing thread: https://lore.kernel.org/lkml/ZORaUsd+So+tnyMV@chenyu5-mobl2/ That conversation, and the patches derived from it lead to the assumption that the wake up path might be adjustable in a way that this case in particular can be addressed. At the same time, the vast difference in vruntimes is concerning since, at least for some time frame, both processes are on the runqueue. We would be glad to hear some feedback on which paths to pursue and which might just be a dead end in the first place. TRACE EXCERPT The sched_place trace event was added to the end of the place_entity function and outputs: sev -> sched_entity vruntime sed -> sched_entity deadline sel -> sched_entity vlag avg -> cfs_rq avg_vruntime min -> cfs_rq min_vruntime cpu -> cpu of cfs_rq nr -> cfs_rq nr_running --- CPU 3/KVM-2950[014] d 576.161432: sched_migrate_task: comm=vhost-2920 pid=2941 prio=120 orig_cpu=15 dest_cpu=14 --> migrates task from cpu 15 to 14 CPU 3/KVM-2950[014] d 576.161433: sched_place: comm=vhost-2920 pid=2941 sev=4242563284 sed=4245563284 sel=0 avg=4242563284 min=4242563284 cpu=14 nr=0 --> places vhost 2920 on CPU 14 with vruntime 4242563284 CPU 3/KVM-2950[014] d 576.161433: sched_place: comm= pid=0 sev=16329848593 sed=16334604010 sel=0 avg=16329848593 min=16329848593 cpu=14 nr=0 CPU 3/KVM-2950[014] d 576.161433: sched_place: comm= pid=0 sev=42560661157 sed=42627443765 sel=0 avg=42560661157 min=42560661157 cpu=14 nr=0 CPU 3/KVM-2950[014] d 576.161434: sched_place: comm= pid=0 sev=53846627372
Re: [PATCH 2/3] modpost: Extended modversion support
On Wed, Nov 15, 2023 at 06:50:10PM +, Matthew Maurer wrote: > Adds a new format for modversions which stores each field in a separate > elf section. The "why" is critical and not mentioned. And I'd like to also see documented this with foresight, if Rust needed could this be used in the future for other things? Also please include folks CC'd in *one* patch to *all* patches as otherwise we have no context. > This initially adds support for variable length names, but > could later be used to add additional fields to modversions in a > backwards compatible way if needed. > > Adding support for variable length names makes it possible to enable > MODVERSIONS and RUST at the same time. > > Signed-off-by: Matthew Maurer > --- > arch/powerpc/kernel/module_64.c | 24 +- Why was only powerpc modified? If the commit log explained this it would make it easier for review. > diff --git a/kernel/module/internal.h b/kernel/module/internal.h > index c8b7b4dcf782..0c188c96a045 100644 > --- a/kernel/module/internal.h > +++ b/kernel/module/internal.h > @@ -80,7 +80,7 @@ struct load_info { > unsigned int used_pages; > #endif > struct { > - unsigned int sym, str, mod, vers, info, pcpu; > + unsigned int sym, str, mod, vers, info, pcpu, vers_ext_crc, > vers_ext_name; We might as well modify this in a preliminary patch to add each new unsinged int in a new line, so that it is easier to blame when each new entry gets added. It should not grow the size of the struct at all but it would make futur extensions easier to review what is new and git blame easier to spot when something was added. Although we don't use this extensively today this can easily grow for convenience and making code easier to read. > diff --git a/kernel/module/version.c b/kernel/module/version.c > index 53f43ac5a73e..93d97dad8c77 100644 > --- a/kernel/module/version.c > +++ b/kernel/module/version.c > @@ -19,11 +19,28 @@ int check_version(const struct load_info *info, > unsigned int versindex = info->index.vers; > unsigned int i, num_versions; > struct modversion_info *versions; > + struct modversion_info_ext version_ext; > > /* Exporting module didn't supply crcs? OK, we're already tainted. */ > if (!crc) > return 1; > > + /* If we have extended version info, rely on it */ > + if (modversion_ext_start(info, _ext) >= 0) { There are two things we need to do to make processing modules easier: 1) ELF validation 2) Once checked then process the information We used to have this split up but also had a few places which did both 1) and 2) together. This was wrong and so I want to keep things tidy and ensure we do things which validate the ELF separate. To that end please put the checks to validate the ELF first so that we report to users with a proper error/debug check in case the ELF is wrong, this enables futher debug checks for that to be done instead of confusing users who end up scratching their heads why something failed. So please split up the ELF validation check and put that into elf_validity_cache_copy() which runs *earlier* than this. Then *if* if has this, you just process it. Please take care to be very pedantic in the elf_validity_cache_copy() and extend the checks you have for validation in modversion_ext_start() and bring them to elf_validity_cache_copy() with perhaps *more* stuff which does any insane checks to verify it is 100% correct. > + do { > + if (strncmp(version_ext.name.value, symname, > + version_ext.name.end - > version_ext.name.value) != 0) > + continue; > + > + if (*version_ext.crc.value == *crc) > + return 1; > + pr_debug("Found checksum %X vs module %X\n", > + *crc, *version_ext.crc.value); > + goto bad_version; > + } while (modversion_ext_advance(_ext) == 0); Can you do a for_each_foo()) type loop here instead after validation? Because the validation would ensure your loop is bounded then. Look at for_each_mod_mem_type() for inspiration. > + goto broken_toolchain; The broken toolchain thing would then be an issue reported in the ELF validation. > @@ -87,6 +105,65 @@ int same_magic(const char *amagic, const char *bmagic, > return strcmp(amagic, bmagic) == 0; > } > > +#define MODVERSION_FIELD_START(sec, field) \ > + field.value = (typeof(field.value))sec.sh_addr; \ > + field.end = field.value + sec.sh_size > + > +ssize_t modversion_ext_start(const struct load_info *info, > + struct modversion_info_ext *start) > +{ > + unsigned int crc_idx = info->index.vers_ext_crc; > + unsigned int name_idx = info->index.vers_ext_name; > + Elf_Shdr *sechdrs = info->sechdrs; > + > + // Both of these fields are needed for this to be
Re: [PATCH 28/32] s390/traps: Unpoison the kernel_stack_overflow()'s pt_regs
On Wed, Nov 15, 2023 at 9:35 PM Ilya Leoshkevich wrote: > > This is normally done by the generic entry code, but the > kernel_stack_overflow() flow bypasses it. > > Signed-off-by: Ilya Leoshkevich Reviewed-by: Alexander Potapenko > --- > arch/s390/kernel/traps.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/arch/s390/kernel/traps.c b/arch/s390/kernel/traps.c > index 1d2aa448d103..dd7362806dbb 100644 > --- a/arch/s390/kernel/traps.c > +++ b/arch/s390/kernel/traps.c > @@ -27,6 +27,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -260,6 +261,7 @@ static void monitor_event_exception(struct pt_regs *regs) > > void kernel_stack_overflow(struct pt_regs *regs) > { > + kmsan_unpoison_entry_regs(regs); I suggest adding a comment here.
Re: [PATCH 13/32] kmsan: Support SLAB_POISON
On Thu, 2023-11-16 at 15:55 +0100, Alexander Potapenko wrote: > On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich > wrote: > > > > Avoid false KMSAN negatives with SLUB_DEBUG by allowing > > kmsan_slab_free() to poison the freed memory, and by preventing > > init_object() from unpoisoning new allocations. > > > > Signed-off-by: Ilya Leoshkevich > > --- > > mm/kmsan/hooks.c | 2 +- > > mm/slub.c | 3 ++- > > 2 files changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c > > index 7b5814412e9f..7a30274b893c 100644 > > --- a/mm/kmsan/hooks.c > > +++ b/mm/kmsan/hooks.c > > @@ -76,7 +76,7 @@ void kmsan_slab_free(struct kmem_cache *s, void > > *object) > > return; > > > > /* RCU slabs could be legally used after free within the > > RCU period */ > > - if (unlikely(s->flags & (SLAB_TYPESAFE_BY_RCU | > > SLAB_POISON))) > > + if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU)) > > return; > > /* > > * If there's a constructor, freed memory must remain in > > the same state > > diff --git a/mm/slub.c b/mm/slub.c > > index 63d281dfacdb..8d9aa4d7cb7e 100644 > > --- a/mm/slub.c > > +++ b/mm/slub.c > > @@ -1024,7 +1024,8 @@ static __printf(3, 4) void slab_err(struct > > kmem_cache *s, struct slab *slab, > > add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); > > } > > > > -static void init_object(struct kmem_cache *s, void *object, u8 > > val) > > +__no_sanitize_memory static void > > __no_sanitize_memory should be used with great care, because it drops > all instrumentation from the function, and any shadow writes will be > lost. > Won't it be better to add kmsan_poison() to init_object() if you want > it to stay uninitialized? I wanted to avoid a ping-pong here, in which we already have properly poisoned memory, then memset() incorrectly unpoisons it, and then we undo the damage. My first attempt involved using __memset() instead, but this resulted in worse assembly code. I wish there were something like memset_noinstr(). Right now init_object() doesn't seem to be doing anything besides these memset()s, but this can of course change in the future. So I don't mind using kmsan_poison() instead of __no_sanitize_memory here too much, since it results in better maintainability.
Re: [PATCH 26/32] s390/mm: Define KMSAN metadata for vmalloc and modules
On Wed, Nov 15, 2023 at 9:35 PM Ilya Leoshkevich wrote: > > The pages for the KMSAN metadata associated with most kernel mappings > are taken from memblock by the common code. However, vmalloc and module > metadata needs to be defined by the architectures. > > Be a little bit more careful than x86: allocate exactly MODULES_LEN > for the module shadow and origins, and then take 2/3 of vmalloc for > the vmalloc shadow and origins. This ensures that users passing small > vmalloc= values on the command line do not cause module metadata > collisions. > > Signed-off-by: Ilya Leoshkevich > --- > arch/s390/boot/startup.c| 8 > arch/s390/include/asm/pgtable.h | 10 ++ > 2 files changed, 18 insertions(+) > > diff --git a/arch/s390/boot/startup.c b/arch/s390/boot/startup.c > index 8104e0e3d188..297c1062372a 100644 > --- a/arch/s390/boot/startup.c > +++ b/arch/s390/boot/startup.c > @@ -253,9 +253,17 @@ static unsigned long setup_kernel_memory_layout(void) > MODULES_END = round_down(__abs_lowcore, _SEGMENT_SIZE); > MODULES_VADDR = MODULES_END - MODULES_LEN; > VMALLOC_END = MODULES_VADDR; > +#ifdef CONFIG_KMSAN > + VMALLOC_END -= MODULES_LEN * 2; > +#endif > > /* allow vmalloc area to occupy up to about 1/2 of the rest virtual > space left */ > vmalloc_size = min(vmalloc_size, round_down(VMALLOC_END / 2, > _REGION3_SIZE)); > +#ifdef CONFIG_KMSAN > + /* take 2/3 of vmalloc area for KMSAN shadow and origins */ > + vmalloc_size = round_down(vmalloc_size / 3, PAGE_SIZE); Is it okay that vmalloc_size is only aligned on PAGE_SIZE? E.g. above the alignment is _REGION3_SIZE.
Re: [PATCH v2 11/11] arm64: dts: qcom: qcm6490-fairphone-fp5: Enable WiFi
On 11/13/23 09:56, Luca Weiss wrote: Now that the WPSS remoteproc is enabled, enable wifi so we can use it. Signed-off-by: Luca Weiss --- Reviewed-by: Konrad Dybcio Konrad
Re: [PATCH v2 10/11] arm64: dts: qcom: qcm6490-fairphone-fp5: Enable various remoteprocs
On 11/13/23 09:56, Luca Weiss wrote: Enable the ADSP, CDSP, MPSS and WPSS that are found on the SoC. Signed-off-by: Luca Weiss --- Reviewed-by: Konrad Dybcio Konrad
Re: [PATCH v2 09/11] arm64: dts: qcom: sc7280: Add CDSP node
On 11/13/23 09:56, Luca Weiss wrote: Add the node for the ADSP found on the SC7280 SoC, using standard Qualcomm firmware. Remove the reserved-memory node from sc7280-chrome-common since CDSP is currently not used there. Signed-off-by: Luca Weiss --- Acked-by: Konrad Dybcio [...] + interconnects = <_noc MASTER_CDSP_PROC 0 _virt SLAVE_EBI1 0>; QCOM_ICC_TAG_ALWAYS Konrad
Re: [PATCH v2 08/11] arm64: dts: qcom: sc7280: Add ADSP node
On 11/13/23 09:56, Luca Weiss wrote: Add the node for the ADSP found on the SC7280 SoC, using standard Qualcomm firmware. Signed-off-by: Luca Weiss --- Acked-by: Konrad Dybcio Konrad
Re: [PATCH 13/32] kmsan: Support SLAB_POISON
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich wrote: > > Avoid false KMSAN negatives with SLUB_DEBUG by allowing > kmsan_slab_free() to poison the freed memory, and by preventing > init_object() from unpoisoning new allocations. > > Signed-off-by: Ilya Leoshkevich > --- > mm/kmsan/hooks.c | 2 +- > mm/slub.c| 3 ++- > 2 files changed, 3 insertions(+), 2 deletions(-) > > diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c > index 7b5814412e9f..7a30274b893c 100644 > --- a/mm/kmsan/hooks.c > +++ b/mm/kmsan/hooks.c > @@ -76,7 +76,7 @@ void kmsan_slab_free(struct kmem_cache *s, void *object) > return; > > /* RCU slabs could be legally used after free within the RCU period */ > - if (unlikely(s->flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON))) > + if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU)) > return; > /* > * If there's a constructor, freed memory must remain in the same > state > diff --git a/mm/slub.c b/mm/slub.c > index 63d281dfacdb..8d9aa4d7cb7e 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -1024,7 +1024,8 @@ static __printf(3, 4) void slab_err(struct kmem_cache > *s, struct slab *slab, > add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); > } > > -static void init_object(struct kmem_cache *s, void *object, u8 val) > +__no_sanitize_memory static void __no_sanitize_memory should be used with great care, because it drops all instrumentation from the function, and any shadow writes will be lost. Won't it be better to add kmsan_poison() to init_object() if you want it to stay uninitialized?
Re: [PATCH v2 06/11] remoteproc: qcom_q6v5_pas: Add SC7280 ADSP, CDSP & WPSS
On 11/13/23 09:56, Luca Weiss wrote: Add support for the ADSP, CDSP and WPSS remoteprocs found on the SC7280 SoC using the q6v5-pas driver. This driver can be used on regular LA ("Linux Android") based releases, however the SC7280 ChromeOS devices need different driver support due to firmware differences. Signed-off-by: Luca Weiss --- Reviewed-by: Konrad Dybcio Konrad
Re: [PATCH v2 03/11] arm64: dts: qcom: sc7280: Rename reserved-memory nodes
On 11/13/23 09:56, Luca Weiss wrote: It was clarified a while ago that reserved-memory nodes shouldn't be called memory@ but should have a descriptive name. Update sc7280.dtsi to follow that. Signed-off-by: Luca Weiss --- Reviewed-by: Konrad Dybcio Konrad
Re: [GIT PULL] vhost,virtio,vdpa,firmware: bugfixes
The pull request you sent on Tue, 14 Nov 2023 15:24:36 -0500: > https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/372bed5fbb87314abf410c3916e51578cd382cd1 Thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/prtracker.html
selftests: ftrace: WARNING: __list_del_entry_valid_or_report (lib/list_debug.c:62 (discriminator 1))
Following kernel crash noticed while running selftests: ftrace on arm64 Juno-r2 device running stable-rc linux-6.6.y kernel. This kernel crash is hard to reproduce. Reported-by: Linux Kernel Functional Testing kselftest: Running tests in ftrace TAP version 13 1..1 # timeout set to 0 # selftests: ftrace: ftracetest-ktap # unlink: cannot unlink '/opt/kselftests/default-in-kernel/ftrace/logs/latest': No such file or directory # TAP version 13 # 1..130 # ok 1 Basic trace file check ... # ok 44 ftrace - test for function traceon/off triggers # ok 45 ftrace - test tracing error log support [ 617.613901] [ cut here ] [ 617.618870] list_del corruption. prev->next should be 000837184758, but was 000837184638. (prev=0008359f82e8) [ 617.629894] WARNING: CPU: 1 PID: 13512 at lib/list_debug.c:62 __list_del_entry_valid_or_report (lib/list_debug.c:62 (discriminator 1)) [ 617.639496] Modules linked in: onboard_usb_hub hdlcd tda998x crct10dif_ce drm_dma_helper cec drm_kms_helper fuse drm backlight dm_mod ip_tables x_tables [last unloaded: trace_printk] [ 617.655832] CPU: 1 PID: 13512 Comm: ls Not tainted 6.6.2-rc1 #1 [ 617.661765] Hardware name: ARM Juno development board (r2) (DT) [ 617.667692] pstate: 6005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 617.674668] pc : __list_del_entry_valid_or_report (lib/list_debug.c:62 (discriminator 1)) [ 617.680339] lr : __list_del_entry_valid_or_report (lib/list_debug.c:62 (discriminator 1)) [ 617.686009] sp : 80008507b910 [ 617.689325] x29: 80008507b910 x28: 0003 x27: 0001 [ 617.696482] x26: 000834694788 x25: 000835a39c88 x24: [ 617.703637] x23: x22: 0008359f8248 x21: 000837184720 [ 617.710792] x20: 0008359f8248 x19: 000837184758 x18: 0010 [ 617.717948] x17: 20747562202c3835 x16: 3734383137333830 x15: 30302065 [ 617.725102] x14: 6220646c756f6873 x13: 2938653238663935 x12: 333830303066 [ 617.732257] x11: 663d766572702820 x10: 2e38333634383137 x9 : 800080145634 [ 617.739411] x8 : 80008507b638 x7 : 00017fe8 x6 : 00057fa8 [ 617.746566] x5 : 0fff x4 : 00097ed27d50 x3 : 8008fc5d3000 [ 617.753720] x2 : x1 : x0 : 0008010722c0 [ 617.760874] Call trace: [ 617.763320] __list_del_entry_valid_or_report (lib/list_debug.c:62 (discriminator 1)) [ 617.768643] __dentry_kill (include/linux/list.h:215 (discriminator 1) fs/dcache.c:550 (discriminator 1) fs/dcache.c:603 (discriminator 1)) [ 617.772311] dput (fs/dcache.c:896) [ 617.775281] create_dentry (fs/tracefs/event_inode.c:404) [ 617.779040] dcache_dir_open_wrapper (fs/tracefs/event_inode.c:536) [ 617.783580] do_dentry_open (fs/open.c:929) [ 617.787424] vfs_open (fs/open.c:1064) [ 617.790572] path_openat (fs/namei.c:3640 fs/namei.c:3797) [ 617.794153] do_filp_open (fs/namei.c:3824) [ 617.797733] do_sys_openat2 (fs/open.c:1422) [ 617.801490] __arm64_sys_openat (fs/open.c:1448) [ 617.805508] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:56) [ 617.809267] el0_svc_common.constprop.0 (include/linux/thread_info.h:127 (discriminator 2) arch/arm64/kernel/syscall.c:144 (discriminator 2)) [ 617.814069] do_el0_svc (arch/arm64/kernel/syscall.c:156) [ 617.817391] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:144 arch/arm64/kernel/entry-common.c:679) [ 617.820454] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:697) [ 617.824820] el0t_64_sync (arch/arm64/kernel/entry.S:595) [ 617.828488] ---[ end trace ]--- [ 617.833214] [ cut here ] [ 617.837840] WARNING: CPU: 2 PID: 13523 at fs/dcache.c:365 dentry_free (fs/dcache.c:365 (discriminator 1)) [ 617.845173] Modules linked in: onboard_usb_hub hdlcd tda998x crct10dif_ce drm_dma_helper cec drm_kms_helper fuse drm backlight dm_mod ip_tables x_tables [last unloaded: trace_printk] [ 617.861503] CPU: 2 PID: 13523 Comm: rmdir Tainted: GW 6.6.2-rc1 #1 [ 617.869175] Hardware name: ARM Juno development board (r2) (DT) [ 617.875102] pstate: 4005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 617.882077] pc : dentry_free (fs/dcache.c:365 (discriminator 1)) [ 617.885835] lr : __dentry_kill (fs/dcache.c:623) [ 617.889937] sp : 800085123a30 [ 617.893252] x29: 800085123a30 x28: 0008371845a8 x27: 0008371845a8 [ 617.900409] x26: 000837184600 x25: 000837184638 x24: 0008359f82a0 [ 617.907565] x23: 000835a39d28 x22: 0008359f8248 x21: 000837184720 [ 617.914719] x20: 0008359f8248 x19: 0008371846c8 x18: [ 617.921874] x17: 00040044 x16: 00500074b5503510 x15: [ 617.929029] x14: x13: 0030 x12: 0101010101010101 [
Re: [PATCH 07/32] kmsan: Remove a useless assignment from kmsan_vmap_pages_range_noflush()
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich wrote: > > The value assigned to prot is immediately overwritten on the next line > with PAGE_KERNEL. The right hand side of the assignment has no > side-effects. > > Fixes: b073d7f8aee4 ("mm: kmsan: maintain KMSAN metadata for page operations") > Suggested-by: Alexander Gordeev > Signed-off-by: Ilya Leoshkevich Reviewed-by: Alexander Potapenko > --- > mm/kmsan/shadow.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c > index b9d05aff313e..2d57408c78ae 100644 > --- a/mm/kmsan/shadow.c > +++ b/mm/kmsan/shadow.c > @@ -243,7 +243,6 @@ int kmsan_vmap_pages_range_noflush(unsigned long start, > unsigned long end, > s_pages[i] = shadow_page_for(pages[i]); > o_pages[i] = origin_page_for(pages[i]); > } > - prot = __pgprot(pgprot_val(prot) | _PAGE_NX); > prot = PAGE_KERNEL; This bug dates back to 5.1-rc2, when KMSAN didn't exist upstream. The commit introducing vmap support already had it: https://github.com/google/kmsan/commit/3ff9d7c640d378485286e1a99d85984ae6901f23 I don't remember what exactly required the more relaxed PAGE_KERNEL mask though :)
[syzbot] [bpf?] [trace?] possible deadlock in sctp_err_lookup
Hello, syzbot found the following issue on: HEAD commit:f31817cbcf48 Add linux-next specific files for 20231116 git tree: linux-next console output: https://syzkaller.appspot.com/x/log.txt?x=11a32f9768 kernel config: https://syzkaller.appspot.com/x/.config?x=f59345f1d0a928c dashboard link: https://syzkaller.appspot.com/bug?extid=422ecd5adb35122711b7 compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40 Unfortunately, I don't have any reproducer for this issue yet. Downloadable assets: disk image: https://storage.googleapis.com/syzbot-assets/987488cb251e/disk-f31817cb.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/6d4a82d8bd4b/vmlinux-f31817cb.xz kernel image: https://storage.googleapis.com/syzbot-assets/fc43dee9cb86/bzImage-f31817cb.xz IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+422ecd5adb3512271...@syzkaller.appspotmail.com = WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected 6.7.0-rc1-next-20231116-syzkaller #0 Not tainted - syz-executor.2/5088 [HC0[0]:SC0[2]:HE0:SE0] is trying to acquire: 888025d21bd8 (>siglock){+.+.}-{2:2}, at: __lock_task_sighand+0xc2/0x340 kernel/signal.c:1422 and this task is already holding: 88802dd927b0 (slock-AF_INET6){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] 88802dd927b0 (slock-AF_INET6){+.-.}-{2:2}, at: __tcp_close+0x4e6/0xfd0 net/ipv4/tcp.c:2843 which would create a new lock dependency: (slock-AF_INET6){+.-.}-{2:2} -> (>siglock){+.+.}-{2:2} but this new dependency connects a SOFTIRQ-irq-safe lock: (slock-AF_INET6){+.-.}-{2:2} ... which became SOFTIRQ-irq-safe at: lock_acquire kernel/locking/lockdep.c:5753 [inline] lock_acquire+0x1b1/0x530 kernel/locking/lockdep.c:5718 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] sctp_err_lookup+0x488/0xb50 net/sctp/input.c:523 sctp_v6_err+0x201/0x540 net/sctp/ipv6.c:175 icmpv6_notify+0x337/0x750 net/ipv6/icmp.c:867 icmpv6_rcv+0x882/0x19c0 net/ipv6/icmp.c:1013 ip6_protocol_deliver_rcu+0x170/0x13e0 net/ipv6/ip6_input.c:438 ip6_input_finish+0x14f/0x2f0 net/ipv6/ip6_input.c:483 NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip6_input+0xa1/0xc0 net/ipv6/ip6_input.c:492 dst_input include/net/dst.h:461 [inline] ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline] NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ipv6_rcv+0x24e/0x380 net/ipv6/ip6_input.c:310 __netif_receive_skb_one_core+0x115/0x180 net/core/dev.c:5529 __netif_receive_skb+0x1f/0x1b0 net/core/dev.c:5643 process_backlog+0x101/0x6b0 net/core/dev.c:5971 __napi_poll.constprop.0+0xb4/0x540 net/core/dev.c:6533 napi_poll net/core/dev.c:6602 [inline] net_rx_action+0x956/0xe90 net/core/dev.c:6735 __do_softirq+0x216/0x8d5 kernel/softirq.c:553 do_softirq kernel/softirq.c:454 [inline] do_softirq+0xaa/0xe0 kernel/softirq.c:441 __local_bh_enable_ip+0xfc/0x120 kernel/softirq.c:381 local_bh_enable include/linux/bottom_half.h:33 [inline] icmp6_send+0x7d5/0x2b10 net/ipv6/icmp.c:633 __icmpv6_send include/linux/icmpv6.h:28 [inline] icmpv6_send include/linux/icmpv6.h:49 [inline] ip6_pkt_drop+0x1f3/0x860 net/ipv6/route.c:4515 dst_output include/net/dst.h:451 [inline] NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip6_xmit+0x1234/0x1cc0 net/ipv6/ip6_output.c:358 sctp_v6_xmit+0xc1b/0x1110 net/sctp/ipv6.c:248 sctp_packet_transmit+0x22e1/0x3020 net/sctp/output.c:653 sctp_packet_singleton+0x19f/0x370 net/sctp/outqueue.c:783 sctp_outq_flush_ctrl net/sctp/outqueue.c:914 [inline] sctp_outq_flush+0x54d/0x3340 net/sctp/outqueue.c:1212 sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1818 [inline] sctp_side_effects net/sctp/sm_sideeffect.c:1198 [inline] sctp_do_sm+0x178f/0x5c50 net/sctp/sm_sideeffect.c:1169 sctp_primitive_ASSOCIATE+0x9c/0xc0 net/sctp/primitive.c:73 __sctp_connect+0x9e9/0xc30 net/sctp/socket.c:1233 sctp_connect net/sctp/socket.c:4811 [inline] sctp_inet_connect+0x15f/0x1f0 net/sctp/socket.c:4826 __sys_connect_file+0x15b/0x1a0 net/socket.c:2046 __sys_connect+0x145/0x170 net/socket.c:2063 __do_sys_connect net/socket.c:2073 [inline] __se_sys_connect net/socket.c:2070 [inline] __x64_sys_connect+0x72/0xb0 net/socket.c:2070 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x62/0x6a to a SOFTIRQ-irq-unsafe lock: (>siglock){+.+.}-{2:2} ... which became SOFTIRQ-irq-unsafe at: ... lock_acquire kernel/locking/lockdep.c:5753 [inline] lock_acquire+0x1b1/0x530 kernel/locking
Re: [PATCH 19/32] kmsan: Accept ranges starting with 0 on s390
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich wrote: > > On s390 the virtual address 0 is valid (current CPU's lowcore is mapped > there), therefore KMSAN should not complain about it. > > Disable the respective check on s390. There doesn't seem to be a > Kconfig option to describe this situation, so explicitly check for > s390. > > Signed-off-by: Ilya Leoshkevich Reviewed-by: Alexander Potapenko (see the nit below) > --- > mm/kmsan/init.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/mm/kmsan/init.c b/mm/kmsan/init.c > index ffedf4dbc49d..14f4a432fddd 100644 > --- a/mm/kmsan/init.c > +++ b/mm/kmsan/init.c > @@ -33,7 +33,9 @@ static void __init kmsan_record_future_shadow_range(void > *start, void *end) > bool merged = false; > > KMSAN_WARN_ON(future_index == NUM_FUTURE_RANGES); > - KMSAN_WARN_ON((nstart >= nend) || !nstart || !nend); > + KMSAN_WARN_ON((nstart >= nend) || > + (!IS_ENABLED(CONFIG_S390) && !nstart) || Please add a comment explaining this bit. > + !nend); > nstart = ALIGN_DOWN(nstart, PAGE_SIZE); > nend = ALIGN(nend, PAGE_SIZE); > > -- > 2.41.0 >
Re: [PATCH 00/32] kmsan: Enable on s390
Am 16.11.23 um 11:13 schrieb Ilya Leoshkevich: It's also possible to get a free s390 machine at [1]. [1] https://linuxone.cloud.marist.edu/oss I think the URL for registration is this one https://linuxone.cloud.marist.edu/#/register?flag=VM
Re: [PATCH 00/32] kmsan: Enable on s390
On Thu, 2023-11-16 at 09:42 +0100, Alexander Potapenko wrote: > On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich > wrote: > > > > Hi, > > > > This series provides the minimal support for Kernel Memory > > Sanitizer on > > s390. Kernel Memory Sanitizer is clang-only instrumentation for > > finding > > accesses to uninitialized memory. The clang support for s390 has > > already > > been merged [1]. > > > > With this series, I can successfully boot s390 defconfig and > > debug_defconfig with kmsan.panic=1. The tool found one real > > s390-specific bug (fixed in master). > > > > Best regards, > > Ilya > > Hi Ilya, > > This is really impressive! > Can you please share some instructions on how to run KMSAN in QEMU? > I've never touched s390, but I'm assuming it should be possible? I developed this natively (without cross-compilation or emulation, just KVM), but I just gave the following a try on x86_64 and had some success: $ make LLVM=1 ARCH=s390 O=../linux-build-s390x-cross CC=clang-18 LD=s390x-linux-gnu-ld OBJCOPY=s390x-linux-gnu-objcopy debug_defconfig $ make LLVM=1 ARCH=s390 O=../linux-build-s390x-cross CC=clang-18 LD=s390x-linux-gnu-ld OBJCOPY=s390x-linux-gnu-objcopy menuconfig $ make LLVM=1 ARCH=s390 O=../linux-build-s390x-cross CC=clang-18 LD=s390x-linux-gnu-ld OBJCOPY=s390x-linux-gnu-objcopy -j24 $ qemu-system-s390x -M accel=tcg -smp 2 -m 4G -kernel ../linux-build- s390x-cross/arch/s390/boot/bzImage -nographic -append 'root=/dev/vda1 rw console=ttyS1 nokaslr earlyprintk cio_ignore=all kmsan.panic=1' - object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng- ccw,rng=rng0 It's also possible to get a free s390 machine at [1]. [1] https://linuxone.cloud.marist.edu/oss
Re: [PATCH 06/32] kmsan: Fix kmsan_copy_to_user() on arches with overlapping address spaces
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich wrote: > > Comparing pointers with TASK_SIZE does not make sense when kernel and > userspace overlap. Assume that we are handling user memory access in > this case. > > Reported-by: Alexander Gordeev > Signed-off-by: Ilya Leoshkevich Reviewed-by: Alexander Potapenko
Re: [PATCH 14/32] kmsan: Use ALIGN_DOWN() in kmsan_get_metadata()
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich wrote: > > Improve the readability by replacing the custom aligning logic with > ALIGN_DOWN(). Unlike other places where a similar sequence is used, > there is no size parameter that needs to be adjusted, so the standard > macro fits. Good catch, thank you! > Signed-off-by: Ilya Leoshkevich Reviewed-by: Alexander Potapenko
Re: [PATCH 21/32] s390: Use a larger stack for KMSAN
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich wrote: > > Adjust the stack size for the KMSAN-enabled kernel like it was done > for the KASAN-enabled one in commit 7fef92ccadd7 ("s390/kasan: double > the stack size"). Both tools have similar requirements. > > Reviewed-by: Alexander Gordeev > Signed-off-by: Ilya Leoshkevich Reviewed-by: Alexander Potapenko
Re: [PATCH 08/32] kmsan: Remove an x86-specific #include from kmsan.h
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich wrote: > > Replace the x86-specific asm/pgtable_64_types.h #include with the > linux/pgtable.h one, which all architectures have. > > Fixes: f80be4571b19 ("kmsan: add KMSAN runtime core") > Suggested-by: Heiko Carstens > Signed-off-by: Ilya Leoshkevich Reviewed-by: Alexander Potapenko (see the comment below) > > -#include > +#include For the sake of consistency with other KMSAN code, please keep the headers sorted alphabetically.
Re: [PATCH 03/32] kmsan: Disable KMSAN when DEFERRED_STRUCT_PAGE_INIT is enabled
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich wrote: > > KMSAN relies on memblock returning all available pages to it > (see kmsan_memblock_free_pages()). It partitions these pages into 3 > categories: pages available to the buddy allocator, shadow pages and > origin pages. This partitioning is static. > > If new pages appear after kmsan_init_runtime(), it is considered > an error. DEFERRED_STRUCT_PAGE_INIT causes this, so mark it as > incompatible with KMSAN. In the future we could probably collect the deferred pages as well, but it's okay to disable KMSAN for now. > Signed-off-by: Ilya Leoshkevich Reviewed-by: Alexander Potapenko
Re: [PATCH 02/32] kmsan: Make the tests compatible with kmsan.panic=1
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich wrote: > > It's useful to have both tests and kmsan.panic=1 during development, > but right now the warnings, that the tests cause, lead to kernel > panics. > > Temporarily set kmsan.panic=0 for the duration of the KMSAN testing. Nice! > Signed-off-by: Ilya Leoshkevich Reviewed-by: Alexander Potapenko
Re: [PATCH 20/32] s390: Turn off KMSAN for boot, vdso and purgatory
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich wrote: > > All other sanitizers are disabled for these components as well. > > Reviewed-by: Alexander Gordeev > Signed-off-by: Ilya Leoshkevich Reviewed-by: Alexander Potapenko (see a nit below) > --- > arch/s390/boot/Makefile | 1 + > arch/s390/kernel/vdso32/Makefile | 1 + > arch/s390/kernel/vdso64/Makefile | 1 + > arch/s390/purgatory/Makefile | 1 + > 4 files changed, 4 insertions(+) > > diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile > index c7c81e5f9218..5a05c927f703 100644 > --- a/arch/s390/boot/Makefile > +++ b/arch/s390/boot/Makefile > @@ -8,6 +8,7 @@ GCOV_PROFILE := n > UBSAN_SANITIZE := n > KASAN_SANITIZE := n > KCSAN_SANITIZE := n > +KMSAN_SANITIZE := n Nit: I think having even a one-line comment before this block (something similar to https://elixir.bootlin.com/linux/latest/source/arch/x86/boot/Makefile#L12) will make it more clear. But given that the comment wasn't there before, leaving this up to you.
Re: [PATCH 12/32] kmsan: Allow disabling KMSAN checks for the current task
On Thu, 2023-11-16 at 09:56 +0100, Alexander Potapenko wrote: > On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich > wrote: > > > > Like for KASAN, it's useful to temporarily disable KMSAN checks > > around, > > e.g., redzone accesses. > > This example is incorrect, because KMSAN does not have redzones. > You are calling these functions from "mm: slub: Let KMSAN access > metadata", which mentiones redzones in kfree(), but the description > is > still somewhat unclear. > Can you provide more insight about what is going on? Maybe we can fix > those accesses instead of disabling KMSAN? It's about SLUB redzones, which appear when compiling with CONFIG_DEBUG_SLAB. I think that from KMSAN's point of view they should be considered poisoned, but then the question is what to do with functions that check them. I noticed that there was special handling for KASAN there already, so I figured that the best solution would be to do the same thing for KMSAN.
Re: [PATCH 11/32] kmsan: Export panic_on_kmsan
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich wrote: > > When building the kmsan test as a module, modpost fails with the > following error message: > > ERROR: modpost: "panic_on_kmsan" [mm/kmsan/kmsan_test.ko] undefined! > > Export panic_on_kmsan in order to improve the KMSAN usability for > modules. > > Signed-off-by: Ilya Leoshkevich Reviewed-by: Alexander Potapenko
Re: [PATCH 30/32] s390/unwind: Disable KMSAN checks
On Thu, Nov 16, 2023 at 10:04 AM Alexander Potapenko wrote: > > On Wed, Nov 15, 2023 at 9:35 PM Ilya Leoshkevich wrote: > > > > The unwind code can read uninitialized frames. Furthermore, even in > > the good case, KMSAN does not emit shadow for backchains. Therefore > > disable it for the unwinding functions. > > > > Signed-off-by: Ilya Leoshkevich > > --- > > arch/s390/kernel/unwind_bc.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/arch/s390/kernel/unwind_bc.c b/arch/s390/kernel/unwind_bc.c > > index 0ece156fdd7c..7ecaab24783f 100644 > > --- a/arch/s390/kernel/unwind_bc.c > > +++ b/arch/s390/kernel/unwind_bc.c > > @@ -49,6 +49,7 @@ static inline bool is_final_pt_regs(struct unwind_state > > *state, > >READ_ONCE_NOCHECK(regs->psw.mask) & PSW_MASK_PSTATE; > > } > > > > +__no_kmsan_checks > > Please add some comments to the source file to back this annotation, > so that the intent is not lost in git history. Apart from that, Reviewed-by: Alexander Potapenko -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Liana Sebastian Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg
Re: [PATCH 30/32] s390/unwind: Disable KMSAN checks
On Wed, Nov 15, 2023 at 9:35 PM Ilya Leoshkevich wrote: > > The unwind code can read uninitialized frames. Furthermore, even in > the good case, KMSAN does not emit shadow for backchains. Therefore > disable it for the unwinding functions. > > Signed-off-by: Ilya Leoshkevich > --- > arch/s390/kernel/unwind_bc.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/arch/s390/kernel/unwind_bc.c b/arch/s390/kernel/unwind_bc.c > index 0ece156fdd7c..7ecaab24783f 100644 > --- a/arch/s390/kernel/unwind_bc.c > +++ b/arch/s390/kernel/unwind_bc.c > @@ -49,6 +49,7 @@ static inline bool is_final_pt_regs(struct unwind_state > *state, >READ_ONCE_NOCHECK(regs->psw.mask) & PSW_MASK_PSTATE; > } > > +__no_kmsan_checks Please add some comments to the source file to back this annotation, so that the intent is not lost in git history.
Re: [PATCH 12/32] kmsan: Allow disabling KMSAN checks for the current task
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich wrote: > > Like for KASAN, it's useful to temporarily disable KMSAN checks around, > e.g., redzone accesses. This example is incorrect, because KMSAN does not have redzones. You are calling these functions from "mm: slub: Let KMSAN access metadata", which mentiones redzones in kfree(), but the description is still somewhat unclear. Can you provide more insight about what is going on? Maybe we can fix those accesses instead of disabling KMSAN?
Re: [PATCH 00/32] kmsan: Enable on s390
On Wed, Nov 15, 2023 at 9:34 PM Ilya Leoshkevich wrote: > > Hi, > > This series provides the minimal support for Kernel Memory Sanitizer on > s390. Kernel Memory Sanitizer is clang-only instrumentation for finding > accesses to uninitialized memory. The clang support for s390 has already > been merged [1]. > > With this series, I can successfully boot s390 defconfig and > debug_defconfig with kmsan.panic=1. The tool found one real > s390-specific bug (fixed in master). > > Best regards, > Ilya Hi Ilya, This is really impressive! Can you please share some instructions on how to run KMSAN in QEMU? I've never touched s390, but I'm assuming it should be possible?