https://bugs.kde.org/show_bug.cgi?id=514297
--- Comment #30 from [email protected] --- (In reply to Mark Wielaard from comment #24) > Hi Martin, > > (In reply to mcermak from comment #23) > > Created attachment 190530 [details] > > proposed patch > > > > Updated patch trying to address all the review comments above. I've also > > checked this update to a topic branch users/mcermak/try-bug514297 and got > > buildbot test results for it: > > > > https://builder.sourceware.org/buildbot/#/builders/244/builds/139 > > Nice, Thanks. Note the following warnings on powerpc (but not x86_64): > > m_syswrap/syswrap-generic.c:2424:22: warning: self-comparison always > evaluates to false [-Wtautological-compare] > m_syswrap/syswrap-generic.c:2429:25: warning: self-comparison always > evaluates to false [-Wtautological-compare] > m_aspacemgr/aspacemgr-linux.c:1128:13: warning: function declaration isn’t a > prototype [-Wstrict-prototypes] > m_aspacemgr/aspacemgr-linux.c:3054:20: warning: self-comparison always > evaluates to false [-Wtautological-compare] > m_aspacemgr/aspacemgr-linux.c:3063:20: warning: self-comparison always > evaluates to false [-Wtautological-compare] > > I haven't investigated why. And these might not even be in your new code. If > it doesn't make sense, just ignore. I believe those are not related to my code. Looking closer at those warnings, Ive found one that was related to my code: m_aspacemgr/aspacemgr-linux.c:1127:13: warning: function declaration isn’t a prototype [-Wstrict-prototypes] I've fixed that one here (I believe!) https://sourceware.org/git/?p=valgrind.git;a=commitdiff;h=f3bd69814d250edc83660039e70ece15e2c3eb38 > > make regtest seems to have a lot more failures than I am getting locally. > But this is not specific to your try run. > It looks like in general the buildbot has a lot more failures (a lot of > extra inherited leaked file descriptors?) . Very odd. I've went through all the regressions and fixed them! > > What are the make regtest results locally? I am getting (with current git > trunk on x86_64 fedora 43): > > == 963 tests, 0 stderr failures, 0 stdout failures, 0 stderrB failures, 1 > stdoutB failure, 0 post failures == > gdbserver_tests/hgtls (stdoutB) Currently on my Fedora-43 My update tests cleanly: == 869 tests, 0 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures == On Fedora Rawhide I'm getting one failure, but it's not a regression caused by my code, I believe. So, seems like tests pass cleanly with my latest update. ======================= (In reply to Mark Wielaard from comment #25) > OK, nice, so when is_guarded is called, we do a sanity check that our > internal data (still) matches the /proc/self/pagemap guard bit. > I think that can/should be extended to also trigger for when > AM_SANITY_CHECK/am_do_sync_check is called. > Probably added to parse_procselfmaps and/or the sync_check_mapping_callback. > So each segment is checked > for matching /proc/self/maps and then for each segment to check whether or > not it contains guard pages according > to /proc/self/pagemap. But this might be a lot of extra work. Would still be > useful to catch issues though. Oh, I've forgot to address this one. Let me focus on this now! ======================= (In reply to Mark Wielaard from comment #26) > (In reply to mcermak from comment #23) > > > Note that significant effort was spent to make is_valid_for fast. > > > ... > > > So, ideally, checking for guard page should be fast either > > > > I've reimplemented the bookkeeping mechanism here to make it perform better: > > https://sourceware.org/git/?p=valgrind.git;a=blob;f=coregrind/m_aspacemgr/ > > aspacemgr-linux.c;h=7b449fd7a2a9204ffd6da7b9ca81e1847e08ef24; > > hb=8727ba9b9edc9ebb053bb63e8eb8c8c49287f77a#l1078 > > > > The new implementation doesn't check /proc/self/pagemap at all by default. > > Instead it has a new array guardpages[VG_N_THREADS] keeping an ordered list > > of start addresses of madvise guard pages. The array is defined together > > with a description explaining how its max size was set. Function > > is_guarded() tries to quickly answer the question whether an address belongs > > to a guard page. > > Expecting VG_N_THREADS of (max) guard pages seems like a good intuition. > But maybe we want to introduce a separate constant/command line flag. > I would suggest using VG(clo_max_guard_pages) and VG_N_GUARDS, which would > default to VG_N_THREADS/VG(clo_max_threads) if not given. Done! > > Do note that VG_N_THREADS (and VG_N_GUARDS) are not constants (you do define > VG_N_THREADS now as constant). > So the guardpages array should be allocated dynamically, either on > am_startup or on first use (like the cache in find_nsegment_idx). Done, addressed here: https://sourceware.org/git/?p=valgrind.git;a=commitdiff;h=6c0a92c9b2c46b80a5ad2bf4b1e20cfe1fb5dbfb As the commit description mentions, it tests fine, but I suspect it needs to free() that memory somewhere. That's not in place right not. I'm unsure how to do it exactly. Maybe handling the free() in shutdown_actions_NORETURN() might be an option (?) > > is_guarded does a binary search and guard_page_install/guard_page_remove > keep the array sorted (on page boundary). > Nice. I wonder if you could also use a binary search in the install/remove > case and move all addresses down/up. > Might be slightly more efficient. But installs/removes are probably very > infrequent. Yup! Addressed here: https://sourceware.org/git/?p=valgrind.git;a=commitdiff;h=fcfba0573b5eed7281a5ee2cf4c830402dba8464 > > guard_page_remove allows for the address not to be found. But is that really > possible? > Does madvise (addr, size, MADV_GUARD_REMOVE) succeed or fail if the guard > pages weren't installed first? > If it fails then then guard_page_remove should probably assert when called > on a (page boundary) address that cannot be found. > But if that is legal and the syscall succeeds (without actually removing > anything) then the code as written looks correct. Turns out that except for MADV_GUARD_REMOVE, a guard page may also be removed by munmap(). That happens in practice (testsuite covers it) and I've handled that here: https://sourceware.org/git?p=valgrind.git;a=commitdiff;h=769a1f667fb59eda87fe8a35cf23faa5f8fb155b and https://sourceware.org/git?p=valgrind.git;a=commitdiff;h=bf620949b979041c4651fed0da503c2f2a1e3674 My current code shows a failure in case a removal of non-existent guard page would be requested. ======================= (In reply to Mark Wielaard from comment #27) > Nice. Could you move the move the pub_core_libcfile.h include into > priv_aspacemgr.h > (There is a big comment block just before that saying new includes should > move there.) Done! ======================= (In reply to Mark Wielaard from comment #28) > (In reply to mcermak from comment #23) > > > I think the safe_to_deref before the PRE_MEM_RASCIIZ might be a > > > separate/generic issue > > > > Right. I've kept it a part of the patch for now to simplify testing. > > Moreover my patch has one more unrelated bit (initialization of vki_sigset_t > > saved in m_syswrap/syswrap-main.c) silencing my compiler complaints: > > https://sourceware.org/git/?p=valgrind.git;a=blob;f=coregrind/m_syswrap/ > > syswrap-main.c;h=7a1bbb40fa22ac33f4d12ac577ab75fd7979f4dc;hb=refs/heads/ > > users/mcermak/try-bug514297#l354 > > What compiler complaint exactly? > If at all possible we should just commit such fixups separately. This happens only with -O0 (and -Werror -Wfatal-errors ) on Fedora Rawhide gcc-16.0.1-0.9.fc45.x86_64 . It should be a separate commit, agreed. For now it's still part of my update. > > > > Could you show how these programs use the guard pages ? > > > > Here is how a guard page is created for each new thread: > > > > f44 x86_64 # valgrind -d python3 -c 'import time, threading; > > [threading.Thread(None, lambda: time.sleep(1)).start() for x in range(4)]' > > 2> >(grep guard) > > --98531:1: aspacem installing guard pages (addr=0x5a8a000, len=0x1000) > > --98531:1: aspacem installing guard pages (addr=0x628f000, len=0x1000) > > --98531:1: aspacem installing guard pages (addr=0x6a94000, len=0x1000) > > --98531:1: aspacem installing guard pages (addr=0x7299000, len=0x1000) > > f44 x86_64 # > > > > > Could you investigate how things work with > > > memcheck/tests/descr_belowsp.vgtest ? [ ... stuff deleted ... ] ======================= (In reply to Mark Wielaard from comment #29) > (In reply to mcermak from comment #23) > > > Could you investigate how things work with > > > memcheck/tests/descr_belowsp.vgtest ? > > > > Seems like the testcase is missing "In stack guard protected page" in the > > output. This is supposed to come from coregrind/m_addrinfo.c:242: > > > > 236 if (seg != NULL && seg->kind == SkAnonC > > 237 && !seg->hasR && !seg->hasW && !seg->hasX) { > > 238 /* This looks a plausible guard page. Check if a is close to > > 239 the start of stack (lowest byte). */ > > 240 tid = find_tid_with_stack_containing (VG_PGROUNDUP(a+1)); > > 241 if (tid != VG_INVALID_THREADID) > > 242 stackPos = StackPos_guard_page; > > > > ... this apparently is a place where the new madvise guard page support > > should be taken into account as well. > > Good find. > So here we would need something like: > > if (seg != NULL && seg->kind == SkAnonC > && ((!seg->hasR && !seg->hasW && !seg->hasX) > #if defined(VGO_linux) > || (seg->hasGuardPages && am_is_guarded (a)) > #endif > ) { > /* This looks a plausible guard page. Check if a is close to > the start of stack (lowest byte). */ > > ? where am_is_guarded would be some new exported helper from the aspace > manager? Right! So, after some debugging, I've addressed it here: https://sourceware.org/git/?p=valgrind.git;a=commitdiff;h=e38f96c8eb76e1310434b482c67a69095e4b476d ======================= So, summary, I've addressed all the comments, except for the one from comment #25 - I'm going to address that now. My current patch tests with 0 failures. I'd like to kindly ask for another review. My code is available in the users/mcermak/try-bug514297 branch. -- You are receiving this mail because: You are watching all bug changes.
