https://bugs.kde.org/show_bug.cgi?id=514297

--- Comment #30 from [email protected] ---
(In reply to Mark Wielaard from comment #24)
> Hi Martin,
> 
> (In reply to mcermak from comment #23)
> > Created attachment 190530 [details]
> > proposed patch
> > 
> > Updated patch trying to address all the review comments above.  I've also
> > checked this update to a topic branch users/mcermak/try-bug514297 and got
> > buildbot test results for it:
> > 
> > https://builder.sourceware.org/buildbot/#/builders/244/builds/139
>  
> Nice, Thanks. Note the following warnings on powerpc (but not x86_64):
> 
> m_syswrap/syswrap-generic.c:2424:22: warning: self-comparison always
> evaluates to false [-Wtautological-compare]
> m_syswrap/syswrap-generic.c:2429:25: warning: self-comparison always
> evaluates to false [-Wtautological-compare]
> m_aspacemgr/aspacemgr-linux.c:1128:13: warning: function declaration isn’t a
> prototype [-Wstrict-prototypes]
> m_aspacemgr/aspacemgr-linux.c:3054:20: warning: self-comparison always
> evaluates to false [-Wtautological-compare]
> m_aspacemgr/aspacemgr-linux.c:3063:20: warning: self-comparison always
> evaluates to false [-Wtautological-compare]
> 
> I haven't investigated why. And these might not even be in your new code. If
> it doesn't make sense, just ignore.

I believe those are not related to my code.  Looking closer at those warnings,
Ive found one that was related to my code:

m_aspacemgr/aspacemgr-linux.c:1127:13: warning: function declaration isn’t a
prototype [-Wstrict-prototypes]

I've fixed that one here (I believe!)
https://sourceware.org/git/?p=valgrind.git;a=commitdiff;h=f3bd69814d250edc83660039e70ece15e2c3eb38

> 
> make regtest seems to have a lot more failures than I am getting locally.
> But this is not specific to your try run.
> It looks like in general the buildbot has a lot more failures (a lot of
> extra inherited leaked file descriptors?) . Very odd.

I've went through all the regressions and fixed them!  
> 
> What are the make regtest results locally? I am getting (with current git
> trunk on x86_64 fedora 43):
> 
> == 963 tests, 0 stderr failures, 0 stdout failures, 0 stderrB failures, 1
> stdoutB failure, 0 post failures ==
> gdbserver_tests/hgtls                    (stdoutB)

Currently on my Fedora-43
My update tests cleanly:
== 869 tests, 0 stderr failures, 0 stdout failures, 0 stderrB failures, 0
stdoutB failures, 0 post failures ==

On Fedora Rawhide I'm getting one failure, but it's not a regression caused by
my code, I believe.

So, seems like tests pass cleanly with my latest update.

=======================

(In reply to Mark Wielaard from comment #25)

> OK, nice, so when is_guarded is called, we do a sanity check that our
> internal data (still) matches the /proc/self/pagemap guard bit.
> I think that can/should be extended to also trigger for when
> AM_SANITY_CHECK/am_do_sync_check is called.
> Probably added to parse_procselfmaps and/or the sync_check_mapping_callback.
> So each segment is checked
> for matching /proc/self/maps and then for each segment to check whether or
> not it contains guard pages according
> to /proc/self/pagemap. But this might be a lot of extra work. Would still be
> useful to catch issues though.

Oh, I've forgot to address this one.  Let me focus on this now!

=======================

(In reply to Mark Wielaard from comment #26)
> (In reply to mcermak from comment #23)
> > > Note that significant effort was spent to make is_valid_for fast.
> > > ...
> > > So, ideally, checking for guard page should be fast either
> > 
> > I've reimplemented the bookkeeping mechanism here to make it perform better:
> > https://sourceware.org/git/?p=valgrind.git;a=blob;f=coregrind/m_aspacemgr/
> > aspacemgr-linux.c;h=7b449fd7a2a9204ffd6da7b9ca81e1847e08ef24;
> > hb=8727ba9b9edc9ebb053bb63e8eb8c8c49287f77a#l1078
> > 
> > The new implementation doesn't check /proc/self/pagemap at all by default. 
> > Instead it has a new array guardpages[VG_N_THREADS] keeping an ordered list
> > of start addresses of madvise guard pages.  The array is defined together
> > with a description explaining how its max size was set.  Function
> > is_guarded() tries to quickly answer the question whether an address belongs
> > to a guard page.
> 
> Expecting VG_N_THREADS of (max) guard pages seems like a good intuition.
> But maybe we want to introduce a separate constant/command line flag.
> I would suggest using VG(clo_max_guard_pages) and VG_N_GUARDS, which would
> default to VG_N_THREADS/VG(clo_max_threads) if not given.

Done!

> 
> Do note that VG_N_THREADS (and VG_N_GUARDS) are not constants (you do define
> VG_N_THREADS now as constant).
> So the guardpages array should be allocated dynamically, either on
> am_startup or on first use (like the cache in find_nsegment_idx).

Done, addressed here:
https://sourceware.org/git/?p=valgrind.git;a=commitdiff;h=6c0a92c9b2c46b80a5ad2bf4b1e20cfe1fb5dbfb

As the commit description mentions, it tests fine, but I suspect
it needs to free() that memory somewhere.  That's not in place right
not.  I'm unsure how to do it exactly.  Maybe handling the free()
in shutdown_actions_NORETURN() might be an option (?)

> 
> is_guarded does a binary search and guard_page_install/guard_page_remove
> keep the array sorted (on page boundary).
> Nice. I wonder if you could also use a binary search in the install/remove
> case and move all addresses down/up.
> Might be slightly more efficient. But installs/removes are probably very
> infrequent.

Yup! Addressed here:
https://sourceware.org/git/?p=valgrind.git;a=commitdiff;h=fcfba0573b5eed7281a5ee2cf4c830402dba8464
> 
> guard_page_remove allows for the address not to be found. But is that really
> possible?

> Does madvise (addr, size, MADV_GUARD_REMOVE) succeed or fail if the guard
> pages weren't installed first?
> If it fails then then guard_page_remove should probably assert when called
> on a (page boundary) address that cannot be found.
> But if that is legal and the syscall succeeds (without actually removing
> anything) then the code as written looks correct.

Turns out that except for MADV_GUARD_REMOVE, a guard page may also be removed
by munmap().  That happens in practice (testsuite covers it) and I've handled
that here:
https://sourceware.org/git?p=valgrind.git;a=commitdiff;h=769a1f667fb59eda87fe8a35cf23faa5f8fb155b
and 
https://sourceware.org/git?p=valgrind.git;a=commitdiff;h=bf620949b979041c4651fed0da503c2f2a1e3674
My current code shows a failure in case a removal of non-existent guard page
would be requested.

=======================

(In reply to Mark Wielaard from comment #27)

> Nice. Could you move the move the pub_core_libcfile.h include into
> priv_aspacemgr.h
> (There is a big comment block just before that saying new includes should
> move there.)

Done!

=======================

(In reply to Mark Wielaard from comment #28)
> (In reply to mcermak from comment #23)
> > > I think the safe_to_deref before the PRE_MEM_RASCIIZ might be a 
> > > separate/generic issue
> > 
> > Right.  I've kept it a part of the patch for now to simplify testing. 
> > Moreover my patch has one more unrelated bit (initialization of vki_sigset_t
> > saved in m_syswrap/syswrap-main.c) silencing my compiler complaints:
> > https://sourceware.org/git/?p=valgrind.git;a=blob;f=coregrind/m_syswrap/
> > syswrap-main.c;h=7a1bbb40fa22ac33f4d12ac577ab75fd7979f4dc;hb=refs/heads/
> > users/mcermak/try-bug514297#l354
> 
> What compiler complaint exactly?
> If at all possible we should just commit such fixups separately.


This happens only with -O0 (and -Werror -Wfatal-errors ) on Fedora
Rawhide gcc-16.0.1-0.9.fc45.x86_64 .  It should be a separate commit,
agreed.  For now it's still part of my update.


> 
> > > Could you show how these programs use the guard pages ?
> > 
> > Here is how a guard page is created for each new thread:
> > 
> > f44 x86_64 # valgrind -d python3 -c 'import time, threading;
> > [threading.Thread(None, lambda: time.sleep(1)).start() for x in range(4)]'
> > 2> >(grep guard)
> > --98531:1: aspacem installing guard pages (addr=0x5a8a000, len=0x1000)
> > --98531:1: aspacem installing guard pages (addr=0x628f000, len=0x1000)
> > --98531:1: aspacem installing guard pages (addr=0x6a94000, len=0x1000)
> > --98531:1: aspacem installing guard pages (addr=0x7299000, len=0x1000)
> > f44 x86_64 # 
> > 
> > > Could you investigate how things work with 
> > > memcheck/tests/descr_belowsp.vgtest ?

[ ... stuff deleted ... ]

=======================

(In reply to Mark Wielaard from comment #29)
> (In reply to mcermak from comment #23)
> > > Could you investigate how things work with 
> > > memcheck/tests/descr_belowsp.vgtest ?
> > 
> > Seems like the testcase is missing "In stack guard protected page" in the
> > output.  This is supposed to come from coregrind/m_addrinfo.c:242:
> > 
> > 236          if (seg != NULL && seg->kind == SkAnonC
> > 237              && !seg->hasR && !seg->hasW && !seg->hasX) {
> > 238             /* This looks a plausible guard page. Check if a is close to
> > 239                the start of stack (lowest byte). */
> > 240             tid = find_tid_with_stack_containing (VG_PGROUNDUP(a+1));
> > 241             if (tid != VG_INVALID_THREADID)
> > 242                stackPos = StackPos_guard_page;
> > 
> > ... this apparently is a place where the new madvise guard page support
> > should be taken into account as well.
> 
> Good find.
> So here we would need something like:
> 
>           if (seg != NULL && seg->kind == SkAnonC
>               && ((!seg->hasR && !seg->hasW && !seg->hasX)
> #if defined(VGO_linux)
>                       || (seg->hasGuardPages && am_is_guarded (a))
> #endif
>                      ) {
>              /* This looks a plausible guard page. Check if a is close to
>                 the start of stack (lowest byte). */
> 
> ? where am_is_guarded would be some new exported helper from the aspace
> manager?

Right!  So, after some debugging, I've addressed it here:
https://sourceware.org/git/?p=valgrind.git;a=commitdiff;h=e38f96c8eb76e1310434b482c67a69095e4b476d

=======================

So, summary, I've addressed all the comments, except for the one from comment
#25 - I'm going to address that now.  My current patch tests with 0 failures. 
I'd like to kindly ask for another review.  My code is available in the
users/mcermak/try-bug514297 branch.

-- 
You are receiving this mail because:
You are watching all bug changes.

Reply via email to