Hi Paul,
We have a patch that you could try, it would be great if you could tell
whether this fixes the problem for you:
http://midnight-commander.org/attachment/ticket/3846/3846_mcview_hook.patch
On Sat, 5 Aug 2017, Paul Wise wrote:
On Fri, 2017-08-04 at 23:20 +0200, Yury V. Zaytsev wrote:
The most obvious reason for the segfault would have been that somebody
has freed `view->filename_vpath` without setting it to NULL, but a
quick grep for `->filename_vpath` doesn't reveal anything suspicious
:-(
Discussion on IRC made us think that it was using memory that has been
allocated but not initialised, because the MALLOC_PERTURB_=254
MALLOC_CHECK_=2 combination causes glibc to poison new memory
allocations with 0xFE and if you then see 0xFE anywhere, that means that
the program is buggy and interacting with uninitialised memory.
The initial report lacked a critical piece of information, and
specifically that you have pressed "Ctrl+x q" during panel initialization
and this caused the crash, and not merely observed a "random crash on
startup".
This is the reason why I originally assumed that either the pointer wasn't
set to NULL after getting freed, or something else overwrote it to point
to unitialized memory.
The issue is easily reproducible for me and I think you could reproduce
it by introducing a 5 second sleep call just before the panel setup.
So far, I wasn't able reproduce it, at least not that easily.
PS: you might be interested in the valgrind output for one of the
sessions where I was able to press Ctrl+x q early enough. The log
clearly shows that valgrind found a number of invalid reads.
https://people.debian.org/~pabs/tmp/valgrind-mc.gz
Yes, this log has been really useful and now it is more or less clear
what's going on. Specifically, a dialog started processing an event you
triggered by pressing "Ctrl+x q" before it was fully initialized, and this
lead to the crash you observed. Instead, events should be ignored during
the initialization phase, or queued for later processing.
A similar issue with another dialog (much easier to reproduce) has been
fixed in 4.8.19 as I said earlier. Unfortunately, such issues haven't been
consistently considered by past developers, so many more of the same type
of problems might be lurking around the widget system.
PS: I would suggest running the mc code through some static analysis
tools, either via check-all-the-things or by manually running the tools
listed in these two configuration files:
I think that you are misinterpreting your valgrind report. All of the very
many invalid reads you observed are caused by the same underlying cause,
and specifically you triggering an event processing before the dialog has
been fully initialized. If you run mc under valgrind operating normally
which we routinely do, the report should be clean.
And yes, we do use static analysis tools like cppcheck, experimental
compiler warnings (which have been actually much more useful than the
former), Google Sanitizers and the like.
If you are fine with using proprietary services to improve free code,
you could look at Coverity. You could also get Google to do some fuzzing
of the mc codebase.
Indeed, we have been using Coverity for the last 5 years or so. In as far
as fuzzing is concerned, unfortunately, mc hasn't been programmed to be
robust against invalid inputs to begin with and is written in a very
unforgiving language, so a lot of work is still to be done for fuzzing to
be actually useful instead of simply revealing the truth above.
--
Sincerely yours,
Yury V. Zaytsev