On Tuesday, 9 May 2017 at 16:55:54 UTC, H. S. Teoh wrote:
On Tue, May 09, 2017 at 08:18:09AM +0000, Patrick Schluter via
[...]
Ouch. Haha, even I forgot about this particularly lovely aspect of C. Hooray, freely call functions without declaring them, and "obviously" they return int! Why not?

There's an even more pernicious version of this, in that the compiler blindly believes whatever you declare a symbol to be, and the declaration doesn't even have to be in a .h file or anything even remotely related to the real definition. Here's a (greatly) reduced example (paraphrased from an actual bug I discovered):

        module.c:
        -------
        int get_passwd(char *buf, int size);

yeah, this is a code smell. A not static declared function prototype in a C file. Raises the alarm bells automatically now. The same issue but much more frequent to observe, extern variable declaration in .c files. That one is really widespread and few see it as an anti-pattern. An extern global variable should always be put in the header file, never in the C file. Exactly for the same reason as your example with the wrong prototype below: non matching types the linker will join wrongly.

        int func() {
                char passwd[100];
                if (!get_passwd(buf, 100)) return -1;
                do_something(passwd);
        }

        passwd.c:
        ---------
        void get_passwd(struct user_db *db, struct login_record *rec) {
                ... // stuff
        }

        old_passwd.c:
        -------------
        /* Please don't use this code anymore, it's deprecated. */
        /* ^^^^ gratuitous useless comment */
        int get_passwd(char *buf, int size) { ... /* old code */ }

Originally, in the makefile, module.o is linked with libutil.so, which in turn is built from old_passwd.o and a bunch of other stuff. Later on, passwd.o was added to libotherutil.so, which was listed after libutil.so in the linker command, so the symbol conflict was masked because the linker found the libutil.so version of get_passwd first.

Then one day, somebody changed the order of libraries in the makefile, and suddenly func() mysteriously starts malfunctioning because get_passwd now links to the wrong version of the function!

Worse yet, the makefile was written to be "smart", as in, it uses wildcards to pick up .so files (y'know, us lazy programmers don't wanna have to manually type out the name of every library).

Yeah, we also had makefiles using wildcards. It took a long time but I managed to get rid of them.

So when somebody tried to fix this bug by removing old_passwd.o from libotherutil.so altogether, the bug was still happening in other developers' machines, because a stale copy of the old version of libotherutil.so was still left in their source tree, so when *they* built the executable, it contains the bug, but the bug vanishes when built from a fresh checkout. Who knows how many hours were wasted chasing after this heisenbug.


[...]
>            if (fp) fclose(fp);     /* oops, uninitialized ptr deref */

Worse, you didn't check the return of fclose() on writing FILE.
fclose() can fail if the disk was full. As the FILE is buffered, the last fwrite might not have flushed it yet. So it is the fclose() that
will try to write the last block and that can fail, but the app
wouldn't be able to even report it.
[...]

Haha, you're right. NONE of the code I've ever dealt with even considers this case. None at all. In fact, I don't even remember the last time I've seen C code that bothers checking the return value of fclose(). Maybe I've written it *once* in my lifetime when I was young and naïve, and actually bothered to notice the documentation that fclose() may sometimes fail. Even the static analysis tool we're using doesn't report it!!

I discovered that one only a few month ago. I have now around 30 places in our code base to fix. It's only important for writing FILE. Reading FILE can ignore the return values.


So again Walter was spot on: fill up the disk to 99% full, and 99% of C programs would start malfunctioning and showing all kinds of odd behaviours, because they never check the return code of printf, fprintf, or fclose, or any of a whole bunch of other syscalls that are regularly *assumed* to just work, when in reality they *can* fail.

The worst part of all this is, this kind of C code is prevalent everywhere in C projects, including those intended for supposedly security-aware software. Basically, the language itself is just so unfriendly to safe coding practices that it's nigh impossible to write safe code in it. It's *theoretically* possible, certainly, but in practice nobody writes C code that way. It is a scary thought indeeed, how much of our current infrastructure relies on software running this kind of code. Something's gotta give, eventually. And it ain't gonna be pretty when it all starts crumbling down.

Agreed. That's why I'm learning D now, it's probably the only language that will be able to replace progressively our C code base in a skunk work fashion. I wanted to do it already 5 or 6 years ago but couldn't as we were on Solaris/SPARC back then. Now that we have migrated to Linux-AMD64 there's not much holding us back. Oracle client is maybe still an issue though.

Reply via email to