More bugs found in OS code

bearophile Wed, 02 Nov 2011 20:10:52 -0700

The now usual article that advertises the PVS-Studio tool that shows plenty of 
(depressing) bugs found in already debugged source code of widely used C/C++ 
open source projects:


http://software.intel.com/en-us/articles/90-errors-in-open-source-projects/

There is a Reddit discussion too, but I find it useless:
http://www.reddit.com/r/programming/comments/lxjrb/examples_of_errors_detected_in_various_opensource/


In the Reddit discussion someone is free to list in D how many of the 91 bugs:
- Are not applicable or are not normally done in D;
- Are statically caught by D/DMD;
- Are always caught at runtime by DMD in non release mode;
- Are planned/discussed to be statically avoided or statically caught;
- Are planned/discussed to be always caught at runtime by DMD in non release 
mode.



Here I list and comment about some of the more interesting parts. Below I have 
divided the problems in 5 groups (I have not included all the bug groups shown 
in the article).

A]========================================

Example 3. Fennec Media Project project. Complex expression.

uint32 CUnBitArrayOld::DecodeValueRiceUnsigned(uint32 k) 
{
  ...
  while (!(m_pBitArray[m_nCurrentBitIndex >> 5] &
    Powers_of_Two_Reversed[m_nCurrentBitIndex++ & 31])) {}
  ...
}

The error was found through the V567 diagnostic: Undefined behavior. The 
'm_nCurrentBitIndex' variable is modified while being used twice at single 
sequence point. MACLib unbitarrayold.cpp 78

------------------------

Example 4. Miranda IM project. Complex expression.

short ezxml_internal_dtd(ezxml_root_t root,
  char *s, size_t len)
{
  ...
  while (*(n = ++s + strspn(s, EZXML_WS)) && *n != '>') {
  ...
}

The error was found through the V567 diagnostic: Undefined behavior. The 's' 
variable is modified while being used twice between sequence points.msne zxml.c 
371

------------------------

Currently DMD accepts this code:

x = x++;

Currently D is using the C semantics, this means the result of that line of 
code is undefined, this means you don't know what it will do once compiled by 
different D compilers or even with different optimization level on the same 
compiler. This means writing such code in C (and currently in D) is _always_ a 
bug (because a program with undefined semantics is often useless. There are 
other sources of undefined semantics, but the less there are, the better it 
is). I don't understand why C compilers don't refuse such code statically with 
an error. In my opinion this is not acceptable.

Two basic solutions for D:
1) Define exactly what that code does in D. Walter has expressed few times his 
desire for this solution.
2) Turn similar lines of code into compile-time errors (and maybe accept them 
forever if the -d switch is used, but I am not sure this is a good idea).

The first solution is good because it makes it simpler to port C code to D, it 
allows C/C++/Java programmers to use that code still. But such C code has 
undefined results, so I don't see a great advantage here (it's useful still to 
port Java code).

But lately I am slowly leaning toward the second solution, because even if D 
defines exactly the semantics of code like this, so it gives the same result on 
all D compilers:

(*(n = ++s + strspn(s, EZXML_WS)) && *n != '>')

I don't want to read similar code in D programs I have to debug or modify. Even 
if it's unambiguous for the D language, it's a bit too much hard for me to 
understand. Go language has chosen this second solution.

B]========================================

Curretly D2 accepts the usage of & and | among booleans:


void main() {
    bool a, b;
    auto r1 = a & b;
    static assert(is(typeof(r1) == bool));
    auto r2 = a | b;
    static assert(is(typeof(r2) == bool));
}


But I am not sure this is useful enough to justify the risks that gives.

Using booleans as integers is useful when I have to count them and in few other 
situations. But I don't remember the last time I've had to use bitwise 
or/bitwise and on boolean values, I think I have never had to do this. So maybe 
it's better to forbid this. The advantage of forbidding this operation is that 
it avoids several mistakes caused by operator precedence like (!x & y). 
Comments on this are welcome.

C]========================================

Example 5. IPP Samples project. Priorities of ?: and | operations.

vm_file* vm_file_fopen(...)
{
  ...
  mds[3] = FILE_ATTRIBUTE_NORMAL |
           (islog == 0) ? 0 : FILE_FLAG_NO_BUFFERING;
  ...
}

The error was found through the V502 diagnostic: Perhaps the '?:' operator 
works in a different way than it was expected. The '?:' operator has a lower 
priority than the '|' operator. vm vm_file_win.c 393

------------------------

Example 6. Newton Game Dynamics project. Priorities of ?: and * operations.

dgInt32 CalculateConvexShapeIntersection (...)
{
  ...
  den = dgFloat32 (1.0e-24f) *
        (den > dgFloat32 (0.0f)) ?
          dgFloat32 (1.0f) : dgFloat32 (-1.0f);
  ...
}

The error was found through the V502 diagnostic: Perhaps the '?:' operator 
works in a different way than it was expected. The '?:' operator has a lower 
priority than the '*' operator. physics dgminkowskiconv.cpp 1061

D]========================================

Currently this is not caught by D, it prints "12":

import std.stdio;
void main() {
    writefln("%d%d", 1, 2, 3);
}


I'd like this code to give a compile-time error.

If this is not possible, then I'd like this code to give a run-time error.

The third possibility is to print "123", but this is bad enough.

Printing just "12" and ignoring 3, as done by DMD 2.056, is not acceptable, in 
my opinion.

E]========================================

Example 1. Shareaza project. Value range of char type.

void CRemote::Output(LPCTSTR pszName)
{

  ...
  CHAR* pBytes = new CHAR[ nBytes ];
  hFile.Read( pBytes, nBytes );
  ...
  if ( nBytes > 3 && pBytes[0] == 0xEF &&
       pBytes[1] == 0xBB && pBytes[2] == 0xBF )
  {
    pBytes += 3;
    nBytes -= 3;
    bBOM = true;
  }
  ...
}

The error was found through the V547 diagnostic: Expression 'pBytes [ 0 ] == 
0xEF' is always false. The value range of signed char type: [-128, 127]. 
Shareaza remote.cpp 350

------------------------

Example 3. VirtualDub project. Unsigned type is always >= 0.

typedef unsigned short wint_t;
...
void lexungetc(wint_t c) {
  if (c < 0)
    return;
   g_backstack.push_back(c);
}

The error was found through the V547 diagnostic: Expression 'c < 0' is always 
false. Unsigned type value is never < 0. Ami lexer.cpp 225

The "c < 0" condition is always false because the variable of the unsigned type 
is always above or equal to 0.

------------------------

Example 7. QT project. Dangerous loop.

bool equals( class1* val1, class2* val2 ) const{
{
  ...
  size_t size = val1->size();
  ...
  while ( --size >= 0 ){
    if ( !comp(*itr1,*itr2) )
      return false;
    itr1++;
    itr2++;
  }
  ...
}

The error was found through the V547 diagnostic: Expression '--size >= 0' is 
always true. Unsigned type value is always >= 0. QtCLucene arrays.h 154

The (--size >= 0) condition is always true, since the size variable has the 
unsigned type. It means that if two sequences being compared are alike, we will 
get an overflow that will in its turn cause Access Violation or other program 
failures.

------------------------

Example 8. MySQL project. Error in condition.

enum enum_mysql_timestamp_type
str_to_datetime(...)
{
  ...
  else if (str[0] != 'a' || str[0] != 'A')
    continue; /* Not AM/PM */
  ...
}

The error was found through the V547 diagnostic: Expression 'str [0] != 'a' || 
str [0] != 'A'' is always true. Probably the '&&' operator should be used here. 
clientlib my_time.c 340

The condition is always true because the character is always either not equal 
to 'a' or to 'A'. This is the correct check:

else if (str[0] != 'a' && str[0] != 'A')


Comments are welcome.

Bye,
bearophile

More bugs found in OS code

Reply via email to