Memory safety, C#, D and more

bearophile Tue, 05 May 2009 18:25:24 -0700

Here I have collected a few more bits that may be interesting for D 
development and design.


-------------------

In C# the "fixed" statement prevents the garbage collector from relocating a 
movable variable. The fixed statement is only permitted in an unsafe context:

http://msdn.microsoft.com/en-us/library/f58wzh21.aspx
http://msdn.microsoft.com/en-us/library/aa664784(VS.71).aspx

In other words, it "pins" a variable so the GC can no longer move it in memory, 
and you can then take and use its address safely. It looks a bit messy, but it 
allows C# to keep its moving GC instead of falling back to a conservative one.

You can use it for example like this:

int[,,] a = new int[2, 3, 4];
unsafe {
   fixed (int* p = a) {
      for (int i = 0; i < a.Length; ++i) // treat as linear
         p[i] = i;
   }
}

Here int[,,] is a built-in multi-dimensional (rectangular) array, stored as a 
single block of memory.
C# has both arrays of arrays, as D does, and these rectangular arrays, which 
save some memory and improve cache locality a bit (though on modern CPUs I 
have seen them end up a bit slower, because locating an item may require 
integer multiplications when a bit shift can't be used).

fixed statements can also be nested if you want to pin two or more objects:

fixed (...) fixed (...) { ... }

The pointer is fixed only inside that scope.

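When you use "fixed" to take the char* of a string, you get a pointer to its 
first character, and since .NET keeps strings null-terminated in memory, this 
works roughly like an automatic toStringz (though C# chars are UTF-16).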

You can also use fixed to call another function with a pointer:

class Test {
   unsafe static void Fill(int* p, int count, int value) {
      for (; count != 0; count--)
         *p++ = value;
   }
   static void Main() {
      int[] a = new int[100];
      unsafe {
         fixed (int* p = a) Fill(p, 100, -1);
      }
   }
}

I guess this ensures the GC never relocates the "a" array while Fill() runs, 
since the call happens inside the fixed statement.

So C# follows the opposite principle from D: it starts from being safe and then 
allows as much as possible to increase flexibility. D starts from an unsafe 
situation and adds features to give some safety.

This explains a bit how "fixed" interacts with the generational GC:
http://www.codeproject.com/KB/dotnet/pointers.aspx

>Pinning has a HUGE cost to the garbage collector. I assume that you are 
>familiar with the generational algorithm of the garbage collection. Let us say 
>we allocated enough memory to fill Gen 0 Heap (the youngest), and that an 
>additional allocation will trigger a collection. If that very last allocation 
>at the end of the heap was pinned, the pinned object moves to generation 1. 
>(Call GC.GetGeneration(obj) and see). Gen 1 is guaranteed to grow to include 
>the pinned memory at the very end of the Gen 0 Heap. Even if all other memory 
>in Gen 0 was freed, that would still leave a huge unreclaimed space of memory 
>and Gen 0 will begin allocating starting from its previous limit. That is how 
>bad "pinning" is. [...] when you use fixed, do whatever you have do quickly 
>and avoid any memory allocation in the process, which can potentially trigger 
>a garbage collection. If a garbage collection did occur inside a fixed block, 
>most likely the pinned memory was close to the end of Gen 0 heap.<


In practice the C# runtime retains most of its safety even if you use pointers. 
For example, if you run the following code (not in debug mode):

int* a = stackalloc int[n];
for (int i = 0; i < 3 * n; i++) {
    a[i] = i;
    Console.WriteLine("a[i] = {0}", a[i]);
}

With n=10 it stops running just after i=10 (one past the end). So the runtime 
is able to catch the out-of-bounds write anyway, and the docs say it terminates 
the program as soon as possible, to avoid running malicious code and other 
troubles.

"stackalloc" is C#'s way to get the stack-based variable-length arrays of C99 
(I'd like to have them in D2 too; C# is surely a kitchen-sink language as 
well). So that is stack safety, not heap safety.

This kind of unsafe pointer-based code is faster than normal C# code (often 
the compiler/runtime isn't able to remove array bounds checks, even though 
that optimization is supported) and slower than equivalent "release mode" D 
code. I don't know how the C# runtime manages to catch that overrun; maybe it 
uses a canary, or marks the memory after the array as non-writable.

I then did a small test with the following code, which only performs reads:

int* a = stackalloc int[n];
for (int i = 0; i < 30 * n; i++) {
    Console.WriteLine("a[{0}] = {1}", i, a[i]);
}

Now the program doesn't stop early: with n=10 it only stops printing at 
i = 299, the end of the loop. So there is write safety only.

I have tried a stack-based "array" with dmd:

import std.conv: toInt;
import std.c.stdlib: alloca;
void main(string[] args) {
    int n = args.length == 2 ? toInt(args[1]) : 10;
    // n ints on the stack, like a C99 VLA or C#'s stackalloc
    int* a = cast(int*)alloca(n * int.sizeof);
    // deliberately runs far past the end of the n-item block
    for (int i = 0; i < 30 * n; i++) {
        a[i] = i;
        printf("a[%d] = %d\n", i, a[i]);
    }
}

It stops printing after i = 12 (three items past the last one). If I keep only 
the printf inside the loop, it keeps printing to the end of the loop: no read 
safety.


By contrast, the following code, which uses a heap-based array:

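import std.conv: toInt;
void main(string[] args) {
    int n = args.length == 2 ? toInt(args[1]) : 10;
    auto aa = new int[n]; // GC heap array; aa[i] is bounds-checked outside -release
    auto a = aa.ptr;      // the raw pointer drops the length, so no checks
    for (int i = 0; i < 3000 * n; i++) { // deliberately runs past the end
        a[i] = i;
        printf("a[%d] = %d\n", i, a[i]);
    }
}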

generates an Access Violation only after i=15391, so there's not much write 
safety.

In C# the following heap-based array program:

using System;
unsafe sealed class test {
    static unsafe void Main(string[] args) {
        int n = args.Length > 0 ? Int32.Parse(args[0]) : 10;
        int[] a = new int[n];
        unsafe {
            fixed (int* p = a) {
                for (int i = 0; i < 1000 * n; ++i) {
                    p[i] = i;
                    Console.WriteLine("p[{0}] = {1}", i, p[i]);
                }
            }
        }
    }
}

prints items up to i=20 and then throws an exception:
System.IO.IOException, "The handle is invalid"

(in debug builds it stops when i is about 25). So even with heap memory and in 
unsafe mode C# is safe enough (and because it stops so soon, it helps find 
bugs faster: the program stops very close to where the bug is).

Having such safety when working with pointer-based arrays is a very good 
thing; I'd like to have it in D too when I am not compiling in release mode. 
Is this doable?

-----------------------------

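C# enums can optionally have the "Flags" attribute, which marks the enum as a 
set of bit flags that can be combined bitwise:
http://weblogs.asp.net/wim/archive/2004/04/07/109095.aspx

The attribute doesn't change the automatically assigned values 0, 1, 2, 3..., 
so the items must be given powers of two explicitly (otherwise 
HasDiscount | IsSupplier would equal IsBlackListed). What [Flags] changes is 
mainly how a combination is printed, as a comma-separated list of names:

[Flags]
public enum ClientStates {
  Ordinary = 0,
  HasDiscount = 1,
  IsSupplier = 2,
  IsBlackListed = 4,
  IsOverdrawn = 8
}

ClientStates c = ClientStates.HasDiscount | ClientStates.IsSupplier;
// c.ToString() gives "HasDiscount, IsSupplier"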

C# enum values can also be printed, showing their names; this would be useful 
in D2 too.

-----------------------------

Unrelated. (Java) 'new' considered harmful:
http://www.ddj.com/java/184405016

Bye,
bearophile
