On Sunday, 4 December 2022 at 19:00:15 UTC, H. S. Teoh wrote:
This is true only if you're talking about pointers in the sense of pointers in assembly language. Languages like C and D add another layer of abstraction over this.


Another thing with pointers is that it doesn't have "types".

This is where you went wrong. In assembly language, yes, a pointer value is just a number, and there's no type associated with it. However, experience has shown that manipulating pointers at this raw, untyped level is extremely error-prone. Therefore, in languages like C or D, a pointer *does* have a type. It's a way of preventing the programmer from making silly mistakes, by associating a type (at compile-time only, of course) with the pointer value. It's a way of keeping track that address 1234 points to a short, and not to a float, for example. At the assembly level, of course, this type information is erased, and the pointers are just integer addresses. However, at compile-time, this type exists to prevent the programmer from treating the value at the pointed-to address as the wrong type, or at least to warn about it. This is not only because of data sizes, but also because of the interpretation of data. A 32-bit value interpreted as an int is completely different from a 32-bit value interpreted as a float, for example. You wouldn't want to perform integer arithmetic on something that's supposed to be a float; the result would be garbage.
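
To make the int-versus-float point concrete, here is a minimal sketch in D. The helper name bitsOf is made up for this illustration, and it assumes float is a 32-bit IEEE 754 single, which is what D specifies. It views the same 4 bytes through a differently typed pointer:

```d
import std.stdio;

// Hypothetical helper: read the 32 bits of a float as a uint by viewing
// the same address through a differently typed pointer.
uint bitsOf(float f)
{
    return *cast(uint*) &f;
}

void main()
{
    float f = 1.0f;
    uint u = bitsOf(f);
    writefln("float value:        %s", f);
    writefln("same bits as uint:  0x%08X", u); // 1.0f is 0x3F800000 in IEEE 754

    // Integer arithmetic on the bit pattern is not float arithmetic:
    // adding 1 to the bits does not add 1.0 to the value.
    uint v = u + 1;
    writefln("bits + 1 as float:  %s", *cast(float*) &v);
}
```

The casts are exactly the kind of thing the type system is there to make you spell out: without them, the compiler rejects reading a float through a uint*.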

In addition, although in theory memory is byte-addressable, many architectures impose alignment restrictions on values larger than a byte. For example, the CPU may require that 32-bit values (ints or floats) be aligned to an address that's a multiple of 4 bytes. If you add 1 to the raw address held in an int* and try to access the result, it may cause performance issues (the CPU may have to load 2 32-bit values and reassemble parts of them to form the misaligned 32-bit value) or a fault (the CPU may refuse to load from a non-aligned address), which could be a silent failure or may cause your program to be forcefully terminated. Therefore, typed pointers like short* and int* may not be entirely an artifact that only exists in the compiler; it may not actually be legal to store a non-aligned address in an int*, depending on the hardware you're running on.
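
As a rough sketch of what "aligned" means here, the following D snippet checks a raw address against a type's alignment. The name isAlignedFor is invented for this example; T.alignof is the built-in D property, and its value for int (typically 4) depends on the platform:

```d
import std.stdio : writeln;

// Hypothetical helper: true if the raw address satisfies T's alignment
// requirement as reported by the compiler for the target platform.
bool isAlignedFor(T)(size_t address)
{
    return address % T.alignof == 0;
}

void main()
{
    int[2] x;
    auto addr = cast(size_t) &x[0];

    writeln(int.alignof);                 // typically 4, platform-dependent
    writeln(isAlignedFor!int(addr));      // the compiler places ints on aligned addresses
    writeln(isAlignedFor!int(addr + 1));  // one byte further is not int-aligned
}
```

Note that this only reports what the compiler considers the required alignment; whether a misaligned access is slow, faults, or works anyway is up to the hardware.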

Because of this, C and D implement pointer arithmetic in terms of the underlying value type. I.e., adding 1 to a char* will add 1 to the underlying address, but adding 1 to an int* will add int.sizeof to the underlying address instead of 1. For example:

        int[2] x;
        int* p = &x[0];     // let's say this is address 1234
        p++;            // p is now 1238, *not* 1235 (int.sizeof == 4)

As a consequence, when you cast a raw pointer value to a typed pointer, you are responsible for respecting any underlying alignment requirements that the machine may have. Casting a non-aligned address like 1235 to a possibly-aligned pointer like int* may cause problems if you're not careful. Also, the value type of the pointer *does* matter; you will get different results depending on the size of the type and any alignment requirements it may have. Pointer arithmetic involving T* operates in units of T.sizeof, *not* in terms of the raw pointer value.
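
If you want to see the T.sizeof stride for yourself, here is a small sketch in D. The helper name strideOf is made up for this example; it converts both pointers to integers and subtracts, so the result is in bytes rather than in elements:

```d
import std.stdio : writeln;

// Hypothetical helper: byte distance between p and p + 1 for a T*.
size_t strideOf(T)(T* p)
{
    return (cast(size_t) (p + 1)) - (cast(size_t) p);
}

void main()
{
    char[2] c;
    short[2] s;
    int[2] i;
    double[2] d;

    // Each pointer is advanced by "1", but the raw address moves by T.sizeof.
    writeln(strideOf(&c[0]) == char.sizeof);
    writeln(strideOf(&s[0]) == short.sizeof);
    writeln(strideOf(&i[0]) == int.sizeof);
    writeln(strideOf(&d[0]) == double.sizeof);
}
```

Subtracting the two typed pointers directly, (p + 1) - p, would give 1 for every type, because pointer subtraction is also done in units of T.sizeof.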


T

Wow! Seriously, thanks a lot for this detailed explanation! I want to write a compiler, and explanations like this, which not only give me the answer but also explain in detail why something happens, are a gift for me! I wish I could meet you in person and buy you a coffee. Maybe one day, you never know! Thanks a lot and have an amazing day!
