On Sunday, 4 December 2022 at 19:00:15 UTC, H. S. Teoh wrote:
This is true only if you're talking about pointers in the sense
of pointers in assembly language. Languages like C and D add
another layer of abstraction over this.
Another thing with pointers is that it doesn't have "types".
This is where you went wrong. In assembly language, yes, a
pointer value is just a number, and there's no type associated
with it. However, experience has shown that manipulating
pointers at this raw, untyped level is extremely error-prone.
Therefore, in languages like C or D, a pointer *does* have a
type. It's a way of preventing the programmer from making
silly mistakes, by associating a type (at compile-time only, of
course) with the pointer value. It's a way of keeping track that
address 1234 points to a short, and not to a float, for
example. At the assembly level, of course, this type
information is erased, and the pointers are just integer
addresses. However, at compile-time, this type exists to
prevent the programmer from treating the value at the pointed-to
address as the wrong type, or at least to warn about it. This is
not only because of data sizes, but also because of the
interpretation of data. A
32-bit value interpreted as an int is completely different from
a 32-bit value interpreted as a float, for example. You
wouldn't want to perform integer arithmetic on something that's
supposed to be a float; the result would be garbage.
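To make the int-versus-float point concrete, here is a minimal sketch in D. The helper name bitsOf is made up for illustration; it reads the 32 bits of a float through a uint* instead of converting the value:

```d
import std.stdio;

// Hypothetical helper: reinterpret the 32 bits of a float as a uint.
// No value conversion happens; the same bits are read as another type.
uint bitsOf(float f)
{
    return *cast(uint*) &f;
}

void main()
{
    float f = 1.0f;
    writefln("as float:          %s", f);
    // IEEE 754 single precision encodes 1.0 as 0x3F800000.
    writefln("same bits as uint: 0x%08X", bitsOf(f));
    // A value conversion, by contrast, yields the integer 1.
    writefln("converted to int:  %s", cast(int) f);
}
```

The bit pattern of 1.0f read as an integer is 0x3F800000, which has nothing to do with the number 1. Doing integer arithmetic on it would be meaningless, which is exactly what the pointer's type guards against.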
In addition, although in theory memory is byte-addressable,
many architectures impose alignment restrictions on values
larger than a byte. For example, the CPU may require that
32-bit values (ints or floats) must be aligned to an address
that's a multiple of 4 bytes. If you add 1 to an int* address
and try to access the result, it may cause performance issues
(the CPU may have to load 2 32-bit values and reassemble parts
of them to form the misaligned 32-bit value) or a fault (the
CPU may refuse to load a non-aligned address), which could be a
silent failure or may cause your program to be forcefully
terminated. Therefore, typed pointers like short* and int* may
not be entirely an artifact that only exists in the compiler;
it may not actually be legal to add a non-aligned value to an
int*, depending on the hardware you're running on.
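As a rough illustration, D exposes the alignment a type requires as T.alignof, so an address can be checked before it is treated as a T*. The helper isAlignedFor below is a hypothetical name, and the sketch only inspects addresses; it never dereferences a misaligned pointer:

```d
import std.stdio;

// Hypothetical helper: true if addr is a multiple of T's alignment.
bool isAlignedFor(T)(size_t addr)
{
    return addr % T.alignof == 0;
}

void main()
{
    int[2] x;
    auto base = cast(size_t) &x[0];

    writefln("int.alignof = %s", int.alignof);
    // The first element of an int array is suitably aligned for int.
    writefln("base aligned for int:     %s", isAlignedFor!int(base));
    // One byte further is not, whenever int.alignof is greater than 1.
    writefln("base + 1 aligned for int: %s", isAlignedFor!int(base + 1));
    // The next element sits int.sizeof bytes further and is aligned again.
    writefln("next element aligned:     %s", isAlignedFor!int(base + int.sizeof));
}
```

Whether an actual misaligned access is slow, faults, or works fine depends on the hardware, so the sketch deliberately stops at the check.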
Because of this, C and D implement pointer arithmetic in terms
of the underlying value type. I.e., adding 1 to a char* will
add 1 to the underlying address, but adding 1 to an int* will
add int.sizeof to the underlying address instead of 1. I.e.:
int[2] x;
int* p = &x[0]; // let's say this is address 1234
p++; // p is now 1238, *not* 1235 (int.sizeof == 4)
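A runnable version of the same idea might look like the sketch below. The helper byteDistance is a made-up name; it measures how far apart two pointers are in bytes by viewing both as ubyte pointers:

```d
import std.stdio;

// Hypothetical helper: distance in bytes between two addresses.
ptrdiff_t byteDistance(const(void)* from, const(void)* to)
{
    return cast(const(ubyte)*) to - cast(const(ubyte)*) from;
}

void main()
{
    int[2] x;
    int* p = &x[0];
    int* q = p + 1; // points at x[1]

    // The raw address moved by int.sizeof bytes, not by 1.
    writefln("int*:  +1 moves %s bytes", byteDistance(p, q));
    // Subtracting typed pointers counts elements, not bytes.
    writefln("q - p = %s", q - p);

    char[2] c;
    char* cp = &c[0];
    // char.sizeof is 1, so here +1 really is one byte.
    writefln("char*: +1 moves %s bytes", byteDistance(cp, cp + 1));
}
```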
As a consequence, when you cast a raw pointer value to a typed
pointer, you are responsible for respecting any underlying
alignment requirements that the machine may have. Casting a
non-aligned address like 1235 to a possibly-aligned pointer
like int* may cause problems if you're not careful. Also, the
value type of the pointer *does* matter; you will get different
results depending on the size of the type and any alignment
requirements it may have. Pointer arithmetic involving T*
operates in units of T.sizeof, *not* in terms of the raw pointer
value.
T
Wow! Seriously, thanks a lot for this detailed explanation! I
want to write a compiler, and explanations like this, which not
only give me the answer but also explain in detail why something
happens, are a gift for me! I wish I could meet you in person and
buy you a coffee. Maybe one day, you never know! Thanks a lot and
have an amazing day!