On Saturday, 16 August 2025 at 14:43:25 UTC, H. S. Teoh wrote:
On Sat, Aug 16, 2025 at 11:56:43AM +0000, Brother Bill via
Digitalmars-d-learn wrote:
It is obvious that reading from or writing to invalid memory can
result in "undefined behavior".
But is merely pointing to invalid memory "harmful"?
The documentation states that going one past the last element
of a slice is acceptable.
Where does it say this? This is wrong.
But is it also safe to go 10, 100 or 1000 items past the last
element of a slice?
Of course not.
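For comparison, the C standard spells this rule out explicitly. A pointer one past the last element of an array may be formed and compared, but not dereferenced. Forming a pointer any further out is already undefined behavior, dereference or not. The sketch below is in C, not D, and `sum_ints` is a hypothetical helper. It walks up to the one-past-the-end address without ever reading it. Whether D's documentation states the same rule is not settled here.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n ints by walking a pointer up to the one-past-the-end address.
   Forming `arr + n` is allowed in C and it may be compared, but it is never
   dereferenced; forming something like `arr + n + 10` would already be
   undefined behavior, whether or not it is ever dereferenced. */
int sum_ints(const int *arr, size_t n) {
    assert(arr != NULL);
    const int *end = arr + n;       /* one past the last element */
    int total = 0;
    for (const int *p = arr; p != end; ++p)
        total += *p;                /* only addresses before `end` are read */
    return total;
}
```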
//
It all depends on the interpretation you're using.
Technically speaking, a pointer is just a memory address: an
unsigned integer. There's nothing inherently "harmful" about
an integer.
The problem arises when you interpret it as a memory address.
Once you do, you're likely to pass it around to code that
expects to be able to read or write memory at that address, and
that code is where things go wrong.
There are expectations placed upon an unsigned integer that's
to be interpreted as an address, such as that you can read
memory from that address. The set of integers that are valid
addresses is a subset of the set of all (representable)
integers.
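To make the "just an integer" view concrete, here is a minimal C sketch. It assumes a platform that provides the optional `uintptr_t` type, as mainstream ones do, and the name `read_via_integer` is hypothetical. C guarantees that converting a `void *` to `uintptr_t` and back yields a pointer equal to the original. So an integer that came from a real object is still a valid address. An arbitrary integer carries no such guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Stores a pointer's address in a plain unsigned integer and turns it back
   into a pointer. C guarantees that void* -> uintptr_t -> void* yields a
   pointer that compares equal to the original, so the result is still safe
   to dereference. An integer that did NOT come from a real object gets no
   such guarantee. */
int read_via_integer(int *p) {
    uintptr_t addr = (uintptr_t)(void *)p;  /* now just an integer */
    int *back = (int *)(void *)addr;        /* reinterpreted as an address */
    assert(back == p);
    return *back;
}
```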
It's not just about pointing to "invalid memory" either; it's
also about not breaking the expectations of the type system.
When handed a pointer to a string, for example, the expectation
is that when you read memory at that address, you will find a
valid sequence of values that represents a string. If you
treat a random unsigned integer as a pointer to a string, you
may end up reading a sequence of values that *aren't* a string,
thereby obtaining invalid data. Or worse, if you write to that
address, then somebody else (i.e. some other code) that put the
previous data there may try to read it later, expecting valid
data of the previous type, and get instead something that's no
longer a valid value of that type. The set of memory addresses
containing data of the correct type is narrower than the set of
valid addresses (addresses assigned to you by the OS), and the
set of valid addresses is narrower than the set of all
addresses, most of which were never assigned to your program at
all; if you try to access one of those, the OS will step in and
terminate your program (typically with a segmentation fault or
access violation).
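The "valid address, wrong type" case can be shown with a small C sketch. It assumes a 32-bit IEEE 754 `float`, as on mainstream platforms, and the function name is hypothetical. It reads the bytes of a `float` as if they held an integer. The memory is perfectly valid, but the value obtained is not the number that was stored.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumes a 32-bit IEEE 754 float, as on mainstream platforms. */
static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

/* Reads the bytes of a float as if they held a uint32_t. The address is
   perfectly valid, yet the value obtained is not the number the writer
   stored: same memory, wrong type, different meaning. memcpy is used because
   dereferencing (uint32_t *)&f directly would itself violate C's
   type-based aliasing rules. */
uint32_t float_bits_as_uint(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

On such a platform, `float_bits_as_uint(1.0f)` yields `0x3F800000` (1065353216), not 1.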
//
Now in theory you can allow arbitrary values in your pointers,
and only check for validity when you actually dereference it,
analogous to how, given a street address handed to you on a
piece of paper, you'd check whether that address actually
exists before actually heading out there. In practice, though,
this is infeasible, because it would mean every pointer
dereference your program makes would have to run through some
global registry of valid addresses and check whether data
currently stored there is of the correct type. This would be
extremely slow and the simplest of operations would take
forever to run. (Not to mention the issues of keeping said
global registry up-to-date as the program runs and modifies its
data.)
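A purely hypothetical C sketch of that "global registry" approach makes the cost visible. `registry_add`, `checked_read_int` and the `TYPE_*` tags are illustrative names, not any real library. Every object is recorded with a type tag, and every read searches the registry before touching memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: every live object is recorded with a type tag, and
   every dereference searches the registry first. */
enum type_tag { TYPE_INT, TYPE_DOUBLE };

struct entry { const void *addr; enum type_tag tag; };

static struct entry registry[64];
static size_t registry_len = 0;

void registry_add(const void *addr, enum type_tag tag) {
    assert(registry_len < sizeof registry / sizeof registry[0]);
    registry[registry_len].addr = addr;
    registry[registry_len].tag = tag;
    registry_len++;
}

/* Returns 1 and stores the value if p is registered as an int, else 0.
   This linear search runs on EVERY read, which is the overhead in question. */
int checked_read_int(const int *p, int *out) {
    for (size_t i = 0; i < registry_len; ++i) {
        if (registry[i].addr == (const void *)p && registry[i].tag == TYPE_INT) {
            *out = *p;
            return 1;
        }
    }
    return 0;
}
```

The sketch omits removing entries when objects die. A real version would need that on every deallocation too, which is the bookkeeping problem mentioned above.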
To eliminate this onerous overhead every time you dereference a
pointer, programming languages make the simplification that
*all* pointers must always contain a valid address of the
correct type (or a special null value, that indicates that
there is no address at all). The idea is that before even
assigning a given integer value to a pointer, you'd ensure that
it was a valid address to begin with, so that by the time you
try to dereference the pointer, you can be confident that it's
a valid address and simply dereference it without further
verification.
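That convention can be sketched in C with two hypothetical helpers, `element_or_null` and `get_or_default`. Validation happens once, where the pointer is created. The result is either the address of a real element or `NULL`, so callers only pay for a cheap null test before dereferencing.

```c
#include <assert.h>
#include <stddef.h>

/* Validation happens once, where the pointer is created: the result is either
   the address of a real element or NULL, never an arbitrary value. */
int *element_or_null(int *arr, size_t len, size_t i) {
    assert(arr != NULL);
    return i < len ? &arr[i] : NULL;
}

/* Callers only pay for a cheap NULL test; past that, they trust the pointer
   and dereference it without any further lookup. */
int get_or_default(int *arr, size_t len, size_t i, int dflt) {
    int *p = element_or_null(arr, len, i);
    return p != NULL ? *p : dflt;
}
```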
This is essentially the whole point of a type system -- to
ensure that a given piece of data is a valid representation of
its intended type, so that you can safely manipulate it.
Doing things like assigning non-pointer values to a pointer
breaks the guarantees that the type system gives you, because
the assumptions made by all those places in your code that
dereference this pointer are now invalid, and all bets are off as
to what will happen when you run that code. This is why it's
invalid to point to "invalid" memory. The act of pointing
itself is "harmless" -- since it's just some integer address --
but the harm comes from the broken assumptions of the rest of
the code that assumes that the address contained in the pointer
is a valid address containing data of the expected type.
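A common C idiom shows how code protects the rest of the program's assumptions once a pointer stops being valid. It is sketched here with a hypothetical `release` helper. After `free()`, the pointer no longer refers to a live object. Resetting it to `NULL` restores the "valid address or null" invariant that other code relies on.

```c
#include <assert.h>
#include <stdlib.h>

/* After free(), the caller's pointer would still hold the old address, while
   the rest of the code assumes any non-NULL pointer is valid. Setting it to
   NULL restores the "valid address or null" invariant. */
void release(int **pp) {
    assert(pp != NULL);
    free(*pp);   /* free(NULL) is a no-op, so releasing twice is harmless */
    *pp = NULL;
}
```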
T
I'm not sure we are on the same page.
My question is whether having a pointer to invalid memory causes
problems.
I am fully aware that reading from or writing to an invalid memory
address causes problems.
I am grasping that even having a pointer pointing to an invalid
memory address is a huge code smell. But does merely having a
pointer with an invalid memory address cause any undefined
behavior?