On Tue, 12 Jul 2011 13:00:41 -0400, Regan Heath <[email protected]>
wrote:
On Tue, 12 Jul 2011 17:09:04 +0100, Steven Schveighoffer
<[email protected]> wrote:
On Tue, 12 Jul 2011 11:41:56 -0400, Regan Heath <[email protected]>
wrote:
On Tue, 12 Jul 2011 15:59:58 +0100, Steven Schveighoffer
<[email protected]> wrote:
On Tue, 12 Jul 2011 10:50:07 -0400, Regan Heath <[email protected]>
wrote:
What if the function is expected to write to the buffer, and the
compiler just made a copy of it? Won't that be pretty surprising?
Assuming a C function in this form:
void write_to_buffer(char *buffer, int length);
No, assuming a C function in this form:
void ucase(char* str);
Essentially, a C function which takes a writable
already-null-terminated string, and writes to it.
Ok, that's an even better example for my case.
It would be used/called like...
char[] foo;
.. code which populates foo with something ..
ucase(foo);
and in D today this would corrupt memory. Unless the programmer
remembered to write:
No, it wouldn't compile. char[] does not cast implicitly to char *.
(if it does, that needs to change).
Replace foo with foo.ptr, it makes no difference to the point I was
making.
Your fix does not help in that case; foo.ptr will be passed as a
non-null-terminated string.
So, your proposal fixes the case where:
1. The user tries to pass a string/char[] to a C function, and it fails to
compile.
2. Instead of trying to understand the issue, the user realizes the .ptr
member is the right type, and switches to that.
It does not fix or help with cases where:
* a programmer notices the type of the parameter is char * and uses
foo.ptr without trying foo first. (crash)
* a programmer calls toStringz without going through the compile/fix
cycle above.
* a programmer tries to pass string/char[], fails to compile, then looks
up how to interface with C and finds toStringz
I think this fix really doesn't solve a very common problem.
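For illustration, a minimal sketch of the pitfall (ucase and the call site
are hypothetical; the point is only that the .ptr route compiles while still
handing C a buffer with no terminator):

extern (C) void ucase(char* str);  // hypothetical C routine, expects '\0' termination

void caller()
{
    char[] foo = "hello".dup;  // dup copies the 5 chars only, no '\0'
    // ucase(foo);             // does not compile: char[] is not char*
    ucase(foo.ptr);            // compiles, but ucase runs off the end looking for '\0'
    foo ~= '\0';               // the explicit fix: terminate the buffer yourself...
    ucase(foo.ptr);            // ...then pass .ptr; now the call is safe
}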
I am also assuming that, if this idea were implemented, it would handle
things intelligently. For example, if the underlying array is out of room
when toStringz is called and needs to be reallocated, the compiler would
update the slice/reference 'foo' in the same way it already does for an
append which triggers a reallocation.
OK, but what if it's like this:
char[] foo = new char[100];
auto bar = foo;
ucase(foo);
In most cases, bar is also written to, but in some cases only foo is
written to.
Granted, we're getting further out on the hypothetical limb here :)
But my point is, making it require explicit calling of toStringz
instead of implicit makes the code less confusing, because you
understand "oh, toStringz may reallocate, so I can't expect bar to also
get updated" vs. simply calling a function with a buffer.
This is not a 'new' problem introduced by the idea; it's a general problem
for D/arrays/slices, and the same happens with an append, right? In
which case it's not a reason against the idea.
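For comparison, the append case looks like this (a small sketch; whether the
append reallocates depends on the capacity left in the block, so the outcome
genuinely varies):

char[] foo = new char[3];
foo[] = 'a';
auto bar = foo;   // bar aliases the same data as foo
foo ~= 'b';       // may extend in place, or may reallocate foo into a new block
foo[0] = 'z';
// if the append extended in place, bar[0] is now 'z';
// if it reallocated, bar[0] is still 'a' -- the aliasing silently broke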
It's new with respect to the documented behavior of the C function being
called. If you look up the man page for such a hypothetical function, it
might claim that it alters the data passed in through the argument, yet that
appears not to happen! So there's no way for someone (who arguably is not
well versed in C functions if they didn't know to use toStringz) to figure
out why the code seems not to do what it says it should. Such a programmer
may blame either the implementation of the C function or the D compiler for
not calling the function properly.
You might initially extern it as:
extern "C" void write_to_buffer(char *buffer, int length);
And, you could call it one of 2 ways (legitimately):
char[] foo = new char[100];
write_to_buffer(foo, foo.length);
or:
char[100] foo;
write_to_buffer(foo, foo.length);
and in both cases, toStringz would do nothing, as foo is zero terminated
already, or am I wrong about that?
In neither case are they required to be null terminated.
True, but I was outlining the worst case scenario for my suggestion,
not describing the real C function requirements.
No, I mean you were wrong, D does not guarantee either of those (stack
allocated or heap allocated) is null terminated. So toStringz must add
a '\0' at the end (which is mildly expensive for heap data, and very
expensive for stack data).
Ah, ok, this was because I had forgotten char is initialised to 0xFF.
If it was initialised to \0 then both arrays would have been full of
null terminators. The default value of char is the killing blow to the
idea.
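A quick check makes that concrete:

char[] heap = new char[4];
char[4] stack;
assert(char.init == 0xFF);
assert(heap[0] == 0xFF && stack[0] == 0xFF);  // neither buffer starts out '\0'-filled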
toStringz does not currently check for '\0' anywhere in the existing
string. It simply appends '\0' to the end of the passed string. If you
want it to check for '\0', how far should it go? Doesn't this also add to
the overhead (looping over all chars looking for '\0')?
Note also that toStringz has old code that used to check for "one byte
beyond" the array, but this is commented out because it's unreliable (it
could cause a segfault).
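In other words, the behavior described here boils down to something like
this (toCString is a stand-in name for a simplified sketch, not the actual
std.string code):

immutable(char)* toCString(string s)
{
    // always copy and append the terminator; never scan s for an existing '\0'
    return (s ~ '\0').ptr;
}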
The only thing that guarantees null termination is a string literal.
string literals /and/ calling toStringz.
Even "abc".dup is not going to be guaranteed to be null terminated.
For an actual example, try "012345678901234".dup. This should have a
0x0f right after the last character.
Why 0x0f? Does the allocator initialise array memory to its offset
from the start of the block or something?
The final byte of the block is used as the hidden array length (in this
case 15).
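A quick way to observe it (this leans on a druntime implementation detail of
the time, so it is nothing to rely on):

auto s = "012345678901234".dup;  // 15 chars land in a 16-byte block
assert(s.length == 15);
auto hidden = s.ptr[15];         // peek at the last byte of the block
// with the 2011-era GC this byte is 0x0f: the array length stored by the runtime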
Good to know.
Just for history trivia, it used to be there as an unallocated byte.
Which means it likely had random data in it. It was there to prevent
cross-block pointers. If the byte was part of the array, then it would be
possible to do:
arr1 = arr[$..$];
and now, arr1 points at the *next* block!
arr1 ~= 5;
and now, arr1 may have stomped over possibly unallocated data, or possibly
some already allocated data!
So it was a nice bonus that the byte I commandeered for storing the array
length was already unused :)
-Steve