Re: D on lm32-CPU: string argument on stack instead of register

Chad Joan via Digitalmars-d-learn Fri, 31 Jul 2020 08:16:16 -0700

On Friday, 31 July 2020 at 10:22:20 UTC, Michael Reese wrote:

Hi all,
at work we put embedded lm32 soft-core CPUs in FPGAs and write the firmware in C. At home I enjoy writing small projects in D from time to time, but I don't consider myself a D expert.
Now, I'm trying to run some toy examples in D on the lm32 CPU. I'm using a recent gcc-elf-lm32. I succeeded in compiling and running some code and it works fine.
But I noticed that when calling a function with a string argument, the string is not stored in registers, but on the stack. Consider a simple function (below) that writes bytes to a peripheral (that forwards the data to the host computer via USB). I've two versions, an idiomatic D one, and another version where pointer and length are two distinct function parameters. I also show the generated assembly code. The string version is 4 instructions longer, just because of the stack manipulation. In addition, it is also slower because it needs to access the RAM, and it needs more stack space.
My question: Is there a way I can tell the D compiler to use registers instead of stack for string arguments, or any other trick to reduce code size while maintaining an idiomatic D code style?
Best regards
Michael


// idiomatic D version
void write_to_host(in string msg) {
        // a fixed address to get bytes to the host via usb
        char *usb_slave = cast(char*)BaseAdr.ft232_slave;
        foreach(ch; msg) {
                *usb_slave = ch;
        }
}
// resulting assembly code (compiled with -Os) 12 instructions
_D10firmware_d13write_to_hostFxAyaZv:
        addi     sp, sp, -8
        addi     r3, r0, 4096
        sw       (sp+4), r1
        sw       (sp+8), r2
        add      r1, r2, r1
.L3:
        be     r2,r1,.L1
        lbu      r4, (r2+0)
        addi     r2, r2, 1
        sb       (r3+0), r4
        bi       .L3
.L1:
        addi     sp, sp, 8
        b        ra

// C-like version
void write_to_hostC(const char *msg, int len) {
        char *ptr = cast(char*)msg;
        char *usb_slave = cast(char*)BaseAdr.ft232_slave;
        while (len--) {
                *usb_slave = *ptr++;
        }
}
// resulting assembly code (compiled with -Os) 8 instructions
_D10firmware_d14write_to_hostCFxPaiZv:
        add      r2, r1, r2
        addi     r3, r0, 4096
.L7:
        be     r1,r2,.L5
        lbu      r4, (r1+0)
        addi     r1, r1, 1
        sb       (r3+0), r4
        bi       .L7
.L5:
        b        ra


Hi Michael!

Last time I checked, D doesn't have any specific type attributes or special ways to force variables into registers. But I could be poorly informed. Maybe there are GDC-specific hints or something. I hope that if anyone else knows better, they will toss in an answer.

THAT SAID, I think there are things to try, and I hope we can get you what you want.

If you're willing to entertain more experimentation, here are my thoughts:


---------------------------------------
(1) Try writing "in string" as "in const(char)[]" instead:

// idiomatic D version
void write_to_host(in const(char)[] msg) {
        // a fixed address to get bytes to the host via usb
        char *usb_slave = cast(char*)BaseAdr.ft232_slave;
        foreach(ch; msg) {
                *usb_slave = ch;
        }
}

Explanation:

The "string" type is an alias for "immutable(char)[]".

In D, "immutable" is a stronger guarantee than "const". The "const" modifier, like in C, tells the compiler that this function shall not modify the data referenced by this pointer/array/whatever. The "immutable" modifier is a bit different, as it says that NO ONE will modify the data referenced by this pointer/array/whatever, including other functions that may or may not be concurrently executing alongside the one you're in. So "const" constrains the callee, while "immutable" constrains both the callee AND the caller.

This makes "immutable" more useful for some multithreaded code: if you can accept the potential inefficiency of needing to do more copying of data (if you can't modify, usually you must copy instead), then you can have more deterministic behavior and sometimes even much better total efficiency by way of parallelization. This might not be a guarantee you care about, though, in which case you can just toss it out completely and see if the compiler generates better code now that it sees the same type qualifier as in the other example.
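To make the const/immutable difference concrete, here is a small desktop sketch (nothing lm32-specific; countConst and countImmutable are made-up names for illustration). A const(char)[] parameter accepts mutable, const, and immutable data, while a string parameter only accepts data that is guaranteed never to change:

```d
// string is just an alias for immutable(char)[].
static assert(is(string == immutable(char)[]));

// const: the callee promises not to modify the chars; the caller still may.
size_t countConst(const(char)[] s) { return s.length; }

// immutable: nobody may modify the chars, so the caller must guarantee it.
size_t countImmutable(string s) { return s.length; }

void main()
{
    char[] buf = "abc".dup;            // mutable copy on the GC heap
    assert(countConst(buf) == 3);      // mutable -> const: implicit, fine
    assert(countConst("abc") == 3);    // immutable -> const: also fine

    // mutable -> immutable is rejected at compile time
    static assert(!__traits(compiles, countImmutable(buf)));
    assert(countImmutable("abc") == 3);
}
```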

I'd actually be surprised if using "immutable" causes /less/ efficient code in this case, because it should be even /safer/ to use the argument as-is. But it IS a difference between the two examples, and one that might not be benefiting your cause (though that's totally up to you).


---------------------------------------

(2) Try keeping the string argument, but make the function's semantics closer to the C-like version:


// idiomatic D version
void write_to_host(string msg) {
        // a fixed address to get bytes to the host via usb
        char *usb_slave = cast(char*)BaseAdr.ft232_slave;
        while(msg.length > 0) {
                *usb_slave = msg[0];
                msg = msg[1 .. $];
        }
}

Explanation:

First of all, I wouldn't expect you to keep this, especially if you ever need UTF-8 decoding (more on that later). But it might be revealing if this leads to different assembly output.

The idea behind this one is to see if the regression is actually caused by the foreach construct, rather than the parameter type. I did have to change the parameter slightly by removing the "in" qualifier, since "in" makes the parameter const and the loop reassigns "msg". It shouldn't make much difference though, because the 'string' type's pointer and length are copied from the caller, so any modifications to "msg" (that don't affect "msg"'s array elements) will be contained within the function and will not be observable anywhere else. In other words, the "in" qualifier is largely redundant with "string"'s immutability guarantees plus function argument copying semantics.
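A quick desktop check of that copy-semantics argument (drain is a hypothetical name): re-slicing the parameter inside the callee only moves the callee's own pointer/length pair, and the caller's slice is untouched. The last two lines use a const local as a stand-in for an "in" parameter, since "in" implies const and a const slice cannot be re-sliced in place.

```d
// Consumes its local copy of the slice one code unit at a time.
size_t drain(string msg)
{
    size_t n = 0;
    while (msg.length > 0)
    {
        msg = msg[1 .. $];   // changes only this function's copy
        ++n;
    }
    return n;
}

void main()
{
    string s = "abc";
    assert(drain(s) == 3);
    assert(s == "abc");      // the caller's slice is unchanged

    // "in" on a parameter implies const; a const local behaves the same way,
    // so the re-slicing in drain() would not compile with "in string msg".
    const string c = "abc";
    static assert(!__traits(compiles, c = c[1 .. $]));
}
```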


---------------------------------------
(3) Try a different type of while-loop in the D-style version:

// idiomatic D version
void write_to_host(in string msg) {
        // a fixed address to get bytes to the host via usb
        char *usb_slave = cast(char*)BaseAdr.ft232_slave;
        size_t i = 0;
        while(i < msg.length) {
                *usb_slave = msg[i++];
        }
}

Explanation:

This is a variant of #2. It does ask for an extra size_t variable, so I don't have high hopes. But the compiler might optimize that out and make it look like the C-style version. Again, I don't expect you to use this version if it discards one of D's features that you hope to use, but it might at least help you identify where your expenses are coming from.


---------------------------------------

(4) Try having these examples use "const ubyte* msg" and "immutable(ubyte)[] msg" instead of "const char* msg" and "string msg".


// idiomatic D version
void write_to_host(in immutable(ubyte)[] msg) {
        // a fixed address to get bytes to the host via usb
        ubyte *usb_slave = cast(ubyte*)BaseAdr.ft232_slave;
        foreach(ch; msg) {
                *usb_slave = ch;
        }
}

// C-like version
void write_to_hostC(const ubyte *msg, int len) {
        ubyte *ptr = cast(ubyte*)msg;
        ubyte *usb_slave = cast(ubyte*)BaseAdr.ft232_slave;
        while (len--) {
                *usb_slave = *ptr++;
        }
}
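If the ubyte variant does turn out to generate better code, callers don't have to give up string literals: Phobos's std.string.representation views a string as immutable(ubyte)[] without copying. A desktop sketch, where fakeUsb is a hypothetical buffer standing in for the FT232 register (BaseAdr.ft232_slave isn't available off-target):

```d
import std.string : representation;

// Hypothetical stand-in for the memory-mapped FT232 register, so the
// sketch can run on a desktop; on the lm32 this would be *usb_slave = b.
ubyte[] fakeUsb;

void write_to_host(in immutable(ubyte)[] msg)
{
    foreach (b; msg)
        fakeUsb ~= b;
}

void main()
{
    string greeting = "hi";
    // Same bytes, no copy: immutable(char)[] viewed as immutable(ubyte)[].
    immutable(ubyte)[] bytes = greeting.representation;
    assert(cast(const(void)*) bytes.ptr == cast(const(void)*) greeting.ptr);

    write_to_host(bytes);
    assert(fakeUsb == [cast(ubyte) 'h', cast(ubyte) 'i']);
}
```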

Explanation:

The "string" type is an alias for "immutable(char)[]", which seems like it would be very similar to "immutable(ubyte)[]", but the 'char' element type communicates a requirement that the 'ubyte' element type does not: UTF-8 awareness. And that can have a cost.

In D, char[] arrays are defined as containing UTF-8 text. This is rather different from C, where the 'char' type is more like D's 'byte' or 'ubyte' types and just happens to also be used to store text data in any encoding the author feels like. One thing worth being precise about, though: a plain "foreach(ch; msg)" over a char array does NOT decode. With no type given, "ch" gets the array's element type (here immutable(char)), and the loop walks the string one UTF-8 code unit (byte) at a time. Decoding into whole Unicode code points only happens when you ask for it, e.g. with "foreach(dchar ch; msg)", or when the string goes through Phobos's range primitives (std.range.primitives.front/popFront), which autodecode narrow strings. If you are only dealing with ASCII text (or any 8-bit-or-less encoding that isn't UTF-8), then you may just want to use the 'byte' or 'ubyte' types instead; no autodecoding is ever done on types like byte[] or ubyte[].

You probably won't see a lot of text-processing through byte[] or ubyte[] in normal D code, but that's because most programmers will want their programs to be able to process UTF-8 text, while in the embedded programming space you might not have to worry about UTF-8 at all.

That matches the assembly you posted: I don't see any decoding in it (no conditional call, no masking or testing of the high bit of each char), just a byte-by-byte copy loop. So I don't think your code is larger or less efficient due to UTF-8 decoding.

Still, I'm curious to see if changing up the types causes the compiler to choose different codepaths for its codegen, even for inane reasons. Maybe it thinks it needs to keep a copy of the string's pointer and length around; the two "sw" stores to the stack in your listing are never read back, which looks a lot like that. Compilers are mysterious beasts sometimes. *shrug*
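A desktop sketch of where decoding does and doesn't happen with D strings (assumes Phobos; nothing here is lm32-specific). An untyped foreach yields 1-byte code units, an explicit dchar loop variable decodes, and the Phobos range primitive front autodecodes:

```d
import std.range.primitives : front;

void main()
{
    string s = "h\u00E9";          // 'h' plus 'é'; 'é' takes 2 bytes in UTF-8
    assert(s.length == 3);         // .length counts code units, not characters

    size_t units = 0;
    foreach (ch; s)                // no type given: element type, no decoding
    {
        static assert(typeof(ch).sizeof == 1);
        ++units;
    }

    size_t points = 0;
    foreach (dchar ch; s)          // dchar requested: foreach decodes UTF-8
        ++points;

    assert(units == 3);
    assert(points == 2);

    // Phobos ranges are where autodecoding lives.
    static assert(is(typeof(s.front) == dchar));
    assert(s[1 .. $].front == '\u00E9');
}
```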


---------------------------------------

(5) And for maximum curiosity, what happens if you write the C-like version this way instead?


// C-like version
// msg parameter change: "const char *msg" -> "const(char)* msg"
void write_to_hostC(const(char)* msg, int len) {
        // ptr variable and its cast() removed; msg itself is incremented.
        char *usb_slave = cast(char*)BaseAdr.ft232_slave;
        while (len--) {
                *usb_slave = *msg++;
        }
}

Explanation:

I realize the difference is subtle, but in D (unlike C), "const char *msg" says that both the pointed-to chars can't be modified and that the /pointer itself/ can't be modified either: it means const(char*). With "const(char)* msg", the constraint is looser but still very useful: the pointed-to chars can't be modified, but the pointer can. Because the pointer (but not the referred data) is a copy of the caller's pointer, any modifications to the pointer (increments and such) are only visible within the scope of this function.
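These D type rules can be checked directly with static asserts. A small desktop sketch (the variable names are arbitrary):

```d
void main()
{
    // In D, "const char*" applies const to the whole type: const(char*).
    const char* a = "hi".ptr;
    static assert(is(typeof(a) == const(char*)));
    static assert(!__traits(compiles, a++));      // the pointer itself is const

    // "const(char)*": const chars, mutable pointer (what C's const char * means).
    const(char)* b = "hi".ptr;
    static assert(is(typeof(b) == const(char)*));
    b++;                                           // fine: only this local copy moves
    assert(*b == 'i');
    static assert(!__traits(compiles, *b = 'x')); // the chars are still const
}
```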

The C-like version is already the more efficient one, but if making this change causes it to regress to generating assembly similar to the D-like version, then it might suggest that the additional assignment statement is actually helpful somehow. It'd be unintuitive, but you never know.


---------------------------------------

(6) OK sorry, one more. Because #5 made me think: what if we extended the D-idiomatic version's immutability guarantee to the whole array value and not just the array elements?


// idiomatic D version
void write_to_host(immutable(char[]) msg) {
        // a fixed address to get bytes to the host via usb
        char *usb_slave = cast(char*)BaseAdr.ft232_slave;
        foreach(ch; msg) {
                *usb_slave = ch;
        }
}

And to make it even more like the C-style version without being C-style, it might also be worth stacking it with the immutable->const change:


// idiomatic D version
void write_to_host(const(char[]) msg) {
        // a fixed address to get bytes to the host via usb
        char *usb_slave = cast(char*)BaseAdr.ft232_slave;
        foreach(ch; msg) {
                *usb_slave = ch;
        }
}

After all, if the original C-style version isn't allowed to change its argument's pointer, then we could try making the D-idiomatic version behave that way too, and see if this minor alteration makes the difference.
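Side by side, the two spellings differ only in what the const covers. A desktop sketch:

```d
void main()
{
    const(char)[] a = "hello";   // const chars, mutable slice (ptr + length)
    const(char[]) b = "hello";   // const chars AND const slice

    a = a[1 .. $];                                      // ok: moves a's own ptr/length
    static assert(!__traits(compiles, b = b[1 .. $]));  // error: b itself is const
    static assert(!__traits(compiles, a[0] = 'x'));     // error: elements are const

    assert(a == "ello");
    assert(b == "hello");
}
```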


---------------------------------------

Just to be safe, I also want to point out the difference between "char *ptr" and "char* ptr": in a single-variable declaration, there is none, but if there is more than one, the pointer binds more strongly to the type than to the variable in D.


Consider a declaration like:

char* str0, str1;

In C, this would make str0 a pointer, and str1 a char.
In D, this means that both str0 and str1 are pointers.

Thus, in D, it is more conventional to write the * character next to the type than next to the variable/identifier. This reinforces the notion that pointer-ness is (syntactically) part of the type, rather than part of the variables.
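A compile-time check of the D behaviour for that declaration (desktop sketch):

```d
void main()
{
    // In D the * is part of the type, so this declares two char* variables.
    // (In C, the same line would make str1 a plain char.)
    char* str0, str1;
    static assert(is(typeof(str0) == char*));
    static assert(is(typeof(str1) == char*));
}
```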


There's a similar example in this article:
https://dlang.org/blog/2018/10/17/interfacing-d-with-c-arrays-part-1/

If you already knew that, don't mind me. I realize that a lot of C code gets copied into D without changing this, and unless there are multiple variables in the same declaration, it really doesn't matter.




Good luck with your lm32/FPGA coding. That sounds like cool stuff!
