On Tuesday, 3 February 2015 at 23:55:19 UTC, FG wrote:
On 2015-02-04 at 00:07, Foo wrote:
How would I use decoding for that? Isn't there a way to read the file as UTF-8 or, even better, as Unicode?

Well, apparently the UTF-8-aware foreach loop still works just fine. This program shows the file size and the number of Unicode code points (not glyphs, strictly speaking):

    import core.stdc.stdio;
    int main() @nogc
    {
        const int bufSize = 64000;
        char[bufSize] buffer;
        size_t bytesRead, count;
        FILE* f = core.stdc.stdio.fopen("test.d", "r");
        if (!f)
            return 1;
        bytesRead = fread(buffer.ptr, 1, bufSize, f);
        fclose(f); // the file is no longer needed; close it before any early return
        if (bytesRead > bufSize - 1) {
            printf("File is too big\n");
            return 1;
        }
        if (!bytesRead)
            return 2;
        // iterating with dchar decodes the UTF-8 bytes into code points
        foreach (dchar d; buffer[0..bytesRead])
            count++;
        // size_t needs %zu, not %d
        printf("read %zu bytes, %zu unicode characters\n", bytesRead, count);
        return 0;
    }

It outputs, for example: read 838 bytes, 829 unicode characters

(It would be more complicated if it had to process bigger files.)
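For bigger files, one possible approach is to read fixed-size chunks and count every byte that is not a UTF-8 continuation byte (`0b10xxxxxx`). That way a multi-byte sequence split across two chunks is still counted exactly once. This is only a sketch: the names `countCodePoints` and `countFile` and the 4096-byte chunk size are made up for illustration, and the input is assumed to be valid UTF-8 (nothing is validated).

```d
import core.stdc.stdio : FILE, fopen, fclose, fread;

/// Counts code points in a UTF-8 slice by counting all bytes that are
/// not continuation bytes. Assumes valid UTF-8; does no validation.
size_t countCodePoints(const(char)[] chunk) @nogc nothrow pure @safe
{
    size_t n;
    foreach (char b; chunk)          // iterate code units, no decoding
        if ((b & 0xC0) != 0x80)      // skip 0b10xxxxxx continuation bytes
            ++n;
    return n;
}

/// Reads the file chunk by chunk and accumulates both totals.
/// Returns false if the file cannot be opened.
bool countFile(const(char)* path, out size_t bytes, out size_t codePoints) @nogc nothrow
{
    FILE* f = fopen(path, "rb");
    if (!f)
        return false;
    scope(exit) fclose(f);

    char[4096] buf = void;           // hypothetical chunk size
    size_t got;
    while ((got = fread(buf.ptr, 1, buf.length, f)) != 0)
    {
        bytes += got;
        codePoints += countCodePoints(buf[0 .. got]);
    }
    return true;
}
```

Because only lead bytes are counted, no state has to be carried from one chunk to the next. Actually decoding the characters across chunk boundaries would need more care.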

Using a foreach loop is such a nice idea! Thank you very much. :)

That's my code now:
----
private:

static import m3.m3;
static import core.stdc.stdio;
alias printf = core.stdc.stdio.printf;

public:

@trusted
@nogc
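char[] readFile(in string filename) nothrow {
        import core.stdc.stdio : FILE, SEEK_END, SEEK_SET, fopen, fclose, fseek, ftell, fread;

        // note: filename.ptr is only guaranteed to be NUL-terminated for string literals
        FILE* f = fopen(filename.ptr, "rb");
        if (!f)
                return null; // file could not be opened

        // determine the file size by seeking to the end
        fseek(f, 0, SEEK_END);
        immutable long pos = ftell(f);
        fseek(f, 0, SEEK_SET);
        if (pos < 0) { // ftell reports errors as -1
                fclose(f);
                return null;
        }
        immutable size_t fsize = cast(size_t) pos;

        char[] str = m3.m3.make!(char[])(fsize);
        fread(str.ptr, fsize, 1, f);
        fclose(f);

        return str;
}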

@trusted
@nogc
@property
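dstring toUTF32(in char[] s) {
    // one dchar per UTF-8 byte is an upper bound, so r is never too short
    dchar[] r = m3.m3.make!(dchar[])(s.length);
    size_t n; // number of code points written so far

    // the index that foreach supplies is a byte offset into s, not a
    // code point index, so a separate counter is used instead
    foreach (dchar c; s) {
        r[n++] = c;
    }

    // only the first n elements were filled
    return cast(dstring) r[0 .. n];
}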

@nogc
void main() {
        auto str = readFile("test_file.txt");
        scope(exit) m3.m3.destruct(str);

        auto str2 = str.toUTF32;
        printf("%d : %d\n", cast(int) str[0], cast(int) str2[0]);
}
----
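One thing to watch for with the decoding foreach: when an index variable is declared alongside a `dchar` element, the index is the byte offset at which each code point starts in the `char[]`, not a running code point count. Here is a small sketch of the difference. The helper names `lastStartOffset` and `decodeInto` are made up for illustration, and the function attributes are left off to keep it minimal.

```d
/// Returns the byte offset at which the last code point of `s` starts,
/// showing that the foreach index counts bytes rather than code points.
size_t lastStartOffset(const(char)[] s)
{
    size_t last;
    foreach (size_t i, dchar c; s)   // i = byte offset of c within s
        last = i;
    return last;
}

/// Decodes UTF-8 `src` into the caller-provided `dst` and returns the
/// filled part. `dst` must have at least `src.length` elements.
dchar[] decodeInto(const(char)[] src, dchar[] dst)
{
    assert(dst.length >= src.length);
    size_t n;                        // separate code point counter
    foreach (dchar c; src)
        dst[n++] = c;
    return dst[0 .. n];
}
```

For "héllo" (6 bytes, 5 code points) the last code point starts at byte offset 5, while its position in the decoded result is 4. So indexing the output by the foreach index would leave holes after every multi-byte character.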

m3 is my own module; the name stands for "manual memory management", three m's, so m3. If we end up using D (which is now much more likely), it will be our core module for memory management.
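Since m3 itself isn't shown here, this is a rough, purely hypothetical sketch of what `make` and `destruct` helpers of this kind could look like on top of `malloc`/`free`. It is not the actual m3 code; only the names and the call shape are borrowed from the calls above, and the elements are left uninitialized.

```d
import core.stdc.stdlib : malloc, free;

/// Hypothetical stand-in for m3.m3.make!(E[]): allocates room for n
/// elements with malloc. The elements are NOT initialized.
T make(T : E[], E)(size_t n) @nogc nothrow
{
    auto p = cast(E*) malloc(n * E.sizeof);
    return p is null ? null : p[0 .. n];
}

/// Hypothetical stand-in for m3.m3.destruct: frees the memory and
/// resets the slice so it cannot be used by accident afterwards.
void destruct(T)(ref T[] arr) @nogc nothrow
{
    free(arr.ptr);
    arr = null;
}
```

With helpers like these, `make!(char[])(fsize)` hands back a slice the GC knows nothing about, so every allocation needs a matching `destruct`, for example via `scope(exit)`.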
