I don't see why anyone would want to go back to the stone age and fiddle
with legacy C memory management and non-existent string support when I
can handle that stuff easily on the JavaScript side, provided some
emscripten API lets me access the respective raw data without mangling it
beyond recognition (as Pointer_stringify() or UTF8ToString() do).
As I mentioned above, this.Module.getValue(ptr++, 'i8', true); already seems
to be a suitable API for this scenario (the only problem is
finding it)!
From what you said, the text that is currently in the above docs is outdated
anyway, see:
>
> "Strings in JavaScript must be converted to pointers for compiled code –
> the relevant function is Pointer_stringify(), which given a pointer
> returns a JavaScript string"
So when that doc is updated, it would be a good idea to add some extra info
for those people who DON'T HAVE UTF-8 input.
Explain how to use getValue(), e.g.
-s EXTRA_EXPORTED_RUNTIME_METHODS="['getValue']"
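Something along these lines is what I would expect the docs to show. This is only a rough sketch, not an official emscripten recipe: rawBytesToString, fakeHeap and demoMap are names I made up for illustration, and I am assuming that an unsigned byte view of the heap (Module.HEAPU8 in an emscripten build) is accessible from JavaScript. The plain Uint8Array here just stands in for it so the snippet runs on its own.

```javascript
// Hypothetical helper: decode a NUL-terminated single-byte string from a heap view.
// heapU8  - a Uint8Array view of the heap (assumed to be Module.HEAPU8 in emscripten)
// ptr     - byte offset of the first character
// codeMap - optional 256-entry array mapping byte values to Unicode code points
function rawBytesToString(heapU8, ptr, codeMap) {
  var str = '';
  for (var i = ptr; i < heapU8.length; i++) {
    var b = heapU8[i];        // unsigned view, so b is always in 0..255
    if (b === 0) break;       // stop at the C string terminator
    str += String.fromCharCode(codeMap ? codeMap[b] : b);
  }
  return str;
}

// Stand-in for the emscripten heap so that the sketch is self-contained.
var fakeHeap = new Uint8Array([0, 0x48, 0x69, 0x81, 0x00, 0x41]);

// Identity table with one codepage 437 override for illustration (0x81 is U+00FC).
var demoMap = [];
for (var n = 0; n < 256; n++) demoMap[n] = n;
demoMap[0x81] = 0xFC;

var decoded = rawBytesToString(fakeHeap, 1, demoMap);
```

Reading through an unsigned byte view avoids any sign handling, and the lookup table is the only thing that has to change per encoding.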
PS: I still don't understand why an "i8" can be > 0xff!
Cheers,
Jürgen
On Monday, 4 March 2019 14:44:27 UTC+1, Floh wrote:
>
> AFAIK Pointer_stringify() has been deprecated in favour of a function
> called UTF8ToString(), which takes a UTF-8-encoded string in the emscripten
> HEAP and returns a JS string; maybe the docs haven't been updated yet. But
> I think (I may be wrong) it's just a renaming, and that
> Pointer_stringify() could already deal with UTF-8 strings before.
>
> Since ASCII is a subset of UTF-8, this would also work for proper (7-bit)
> ASCII strings.
>
> 8-bit characters with code page encoding are a different topic, though.
> Since code pages are pretty much legacy and completely unknown in the web
> world, I would personally prefer not to have extra code-page-aware string
> functions in the emscripten API. Instead I would convert the strings on the
> C side first, from a specific code page encoding into generic UTF-8, before
> handing them over to JS.
>
> Cheers,
> -Floh.
>
> On Monday, 4 March 2019 12:43:33 UTC+1, Juergen Wothke wrote:
>>
>> I often have the situation (e.g. see
>> http://www.wothke.ch/playmod/?file=/modules/Ad%20Lib/AMusic/Admiral/mein%20erster%20versuch%20!!!.amd)
>>
>> that some legacy C program delivers a char*-based string, and the
>> original char buffer may be using all kinds of weird character encoding
>> schemes (ASCII, codepage 437, whatever).
>>
>> What all these text buffers have in common is that Pointer_stringify is
>> completely unsuitable to deal with them. And yet Pointer_stringify seems to
>> be the ONLY API properly advertised in the emscripten docs (see
>> https://emscripten.org/docs/porting/connecting_cpp_and_javascript/Interacting-with-code.html
>> ).
>>
>> Even though there actually seem to be undocumented functions available
>> (like AsciiToString, UTF8ToString, UTF16ToString, etc.?) that might
>> be useful, at least in some of those scenarios, many people are probably
>> unaware that they exist. At one point I had actually started to base64
>> encode my texts just so that I would be able to retrieve the original
>> uncorrupted data on the JavaScript side, which is just ridiculous.
>>
>> The last hack I used for codepage 437 encoded strings looked like this:
>>
>> this.codeMap = [ // codepage 437 used by PC DOS and MS-DOS
>> ....
>> ];
>>
>> cp437ToString: function(ptr) { // Pointer_stringify replacement: MS-DOS text to Unicode
>>   var str = '';
>>   while (1) {
>>     var ch = this.Module.getValue(ptr++, 'i8', true);
>>     if (!ch) return str;
>>     str += String.fromCharCode(this.codeMap[ch & 0xff]);
>>   }
>> },
>>
>>
>>
>> Either I just missed the relevant docs for emscripten functions that
>> would be useful in these kinds of scenarios, in which case the docs should
>> maybe be improved, or the functionality is actually not there, in which
>> case I wonder why, since I can hardly be the only person dealing with
>> this kind of scenario.
>>
>> PS: I am also surprised by the Module.getValue(ptr++, 'i8', true); function:
>> 'i8' seems to suggest that I should be getting an 8-bit integer, and yet the
>> returned values are sometimes bigger than 0xff!?
>>
>
--
You received this message because you are subscribed to the Google Groups
"emscripten-discuss" group.