AFAIK Pointer_stringify() has been deprecated in favour of a function called UTF8ToString(), which takes a UTF-8 encoded string in the emscripten heap and returns a JS string; maybe the docs haven't been updated yet. I think (but may be wrong) that it's just a renaming, and that Pointer_stringify() could already deal with UTF-8 strings before.
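
To make concrete what "UTF-8 string in the heap to JS string" means, here is a minimal sketch in plain JavaScript. It is not the emscripten implementation: `heap` is a hypothetical stand-in for the emscripten heap, and `utf8ZToString` is a made-up helper name. It assumes the string is zero-terminated and that `TextDecoder` is available (browsers, recent Node.js).

```javascript
// Hypothetical stand-in for the emscripten heap: a plain byte array.
const heap = new Uint8Array(64);

// Hypothetical helper (NOT the emscripten implementation): decode the
// zero-terminated UTF-8 string that starts at byte offset `ptr`.
function utf8ZToString(bytes, ptr) {
  let end = ptr;
  // Scan forward to the terminating zero byte (or the end of the array).
  while (end < bytes.length && bytes[end] !== 0) end++;
  return new TextDecoder('utf-8').decode(bytes.subarray(ptr, end));
}

// Place a UTF-8 string and a pure 7-bit ASCII string in the fake heap.
// The remaining bytes stay zero and act as terminators.
heap.set(new TextEncoder().encode('Grüße'), 8);
heap.set(new TextEncoder().encode('abc'), 32);

const decodedUtf8 = utf8ZToString(heap, 8);
const decodedAscii = utf8ZToString(heap, 32);
```

The ASCII case goes through exactly the same code path, since every 7-bit ASCII byte is also a valid one-byte UTF-8 sequence.
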
Since ASCII is a subset of UTF-8, UTF8ToString() also works for proper (7-bit) ASCII strings.

8-bit characters with code page encoding are a different topic though. Since code pages are pretty much legacy and completely unknown in the web world, I would personally prefer not to have extra code-page-aware string functions in the emscripten API. Instead I would first convert the strings on the C side from the specific code page encoding into generic UTF-8 before handing them over to JS.

Cheers,
-Floh.

On Monday, 4 March 2019 12:43:33 UTC+1, Juergen Wothke wrote:
> I often have the situation (e.g. see
> http://www.wothke.ch/playmod/?file=/modules/Ad%20Lib/AMusic/Admiral/mein%20erster%20versuch%20!!!.amd)
> that some legacy C program delivers some char*-based string, and that
> original char buffer may be using all kinds of weird character encoding
> schemes (ASCII, codepage 437, whatever...).
>
> What all these text buffers have in common is that Pointer_stringify is
> completely unsuitable to deal with them. And yet Pointer_stringify seems to
> be the ONLY API properly advertised in the emscripten docs (see
> https://emscripten.org/docs/porting/connecting_cpp_and_javascript/Interacting-with-code.html).
>
> Even though there actually seem to be undocumented functions available
> (like AsciiToString, UTF8ToString, UTF16ToString, etc.?) that might
> actually be useful, at least in some of those scenarios, many people are
> probably unaware that they exist. At one point I had actually started to
> base64-encode my texts just so that I would be able to retrieve the
> original uncorrupted data on the JavaScript side... which is just
> ridiculous.
>
> The last hack I used for codepage 437 encoded strings looked like this:
>
>     this.codeMap = [ // codepage 437 used by PC DOS and MS-DOS
>         ....
>     ];
>
>     cp437ToString: function(ptr) { // Pointer_stringify replacement: msdos text to unicode..
>         var str = '';
>         while (1) {
>             var ch = this.Module.getValue(ptr++, 'i8', true);
>             if (!ch) return str;
>             str += String.fromCharCode(this.codeMap[ch & 0xff]);
>         }
>     },
>
> Either I just missed the relevant docs for emscripten functions that would
> be useful in these kinds of scenarios, in which case the docs should maybe
> be improved. Or, if the functionality is actually not there, then I wonder
> why, since I can hardly be the only person dealing with this kind of
> scenario.
>
> PS: I am also surprised by the Module.getValue(ptr++, 'i8', true) function:
> 'i8' seems to suggest that I should be getting an 8-bit integer, and yet the
> returned values are sometimes bigger than 0xff!??
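
On the 'i8' question and the code page hack above, a minimal plain-JavaScript sketch may help. As far as I know, an 'i8' read goes through a signed 8-bit view of the heap, so bytes above 0x7f come back as negative numbers, which is presumably what looked like out-of-range values; treat that as my assumption rather than a statement about the emscripten internals. The sketch uses two typed-array views as hypothetical stand-ins for the signed and unsigned heap views, and a deliberately partial CP437 table (`cp437High`, only five entries) purely for illustration. Bytes below 0x80 are treated as plain ASCII here.

```javascript
// Hypothetical stand-in for the emscripten heap: one buffer, two views,
// mimicking signed and unsigned 8-bit access to the same bytes.
const buffer = new ArrayBuffer(32);
const signedView = new Int8Array(buffer);
const unsignedView = new Uint8Array(buffer);

// Deliberately PARTIAL CP437 table for illustration only; a real
// converter needs all 128 high-half entries.
const cp437High = new Map([
  [0x81, 'ü'], [0x84, 'ä'], [0x94, 'ö'], [0x9a, 'Ü'], [0xe1, 'ß'],
]);

// Hypothetical helper: decode a zero-terminated CP437 string at `ptr`.
// Works with either view because each byte is masked to 0..255 first.
function cp437ToString(bytes, ptr) {
  let str = '';
  for (let i = ptr; i < bytes.length; i++) {
    const ch = bytes[i] & 0xff; // normalise a possibly negative value
    if (ch === 0) break;        // terminator
    if (ch < 0x80) {
      str += String.fromCharCode(ch); // assumed identical to ASCII
    } else {
      // Unmapped bytes become U+FFFD because the table is partial.
      str += cp437High.has(ch) ? cp437High.get(ch) : '\uFFFD';
    }
  }
  return str;
}

// "Grüße" in CP437: G r 0x81 0xe1 e, followed by zero bytes.
unsignedView.set([0x47, 0x72, 0x81, 0xe1, 0x65], 4);

const rawSigned = signedView[6];     // same byte seen through the signed view
const rawUnsigned = unsignedView[6]; // same byte seen through the unsigned view
const text = cp437ToString(signedView, 4);
```

Masking with `& 0xff`, as the original hack already does, is what makes the table lookup safe regardless of which view the byte came from.
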
