AFAIK Pointer_stringify() has been deprecated in favour of a function 
called UTF8ToString(), which takes a UTF-8-encoded string in the emscripten 
heap and returns a JS string; maybe the docs haven't been updated yet. I 
think (but may be wrong) that it's just a renaming, and that 
Pointer_stringify() could already deal with UTF-8 strings before.

Since ASCII is a subset of UTF-8, this also works for proper (7-bit) 
ASCII strings.
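
To illustrate the ASCII-is-a-subset point, here is a minimal sketch that 
runs without emscripten. A plain Uint8Array stands in for the heap, and 
writeBytes() and utf8ToStringSketch() are hypothetical helpers made up for 
this example (they are not emscripten functions); the only real API used is 
the standard TextDecoder.

```javascript
// Stand-in for the emscripten heap (assumption: just a plain byte array).
const heap = new Uint8Array(64);

// Hypothetical helper: copy bytes to 'ptr' and append a NUL terminator.
function writeBytes(ptr, bytes) {
  heap.set(bytes, ptr);
  heap[ptr + bytes.length] = 0;
}

// Hypothetical helper: scan to the NUL terminator, then decode as UTF-8.
function utf8ToStringSketch(ptr) {
  let end = ptr;
  while (heap[end] !== 0) end++;
  return new TextDecoder('utf-8').decode(heap.subarray(ptr, end));
}

writeBytes(0, [0x68, 0x65, 0x6c, 0x6c, 0x6f]);              // "hello", pure 7-bit ASCII
writeBytes(16, [0x67, 0x72, 0xc3, 0xbc, 0xc3, 0x9f, 0x65]); // "grüße" encoded as UTF-8
writeBytes(32, [0x81]);                                     // 'ü' in codepage 437, not valid UTF-8

const asciiResult = utf8ToStringSketch(0);   // decodes unchanged
const utf8Result = utf8ToStringSketch(16);   // multi-byte sequences decode too
const cp437Result = utf8ToStringSketch(32);  // becomes U+FFFD (replacement character)
console.log(asciiResult, utf8Result, cp437Result);
```

The third case shows why a code page string does not survive a UTF-8 
decoder: the byte 0x81 is not a valid UTF-8 sequence on its own.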

8-bit characters with code page encoding are a different topic though. 
Since code pages are pretty much legacy and completely unknown in the web 
world, I would personally prefer not to have extra code-page-aware string 
functions in the emscripten API. Instead I would convert the strings on the 
C side first, from the specific code page encoding into generic UTF-8, 
before handing them over to JS.
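
Regarding the PS about getValue(ptr, 'i8'): as far as I know this reads 
through the signed 8-bit heap view, so bytes above 0x7f come back as 
negative numbers. That is only one possible explanation for the odd 
values, and I haven't checked the current implementation. The sketch below 
uses plain typed arrays as stand-ins for the heap views (an assumption) to 
show the effect and why masking with 0xff fixes the lookup.

```javascript
// One byte of memory holding 0x81, seen through a signed and an unsigned view.
// The two views stand in for emscripten's HEAP8 / HEAPU8 (assumption).
const buffer = new ArrayBuffer(1);
const signedView = new Int8Array(buffer);
const unsignedView = new Uint8Array(buffer);
unsignedView[0] = 0x81;

const rawValue = signedView[0];                     // signed read: negative
const maskedValue = rawValue & 0xff;                // masked back into 0..255
const asUint32Hex = (rawValue >>> 0).toString(16);  // same value printed as unsigned 32-bit

console.log(rawValue, maskedValue, asUint32Hex);
```

A negative value printed or treated as an unsigned 32-bit number looks 
much bigger than 0xff, which is why the `ch & 0xff` in the quoted code 
is needed before indexing into the table.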

Cheers,
-Floh.

On Monday, 4 March 2019 12:43:33 UTC+1, Juergen Wothke wrote:
>
> I often have the situation (e.g. see 
> http://www.wothke.ch/playmod/?file=/modules/Ad%20Lib/AMusic/Admiral/mein%20erster%20versuch%20!!!.amd)
>  
> that some legacy C program delivers some char* based String and that 
> original char buffer may be using all kinds of weird character encoding 
> schemes (ASCII, codepage 437, whatever..).
>
> What all these text buffers have in common is that Pointer_stringify is 
> completely unsuitable to deal with them. And yet Pointer_stringify seems to 
> be the
> ONLY API properly advertised in the emscripten docs (see 
> https://emscripten.org/docs/porting/connecting_cpp_and_javascript/Interacting-with-code.html
> ).
>
> Even though there actually seem to be undocumented functions available 
> (like AsciiToString, UTF8ToString, UTF16ToString, etc?) that might 
> actually be useful - at least in some of those scenarios - many people 
> are probably unaware that they exist. At one point I had actually started 
> to base64 encode my texts just so that I would be able to retrieve the 
> original uncorrupted data on the JavaScript side ... which is just 
> ridiculous.
>
> The last hack I used for codepage 437 encoded strings looked like this:
>
>  this.codeMap= [ // codepage 437 used by PC DOS and MS-DOS
>                    ....
>                 ];
>
>  cp437ToString: function(ptr) { // Pointer_stringify replacement: msdos text to unicode
>    var str = '';
>    while (1) {
>      var ch = this.Module.getValue(ptr++, 'i8', true);
>      if (!ch) return str;
>      str += String.fromCharCode(this.codeMap[ch & 0xff]);
>    }
>  },
>
>
>
> Either I just missed the relevant docs for emscripten functions that would 
> be useful in these kinds of scenarios - in which case the docs should maybe 
> be improved - or the functionality is actually not there, in which case I 
> wonder why, since I can hardly be the only person dealing with this kind 
> of scenario.
>
> PS: I am also surprised by the Module.getValue(ptr++, 'i8', true); function: 
> 'i8' seems to suggest that I should be getting an 8-bit integer, and yet the 
> returned values are sometimes bigger than 0xff!?
>
