I don't see why anyone would want to go back to the stone age and fiddle 
with legacy C memory management and nonexistent string support when I 
can handle that stuff easily on the JavaScript side, provided some 
emscripten API lets me access the respective raw data without fucking it up 
beyond recognition (as Pointer_stringify() or UTF8ToString() do).

As I mentioned, this.Module.getValue(ptr++, 'i8', true) already seems 
to be a suitable API for this scenario (the only problem is finding 
it)!

From what you said, the text that is currently in the above docs is outdated 
anyway, see:

>
> "Strings in JavaScript must be converted to pointers for compiled code – 
> the relevant function is Pointer_stringify(), which given a pointer 
> returns a JavaScript string"



So when that doc is updated, it would be a good idea to add some extra info 
for people who DON'T HAVE UTF-8 input.
Explain how to use getValue(), e.g.
      -s EXTRA_EXPORTED_RUNTIME_METHODS="['getValue']"
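
For illustration, a minimal sketch of what such a doc section could show: read the raw bytes with getValue() and do the decoding on the JavaScript side. The helper names (makeFakeModule, readRawBytes, decodeWithTable) are made up for this example. The fake Module only stands in for a real emscripten Module so that the snippet runs on its own, and it assumes that getValue(ptr, 'i8') reads through a signed 8-bit view of the heap. The table is a three-entry excerpt of codepage 437, not the full code page.

```javascript
// Stand-in for a real emscripten Module so that this sketch runs on its own.
// Assumption: getValue(ptr, 'i8') reads from a signed 8-bit view of the heap.
function makeFakeModule(bytes) {
  const buffer = new ArrayBuffer(bytes.length + 1); // zero-filled: last byte is the terminator
  new Uint8Array(buffer).set(bytes);
  const HEAP8 = new Int8Array(buffer);
  return { getValue: (ptr, type) => HEAP8[ptr] };
}

// Hypothetical helper: collect the bytes of a 0-terminated char* buffer.
function readRawBytes(Module, ptr) {
  const bytes = [];
  for (;;) {
    const ch = Module.getValue(ptr++, 'i8') & 0xff; // mask to the range 0..255
    if (ch === 0) return bytes;
    bytes.push(ch);
  }
}

// Hypothetical helper: map each byte through a lookup table.
// Bytes without a table entry are passed through unchanged.
function decodeWithTable(bytes, table) {
  return bytes
    .map(b => String.fromCharCode(table[b] !== undefined ? table[b] : b))
    .join('');
}

// Small excerpt of codepage 437: ü, ä, ö
const cp437Excerpt = { 0x81: 0x00fc, 0x84: 0x00e4, 0x94: 0x00f6 };

// "Jürgen" as it would be stored in a codepage 437 buffer
const fakeModule = makeFakeModule([0x4a, 0x81, 0x72, 0x67, 0x65, 0x6e]);
const text = decodeWithTable(readRawBytes(fakeModule, 0), cp437Excerpt);
console.log(text); // prints "Jürgen"
```

With a real build, the fake Module would be replaced by the actual Module object, and the table by a complete mapping for whatever encoding the C code uses.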


PS: I still don't understand why an "i8" can be > 0xff !
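
A possible explanation, sketched below and not verified against the emscripten source: assuming getValue(ptr, 'i8') reads through a signed 8-bit view of the heap, a byte like 0x81 comes back as -127 and not as 129. A negative number then looks "bigger than 0xff" as soon as it is reinterpreted as unsigned 32-bit, and it breaks a table lookup unless it is masked first. The plain ArrayBuffer here stands in for the emscripten heap.

```javascript
// One byte of memory, seen through an unsigned and a signed 8-bit view.
const buffer = new ArrayBuffer(1);
const unsignedView = new Uint8Array(buffer);
const signedView = new Int8Array(buffer);

unsignedView[0] = 0x81;        // store the raw byte 0x81

const raw = signedView[0];     // signed read of the same byte: -127
const asUint32 = raw >>> 0;    // reinterpreted as unsigned 32-bit: 0xffffff81
const masked = raw & 0xff;     // masked back to the original byte: 0x81

console.log(raw, asUint32.toString(16), masked.toString(16));
// prints: -127 ffffff81 81
```

Under that assumption, masking with & 0xff before any lookup or comparison gives the expected 0..255 range.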

Cheers,
Jürgen


On Monday, 4 March 2019 14:44:27 UTC+1, Floh wrote:
>
> AFAIK Pointer_stringify() has been deprecated in favour of a function 
> called UTF8ToString(), which takes a UTF-8-encoded string in the emscripten 
> HEAP and returns a JS string; maybe the docs haven't been updated yet. But 
> I think (I may be wrong) it's just a renaming, and that 
> Pointer_stringify() could already deal with UTF-8 strings before.
>
> Since ASCII is a subset of UTF-8, this also works for proper (7-bit) 
> ASCII strings.
>
> 8-bit characters with code page encoding are a different topic though. 
> Since code pages are pretty much legacy and completely unknown in the web 
> world, I would personally prefer not to have extra code-page-aware string 
> functions in the emscripten API. Instead I would first convert the strings 
> on the C side from the specific code page encoding into generic UTF-8 
> before handing them over to JS.
>
> Cheers,
> -Floh.
>
> On Monday, 4 March 2019 12:43:33 UTC+1, Juergen Wothke wrote:
>>
>> I often have the situation (e.g. see 
>> http://www.wothke.ch/playmod/?file=/modules/Ad%20Lib/AMusic/Admiral/mein%20erster%20versuch%20!!!.amd)
>> that some legacy C program delivers a char*-based string, and the 
>> original char buffer may be using all kinds of weird character encoding 
>> schemes (ASCII, codepage 437, whatever).
>>
>> What all these text buffers have in common is that Pointer_stringify is 
>> completely unsuitable to deal with them. And yet Pointer_stringify seems to 
>> be the ONLY API properly advertised in the emscripten docs (see 
>> https://emscripten.org/docs/porting/connecting_cpp_and_javascript/Interacting-with-code.html).
>>
>> Even though there actually seem to be undocumented functions available 
>> (like AsciiToString, UTF8ToString, UTF16ToString, etc.?) that might 
>> be useful, at least in some of those scenarios, many people are probably 
>> unaware that they exist. At one point I had actually started to 
>> base64-encode my texts just so that I would be able to retrieve the 
>> original uncorrupted data on the JavaScript side, which is just ridiculous.
>>
>> The last hack I used for codepage 437 encoded strings looked like this:
>>
>>  this.codeMap = [ // codepage 437 used by PC DOS and MS-DOS
>>                    ....
>>                 ];
>>
>>  // Pointer_stringify replacement: MS-DOS text to Unicode
>>  cp437ToString: function(ptr) {
>>    var str = '';
>>    while (1) {
>>      // read one byte; a 0 byte terminates the string
>>      var ch = this.Module.getValue(ptr++, 'i8', true);
>>      if (!ch) return str;
>>      // mask to 0..255 before the table lookup
>>      str += String.fromCharCode(this.codeMap[ch & 0xff]);
>>    }
>>  },
>>
>>
>>
>> Either I just missed the relevant docs for emscripten functions that 
>> would be useful in these kinds of scenarios, in which case the docs should 
>> maybe be improved, or the functionality is actually not there, in which 
>> case I wonder why, since I can hardly be the only person dealing with this 
>> kind of scenario.
>>
>> PS: I am also surprised by the Module.getValue(ptr++, 'i8', true) function: 
>> 'i8' seems to suggest that I should be getting an 8-bit integer, and yet the 
>> returned values are sometimes bigger than 0xff!?
>>
>
