If you want to do the code-page conversion on the JS side, I think the 
easiest option is to call a JS function that reads the bytes directly from 
the global HEAPU8 array view (the unsigned-byte view into the 
emscripten C heap).

A pointer on the C side is simply a 32-bit index into the HEAPU8 array.
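
A minimal sketch of that idea, runnable outside of emscripten. The HEAPU8 here is a plain Uint8Array standing in for the real heap view, and codeMap is a hypothetical, deliberately incomplete table (identity for 7-bit ASCII plus two sample codepage 437 entries); a real table would need all 256 entries:

```javascript
// Stand-in for emscripten's global HEAPU8 so that the sketch runs on its own;
// in a real build you would use the HEAPU8 view provided by the runtime.
const HEAPU8 = new Uint8Array(64);

// Hypothetical, incomplete mapping table: identity for 7-bit ASCII, U+FFFD
// for everything else, plus two sample codepage 437 entries.
const codeMap = Array.from({ length: 256 }, (_, i) => (i < 0x80 ? i : 0xFFFD));
codeMap[0x81] = 0x00FC; // ü
codeMap[0x84] = 0x00E4; // ä

// Read a zero-terminated byte string starting at heap index `ptr`.
// Indexing a Uint8Array always yields a value between 0 and 255.
function cp437ToString(heap, ptr) {
  let str = '';
  while (ptr < heap.length) {
    const ch = heap[ptr++];
    if (ch === 0) break;
    str += String.fromCharCode(codeMap[ch]);
  }
  return str;
}

// Simulate what the C side would have written: "f\x81r" followed by a 0 byte.
HEAPU8.set([0x66, 0x81, 0x72, 0x00], 16);
const result = cp437ToString(HEAPU8, 16);
console.log(result);
```

On the C side you would pass the char* to such a function as a plain integer, and the function would use it as the start index.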

I'm doing something similar here:

https://github.com/floooh/sokol/blob/441f952b36f67ce446f1f21c22dcc344b7f21ed8/sokol_audio.h#L1261

...except that instead of accessing the unsigned-byte view I'm accessing 
the float-view (HEAPF32) to copy audio samples from the emscripten heap 
into a WebAudio buffer.

This way you have fewer things that can go wrong, and you can be quite 
sure that the values you're reading are between 0 and 255. (I'm not sure why 
you'd be getting out-of-bounds values from the getValue function; if this 
is a bug, it might be worth filing an emscripten ticket.)
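
One possible explanation, which is only a guess since I haven't checked the getValue implementation: if 'i8' reads through a signed 8-bit view, bytes above 0x7f come back as negative numbers, which can look like garbage when printed. The sketch below only demonstrates the signed/unsigned difference with plain typed arrays; it doesn't call any emscripten function:

```javascript
// One raw byte, 0x84 (ä in codepage 437), seen through two views that share
// the same underlying buffer.
const buffer = new ArrayBuffer(1);
const unsignedView = new Uint8Array(buffer); // plays the role of HEAPU8
const signedView = new Int8Array(buffer);    // plays the role of a signed byte view
unsignedView[0] = 0x84;

const asUnsigned = unsignedView[0]; // 132
const asSigned = signedView[0];     // -124 (132 - 256)
const masked = asSigned & 0xff;     // 132 again after masking

console.log(asUnsigned, asSigned, masked);
```

Masking with 0xff, as your cp437ToString already does, recovers the unsigned value; reading from the unsigned view avoids the issue in the first place.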

Cheers,
-Floh.

On Saturday, 9 March 2019 18:30:39 UTC+1, Juergen Wothke wrote:
>
> I don't see why anyone would want to go back to the stone age and fiddle 
> with legacy C memory management and non-existent string support when I 
> can handle that stuff easily on the JavaScript side, provided some 
> emscripten API lets me access the respective raw data without fucking it up 
> beyond recognition (as Pointer_stringify() or UTF8ToString() do).
>
> As I mentioned above this.Module.getValue(ptr++, 'i8', true); already 
> seems to be a suitable API to deal with this scenario (the only problem is 
> to find it)!
>
> From what you said the text that is currently in the above docs is 
> outdated anyway, see:
>
>>
>> "Strings in JavaScript must be converted to pointers for compiled code – 
>> the relevant function is Pointer_stringify(), which given a pointer 
>> returns a JavaScript string"
>
>
>
> So when that doc is updated it would be a good idea to add some extra info 
> for those people that DON'T HAVE UTF-8 input.
> Explain how to use getValue(), e.g.
>       -s EXTRA_EXPORTED_RUNTIME_METHODS="['getValue']"
>
>
> PS: I still don't understand why an "i8" can be > 0xff !
>
> Cheers,
> Jürgen
>
>
> On Monday, 4 March 2019 14:44:27 UTC+1, Floh wrote:
>>
>> AFAIK Pointer_stringify() has been deprecated in favour of a function 
>> called UTF8ToString(), which takes a UTF-8-encoded string in the emscripten 
>> heap and returns a JS string; maybe the docs haven't been updated yet. 
>> I think (but may be wrong) that it's just a renaming, and that 
>> Pointer_stringify() could already deal with UTF-8 strings before.
>>
>> Since ASCII is a subset of UTF-8, this also works for proper (7-bit) 
>> ASCII strings.
>>
>> 8-bit characters with code-page encoding are a different topic, though. 
>> Since code pages are pretty much legacy and completely unknown in the web 
>> world, I would personally prefer not to have extra code-page-aware string 
>> functions in the emscripten API. Instead, I would convert the strings on the 
>> C side from the specific code-page encoding into generic UTF-8 before 
>> handing them over to JS.
>>
>> Cheers,
>> -Floh.
>>
>> On Monday, 4 March 2019 12:43:33 UTC+1, Juergen Wothke wrote:
>>>
>>> I often have the situation (e.g. see 
>>> http://www.wothke.ch/playmod/?file=/modules/Ad%20Lib/AMusic/Admiral/mein%20erster%20versuch%20!!!.amd)
>>>  
>>> that some legacy C program delivers some char* based String and that 
>>> original char buffer may be using all kinds of weird character encoding 
>>> schemes (ASCII, codepage 437, whatever..).
>>>
>>> What all these text buffers have in common is that Pointer_stringify is 
>>> completely unsuitable to deal with them. And yet Pointer_stringify seems to 
>>> be the ONLY API properly advertised in the emscripten docs (see 
>>> https://emscripten.org/docs/porting/connecting_cpp_and_javascript/Interacting-with-code.html).
>>>
>>> Even though there actually seem to be undocumented functions available 
>>> (like AsciiToString, UTF8ToString, UTF16ToString, etc.?) that might 
>>> be useful, at least in some of those scenarios, many people are 
>>> probably unaware that they exist. At one point I had actually started to 
>>> base64-encode my texts just so that I would be able to retrieve the 
>>> original uncorrupted data on the JavaScript side, which is just ridiculous.
>>>
>>> The last hack I used for codepage 437 encoded strings looked like this:
>>>
>>>  this.codeMap= [ // codepage 437 used by PC DOS and MS-DOS
>>>                    ....
>>>                 ];
>>>
>>>  cp437ToString: function(ptr) { // Pointer_stringify replacement: MS-DOS text to Unicode
>>>    var str = '';
>>>    while (1) {
>>>      var ch = this.Module.getValue(ptr++, 'i8', true);
>>>      if (!ch) return str;
>>>      str += String.fromCharCode(this.codeMap[ch & 0xff]);
>>>    }
>>>  },
>>>
>>>
>>>
>>> Either I just missed the relevant docs for emscripten functions that 
>>> would be useful in these kinds of scenarios, in which case the docs should 
>>> maybe be improved, or the functionality is actually not there, in which 
>>> case I wonder why, since I can hardly be the only person dealing with 
>>> this kind of scenario.
>>>
>>> PS: I am also surprised by the Module.getValue(ptr++, 'i8', true) 
>>> function: 'i8' seems to suggest that I should be getting an 8-bit integer, 
>>> and yet the returned values are sometimes bigger than 0xff!
>>>
>>
