Re: Segmentation faults in wasm workers

'Dieter Weidenbrück' via emscripten-discuss Sat, 27 May 2023 08:43:24 -0700

Certainly, I appreciate your interest.

I had to abandon a single-thread solution, because it would block the main 
thread for minutes. 
Step 1: Using JS workers (emscripten_create_worker(wname))
This worked very well. A bonus was having 2 GB of RAM (if available) in each 
worker. Caveats: lots of copying back and forth, and no messaging between 
workers.
Step 2: Decision for wasm workers (emscripten_malloc_wasm_worker)
The need to move to a different kind of worker came with the use 
of SharedArrayBuffers. I could allocate my data in the main thread and 
then send off parts of it for processing to a list of workers, without the 
need to copy anything around.
Being familiar with neither pthreads nor wasm workers, I followed the 
recommendation on this page:
https://emscripten.org/docs/api_reference/wasm_workers.html?highlight=wasm%20worker


*"If an application is only developed to target WebAssembly, and 
portability is not a concern, then using Wasm Workers can provide great 
benefits in the form of simpler compiled output, less complexity, smaller 
code size and possibly better performance."*
(see section "Pthreads vs Wasm Workers: Which One to Use?")
Other than that I had no particular reason to choose wasm workers, although 
I liked the idea of just a couple of bytes on disk for the wasm workers.
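
The no-copy hand-off in the second step amounts to giving each worker an index range into one shared allocation instead of a private copy. A minimal sketch of that partitioning in plain C, leaving out the Emscripten worker calls themselves; `Slice` and `slice_for_worker` are hypothetical names, not part of any Emscripten API:

```c
#include <assert.h>
#include <stddef.h>

/* Half-open index range [begin, end) into one shared array. */
typedef struct {
    size_t begin;
    size_t end;
} Slice;

/* Split `total` items across `workers` workers as evenly as possible.
   The first (total % workers) workers receive one extra item. */
static Slice slice_for_worker(size_t total, size_t workers, size_t index) {
    assert(workers > 0 && index < workers);
    size_t base = total / workers;
    size_t extra = total % workers;
    Slice s;
    s.begin = index * base + (index < extra ? index : extra);
    s.end = s.begin + base + (index < extra ? 1 : 0);
    return s;
}
```

Each worker would then read and write only the elements inside its own range, so the ranges never overlap and no element is copied.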

Cheers,
Dieter
s...@google.com wrote on Friday, 26 May 2023 at 22:47:12 UTC+2:

> Can I ask why you chose not to use pthreads to start with?  I'd like to 
> understand better why folks would choose wasm workers over pthreads.
>
> On Fri, May 26, 2023 at 3:25 AM 'Dieter Weidenbrück' via 
> emscripten-discuss <emscripte...@googlegroups.com> wrote:
>
>> Hi Sam,
>> IIRC, when I started with Emscripten a while ago the program would abort 
>> in case of a memory error. As my app is comparable to a desktop app, this 
>> was not acceptable, so I set ABORTING_MALLOC to 0. I understand that this 
>> flag has a different meaning today. Here is how all my allocation calls 
>> work:
>>
>> Error_T allocMemPtr(MemPtr_T *p,uint32_T size,boolean_T clear) {
>>     _MemPtr_T mp;
>>
>>     // Allocate the payload plus a size header in front of it.
>>     if (clear)
>>         mp = (_MemPtr_T)calloc(1,size + sizeof(_Mem_T));
>>     else
>>         mp = (_MemPtr_T)malloc(size + sizeof(_Mem_T));
>>     if (mp) {
>>         mp->size = size;
>>         // Hand out the address just past the header.
>>         *p = (MemPtr_T)((char_T*)mp + sizeof(_Mem_T));
>>         return kErr_NoErr;
>>     }
>>     return kErr_MemErr;
>> }
>> Error_T setMemPtrSize(MemPtr_T *p,uint32_T size){
>>     _MemPtr_T m = _MP(*p);
>>     MemPtr_T newPtr;
>>
>>     // On failure realloc leaves the old block untouched, so *p stays valid.
>>     newPtr = realloc(m,size + sizeof(_Mem_T));
>>     if (newPtr) {
>>         m = (_MemPtr_T)newPtr;
>>         m->size = size;
>>         *p = (MemPtr_T)((char_T*)m + sizeof(_Mem_T));
>>         return kErr_NoErr;
>>     }
>>     return kErr_MemErr;
>> }
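
The two helpers above depend on typedefs and an `_MP` macro that the message does not show. A minimal self-contained sketch of the same size-header pattern follows, assuming plain C11; every name in it (`MemHeader`, `total_bytes`, `header_of`, `alloc_mem`, `resize_mem`, `mem_size`, `dispose_mem`) is hypothetical and only stands in for the originals:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header stored in front of every payload (stands in for _Mem_T).
   The union pads it to max_align_t so the payload keeps malloc's alignment. */
typedef union {
    uint32_t size;
    max_align_t pad;
} MemHeader;

/* Total bytes for a payload of `size`, or 0 if that would overflow size_t. */
static size_t total_bytes(uint32_t size) {
    if ((size_t)size > SIZE_MAX - sizeof(MemHeader))
        return 0;
    return (size_t)size + sizeof(MemHeader);
}

/* Map a payload pointer back to its header (the job _MP() appears to do). */
static MemHeader *header_of(void *payload) {
    assert(payload != NULL);
    return (MemHeader *)((char *)payload - sizeof(MemHeader));
}

/* Allocate `size` payload bytes; returns 0 on success, -1 on failure. */
static int alloc_mem(void **out, uint32_t size, int clear) {
    size_t total = total_bytes(size);
    MemHeader *h = NULL;
    if (total != 0)
        h = clear ? calloc(1, total) : malloc(total);
    if (h == NULL) {
        *out = NULL;
        return -1;
    }
    h->size = size;
    *out = (char *)h + sizeof(MemHeader);
    return 0;
}

/* Resize a block; on failure *p and its contents stay valid. */
static int resize_mem(void **p, uint32_t size) {
    size_t total = total_bytes(size);
    MemHeader *h = NULL;
    if (total != 0)
        h = realloc(header_of(*p), total);
    if (h == NULL)
        return -1;
    h->size = size;
    *p = (char *)h + sizeof(MemHeader);
    return 0;
}

/* Payload size recorded at allocation or the last successful resize. */
static uint32_t mem_size(void *payload) {
    return header_of(payload)->size;
}

/* Free a block obtained from alloc_mem; NULL is ignored. */
static void dispose_mem(void *payload) {
    if (payload != NULL)
        free(header_of(payload));
}
```

Padding the header to `max_align_t` is a choice made for this sketch; whether the original `_Mem_T` keeps the payload aligned is not visible from the message.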
>>
>> So I should catch all errors. However, errors (i.e. return value == 0) 
>> are not reported by malloc or calloc during the problems I am experiencing. 
>> I added debug lines, but not a single failure was recorded.
>> Removing ABORTING_MALLOC did not change the outcome.
>>
>> I see two different behaviors now:
>> - setting up workers and checking that they run with
>> static void startUpWorker(void) {
>> #ifdef __EMSCRIPTEN__
>>     int32_T w = emscripten_wasm_worker_self_id();
>>     // Report if this function is not running inside a wasm worker.
>>     if (! emscripten_current_thread_is_wasm_worker()){
>>         EM_ASM_({
>>             console.log("Error: No worker: " + $0);
>>         },w);
>>     }
>> #endif //__EMSCRIPTEN__
>> }
>> - then I do my stuff and receive about 10 of the "Uncaught RuntimeError: 
>> memory access out of bounds" errors.
>> - no failures of malloc/calloc recognized
>>
>> The second behavior is 
>> - in main() I call this routine:
>> static void memtest(void) {
>> #define NUM_CHUNKS  15
>>     const int CHUNK_SIZE = 100 * 1024 * 1024;
>>     // Zero-initialized so the cleanup loop can stop at the first unused slot.
>>     void* p[NUM_CHUNKS] = {0};
>>     Error_T err = kErr_NoErr;
>>
>>     for (int i = 0; i < NUM_CHUNKS; i++) {
>>         err = allocMemPtr(&p[i],CHUNK_SIZE,FALSE); //see function above
>>         if (err != kErr_NoErr || p[i] == NULLPTR) {
>>             printf("Error chunk %d\n",i);
>>             break;
>>         }
>>     }
>>     for (int i = 0; i < NUM_CHUNKS; i++) {
>>         if (p[i] == NULLPTR)
>>             break;
>>         disposeMemPtr(p[i]);
>>     }
>> }
>> - then I start up the workers as described above
>> - then I do my stuff
>> - sometimes this results in error-free behavior, but not always. If an 
>> error occurs, I only get one "Uncaught RuntimeError" message.
>>
>> I am pretty confident that I handle memory allocation correctly, because 
>> my background is in development of desktop apps in C for 30+ years, and 
>> there you better not have any leaks and keep the app running whenever 
>> possible. So I must be doing something wrong when dealing with multiple 
>> threads.
>> I will try out pthreads next, because I have no idea anymore what the 
>> cause could be here.
>>
>> Cheers,
>> Dieter
>> s...@google.com wrote on Thursday, 25 May 2023 at 23:20:33 UTC+2:
>>
>>> Is there some reason you added `-sABORTING_MALLOC=0`? That looks a 
>>> little suspicious, since it means the program can continue after malloc 
>>> fails, which means that any callsite that doesn't check the return value of 
>>> malloc can lead to segfaults. If you remove that setting, does the 
>>> behaviour change?
>>>
>>>
>>>
>>> On Thu, May 25, 2023 at 1:27 PM 'Dieter Weidenbrück' via 
>>> emscripten-discuss <emscripte...@googlegroups.com> wrote:
>>>
>>>> Hi Sam,
>>>>
>>>> I can run the code in a single thread without problems, and I have done 
>>>> that for a while. So I assume that the code is stable.
>>>>
>>>> Here is the command line I use in a .bat file:
>>>> emcc ./src/main.c ^
>>>> ...
>>>> ./src/w_com.c ^
>>>> -I ./include/ ^
>>>> -g3 ^
>>>> --source-map-base ./ ^
>>>> -gsource-map ^
>>>> -s ALLOW_MEMORY_GROWTH=1 ^
>>>> -s ENVIRONMENT=web,worker ^
>>>> --shell-file ./index_template.html ^
>>>> -s SUPPORT_ERRNO=0 ^
>>>> -s MODULARIZE=1 ^
>>>> -s ABORTING_MALLOC=0 ^
>>>> -sWASM_WORKERS ^
>>>> -s "EXPORT_NAME='wasmMod'" ^
>>>> -s EXPORTED_FUNCTIONS="['_malloc','_free','_main']" ^
>>>> -s EXPORTED_RUNTIME_METHODS="['cwrap','UTF16ToString','UTF8ToString','stringToUTF8','allocateUTF8']" ^
>>>> -o index.html
>>>>
>>>> I will start familiarizing myself with pthreads to test whether that 
>>>> would work better.
>>>>
>>>> BTW, as an old C programmer I am fascinated by emscripten and its 
>>>> possibilities. Excellent job!
>>>>
>>>> Cheers,
>>>> Dieter
>>>>
>>>> s...@google.com wrote on Thursday, 25 May 2023 at 20:29:58 UTC+2:
>>>>
>>>>> This looks like some kind of memory corruption, most likely due to the 
>>>>> use of multithreading/wasm_workers. Are you able to build a 
>>>>> single-threaded version of your program, or one that uses normal pthreads 
>>>>> rather than wasm workers?
>>>>>
>>>>> Also, can you share the full link command you are using?
>>>>>
>>>>> cheers,
>>>>> sam
>>>>>
>>>>> On Thu, May 25, 2023 at 9:20 AM 'Dieter Weidenbrück' via 
>>>>> emscripten-discuss <emscripte...@googlegroups.com> wrote:
>>>>>
>>>>>> This is a memory snapshot when using SAFE_HEAP. Here I am well 
>>>>>> below the browser limits, yet the segfault still occurs in different 
>>>>>> places. Ignore the first console line; I think it comes from Norton 
>>>>>> Utilities.
>>>>>>
>>>>>> [image: error2.png]
>>>>>>
>>>>>> Dieter Weidenbrück wrote on Thursday, 25 May 2023 at 18:06:27 
>>>>>> UTC+2:
>>>>>>
>>>>>>> Hi Sam,
>>>>>>> I noticed already that I am bumping against browser limits, 
>>>>>>> especially with sanitizer switched on, so I reduced the pre-allocation 
>>>>>>> calls.
>>>>>>> It turns out that asan uses so much memory that I can't use it to 
>>>>>>> analyze this case.
>>>>>>>
>>>>>>> I use 
>>>>>>> -s ALLOW_MEMORY_GROWTH=1
>>>>>>> but don't specify any MAXIMUM_MEMORY.
>>>>>>>
>>>>>>> No pthreads version so far. I might try this next.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Dieter
>>>>>>>
>>>>>>> s...@google.com wrote on Thursday, 25 May 2023 at 17:55:41 
>>>>>>> UTC+2:
>>>>>>>
>>>>>>>> Firstly, if you are allocating 1.8 GB you are likely pushing up 
>>>>>>>> against browser limits.  Are you specifying a MAXIMUM_MEMORY larger 
>>>>>>>> than 2 GB?
>>>>>>>>
>>>>>>>> Secondly, it looks like you are using wasm workers, which are still 
>>>>>>>> relatively new.  Do you have a version of your code that uses pthreads 
>>>>>>>> instead?  It might tell us if the issue is related to wasm workers.
>>>>>>>>
>>>>>>>> cheers,
>>>>>>>> sam
>>>>>>>>
>>>>>>>> On Thu, May 25, 2023 at 8:06 AM 'Dieter Weidenbrück' via 
>>>>>>>> emscripten-discuss <emscripte...@googlegroups.com> wrote:
>>>>>>>>
>>>>>>>>> The joy was premature, even with pre-allocated heap size segfaults 
>>>>>>>>> occur. :(
>>>>>>>>>
>>>>>>>>> Dieter Weidenbrück wrote on Thursday, 25 May 2023 at 16:28:37 
>>>>>>>>> UTC+2:
>>>>>>>>>
>>>>>>>>>> All,
>>>>>>>>>> I am experiencing segmentation faults when using wasm workers.
>>>>>>>>>> Overview:
>>>>>>>>>> I am working on a project with considerable 3D data sets. The 
>>>>>>>>>> code has been stable for a while when running in the main thread 
>>>>>>>>>> alone. Then I started using js workers (no shared memory), and 
>>>>>>>>>> again all was well.
>>>>>>>>>> Now I've switched to SharedArrayBuffers and wasm workers, and I 
>>>>>>>>>> keep running into random problems. 
>>>>>>>>>> I have prepared the code such that I can run with anywhere from 0 
>>>>>>>>>> workers up to navigator.hardwareConcurrency workers. All is well 
>>>>>>>>>> with 0 workers, but as soon as I use one or more workers, I keep 
>>>>>>>>>> getting segfaults because of invalid pointers, out-of-bounds 
>>>>>>>>>> accesses and similar.
>>>>>>>>>>
>>>>>>>>>> What happens in main thread and what in the wasm workers:
>>>>>>>>>> I allocate all objects in the main thread when importing the 3D 
>>>>>>>>>> file. Then I fire off a function for each object that does some 
>>>>>>>>>> serious calculations on the data, including allocating and 
>>>>>>>>>> disposing of memory. The workers allocate approx. 300 to 400 MB in 
>>>>>>>>>> addition to the main thread. All this happens in the same 
>>>>>>>>>> SharedArrayBuffer, of course.
>>>>>>>>>>
>>>>>>>>>> Here is what I've tried so far:
>>>>>>>>>> - compiling with SAFE_HEAP=1:
>>>>>>>>>>   not a lot of helpful information
>>>>>>>>>> - compiling with -fsanitize=address:
>>>>>>>>>>   everything works without problems here!
>>>>>>>>>> - compiling with ASSERTIONS=2:
>>>>>>>>>>   gave me this information:
>>>>>>>>>> [image: error.png]
>>>>>>>>>>
>>>>>>>>>> To me it looks like another resize call is executed while other 
>>>>>>>>>> workers keep working on the buffer, and then something comes into 
>>>>>>>>>> conflict.
>>>>>>>>>> To test this, I allocated 1.8 GB right after startup in the main 
>>>>>>>>>> thread and disposed of the mem blocks again just to trigger a heap 
>>>>>>>>>> resize. After that everything works like a charm.
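
The experiment described above (allocate a large amount up front, free it again, and only then start the workers) can be sketched in portable C. This is an illustration, not the code from the thread: `pregrow_heap` and its parameters are hypothetical names. It relies on the fact that a WebAssembly memory can grow but never shrink, so once the heap has been enlarged, freeing the blocks returns them to the allocator without giving the pages back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocate up to `num_chunks` blocks of `chunk_size` bytes, then free them
   all again. Returns how many blocks could be obtained before malloc failed. */
static size_t pregrow_heap(size_t chunk_size, size_t num_chunks) {
    assert(chunk_size > 0);
    if (num_chunks == 0)
        return 0;

    /* Bookkeeping array; calloc leaves every slot NULL. */
    void **chunks = calloc(num_chunks, sizeof *chunks);
    if (chunks == NULL)
        return 0;

    size_t got = 0;
    while (got < num_chunks) {
        chunks[got] = malloc(chunk_size);
        if (chunks[got] == NULL)
            break;              /* stop at the first failed allocation */
        got++;
    }

    /* Release only the slots that were actually filled. */
    for (size_t i = 0; i < got; i++)
        free(chunks[i]);
    free(chunks);
    return got;
}
```

A call such as `pregrow_heap((size_t)100 * 1024 * 1024, 18)` before any worker starts would request roughly the 1.8 GB mentioned above. Whether this removes the underlying problem or only hides it is not something the sketch can answer.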
>>>>>>>>>>
>>>>>>>>>> Is there anything I am doing wrong?
>>>>>>>>>> Sorry for not providing a sample, but there is a lot of code 
>>>>>>>>>> involved, and it is not easy to simulate this behavior. Happy to 
>>>>>>>>>> answer questions.
>>>>>>>>>>
>>>>>>>>>> All comments are appreciated.
>>>>>>>>>> Thanks,
>>>>>>>>>> Dieter
>>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>>> Groups "emscripten-discuss" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>>> send an email to emscripten-disc...@googlegroups.com.
>>>>>>>>> To view this discussion on the web visit 
>>>>>>>>> https://groups.google.com/d/msgid/emscripten-discuss/80d56314-59d8-4332-bb2e-ebe00fe52ea3n%40googlegroups.com
>>>>>>>>>  
>>>>>>>>> <https://groups.google.com/d/msgid/emscripten-discuss/80d56314-59d8-4332-bb2e-ebe00fe52ea3n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>> .
>>>>>>>>>
>

