Yes thanks JJ, that should be sufficient for now. Good job! On Tuesday, November 17, 2015 at 9:54:46 AM UTC-5, jj wrote: > > The filesystem is already currently thread-safe, i.e. meaning that > multiple threads can safely access files in the filesystem. > > When I say "implement a fully multithreaded filesystem", I mean the effect > of removing the need to proxy the operations via the main thread. The > reason why this proxying is done is that the whole filesystem data > structure (MEMFS) is thread-local to the main thread, and it can't be > shared between threads. To implement a filesystem where multiple threads > can access it without proxying requires moving the filesystem into asm.js > heap, and implementing it in C/C++ code (shareable via SharedArrayBuffer) > instead of javascript code (unshareable, since JS objects are always thread > local). > > Also, in this case, it was not the postMessage() that is causing slowdown > - the proposed test case does not utilize postMessage to pass the proxying > task, but it uses the SharedArrayBuffer data structure. The postMessage() > is used in the case when the main thread is idle, to wake it up to perform > the proxied call. In this case, the main thread is not idle, but it was > sleeping in pthread_join(), where the sleep was performed in 100msec > slices. I posted the pull request > https://github.com/kripken/emscripten/pull/3923 to dramatically reduce > the slice in the main thread to 1msec, which improved the test case from > 3232 msecs to 45 msecs. I think this would be enough to call this issue as > resolved for the interim, until we have the ability to rewrite the > filesystem in C/C++. > > 2015-11-17 16:29 GMT+02:00 Robert Goulet <[email protected] > <javascript:>>: > >> Most likely it would be better to just go straight to making the >> filesystem thread safe. This is something we would really like to use in >> our game engine. On the other end, I don't know how much else is there that >> would still be proxied but I would bet that getting rid of the postMessage >> completely like Alon is suggesting would most likely improve performance >> for these other cases. >> >> Is there any way I can help with making the filesystem thread safe? >> >> On Tuesday, November 17, 2015 at 5:51:27 AM UTC-5, jj wrote: >>> >>> Thanks for the test case Robert, and sorry for the long delay in >>> responding. Marked down >>> https://github.com/kripken/emscripten/issues/3922 . When I was writing >>> the proxying, I was concerned that it's going to be slow, and I've seen the >>> filesystem slowdown in other projects as well, but never really believed it >>> could be this much! >>> >>> In my mind the end goal has been to remove the proxying in filesystem >>> related paths altogether, and implement a fully multithreaded filesystem, >>> which would remove the need to proxy. Sounds like we need to finally start >>> moving to that direction, unless there's something short term that we could >>> do here. >>> >>> 2015-11-06 0:52 GMT+02:00 Alon Zakai <[email protected]>: >>> >>>> I would definitely expect a postMessage to have some noticeable lag - >>>> but I would expect a few ms, not the large amounts we see here. Still, >>>> perhaps we can fix this by removing the postMessage entirely. In theory >>>> maybe we could wait on a mutex/futex on the main thread, if we are not >>>> doing anything else? We could only do it for the duration of a single >>>> frame, I suppose, but then at least if called during that time, we would >>>> instantaneously respond. >>>> >>>> On Thu, Nov 5, 2015 at 8:47 AM, Robert Goulet <[email protected]> >>>> wrote: >>>> >>>>> Ok I figured out a way to profile both the >>>>> emscripten_sync_run_in_main_thread and >>>>> emscripten_main_thread_process_queued_calls function and I found that >>>>> 99.9% of the time is spent waiting on the futex and the end of >>>>> emscripten_sync_run_in_main_thread. The mutex lock/unlock themselves >>>>> takes on average 0.01ms so its negligible. Processing the call itself >>>>> takes >>>>> on average 0.65ms. >>>>> >>>>> @Floh Looks like all the time is spent on the context switching >>>>> between the thread and the main thread. Perhaps what happens is exactly >>>>> what you suggested - main thread loop isn't called as fast as we think, >>>>> leaving a big gap between the time we postMessage and the time its >>>>> received. Or perhaps its just the message mechanics of WebWorkers that is >>>>> just slow? >>>>> >>>>> >>>>> On Thursday, November 5, 2015 at 10:46:30 AM UTC-5, Robert Goulet >>>>> wrote: >>>>>> >>>>>> Seems like we are dealing with many things here: >>>>>> >>>>>> 1. Queue's mutex lock to add the call to the queue >>>>>> (emscripten_sync_run_in_main_thread) >>>>>> 2. Then we call postMessage (not sure how it works, could there >>>>>> be some blocking here?) to notify we have something in queue >>>>>> 3. Then queue's mutex unlock >>>>>> 4. Then we block wait on a futex for the operation to complete >>>>>> 5. At some point, the main thread wakes up and process the calls >>>>>> in the queue (emscripten_main_thread_process_queued_calls) >>>>>> 6. Queue's mutex lock to read and empty the queue >>>>>> 7. Queue's mutex unlock >>>>>> >>>>>> I see one area that could be improved, >>>>>> emscripten_main_thread_process_queued_calls: Instead of keeping the >>>>>> mutex >>>>>> locked while we process all queue calls, we could just fetch them on a >>>>>> stack variable and unlock the mutex immediately, allowing other threads >>>>>> to >>>>>> add to the queue while they are being processed, but I'm not sure this >>>>>> would improve the situation here since anyway we are waiting on the >>>>>> futex >>>>>> to complete the fread operation. I'll try to profile the actual locking >>>>>> mechanic to see if its the mutex, or the context switching using the >>>>>> message that is slow. >>>>>> >>>>>> On Thursday, November 5, 2015 at 10:08:08 AM UTC-5, Floh wrote: >>>>>>> >>>>>>> Sounds like you need some sort of read-ahead, where a small read >>>>>>> would actually read a larger portion of the file into a >>>>>>> (size-configurable) >>>>>>> buffer on the worker-thread side with a single round-trip to the >>>>>>> main-thread, and subsequent small reads wouldn't require a main-thread >>>>>>> round-trip until the end of the buffer is reached, and the next read >>>>>>> would >>>>>>> fill up the buffer again... >>>>>>> >>>>>>> Disclaimer: I haven't looked at the code, even if every small read >>>>>>> does a complete, blocking thread-roundtrip, this doesn't explain the >>>>>>> high >>>>>>> blocking times you are seeing. I guess there is another problem that a >>>>>>> single read has a too high blocking time (may be it can only do at most >>>>>>> 1 >>>>>>> read per frame, or something similar). >>>>>>> >>>>>>> Cheers, >>>>>>> -Floh. >>>>>>> >>>>>>> Am Donnerstag, 5. November 2015 15:09:04 UTC+1 schrieb Robert Goulet: >>>>>>>> >>>>>>>> Where is the code that deals with this locking mechanics? I'd like >>>>>>>> to take a look at it. >>>>>>>> >>>>>>>> Also, it doesn't matter much for us how long the entire process >>>>>>>> takes. It's more each read time that matters. If we can improve that, >>>>>>>> would >>>>>>>> be great! >>>>>>>> >>>>>>>> On Wednesday, November 4, 2015 at 5:22:32 PM UTC-5, Alon Zakai >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Thanks for the testcase. I see the same results. >>>>>>>>> >>>>>>>>> It looks like reducing the number of reads helps a lot. The >>>>>>>>> overhead is affected mostly by number of reads. Which could make >>>>>>>>> sense if >>>>>>>>> the main thread is busy (since it's the main browser thread, it could >>>>>>>>> be >>>>>>>>> busy doing anything from rendering to doing some work for another >>>>>>>>> tab) and >>>>>>>>> the worker needs to wait on it. Also, we send a message to the main >>>>>>>>> thread, >>>>>>>>> so any general activity on the event queue could lead to the message >>>>>>>>> being >>>>>>>>> received later. >>>>>>>>> >>>>>>>>> It's also possible that the mutex and futex stuff we do for the >>>>>>>>> blocking call has overhead. Jukka, do we have a way to profile that? >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Nov 4, 2015 at 2:06 PM, Robert Goulet < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> Here, I quickly wrote this small test case >>>>>>>>>> <https://autodesk.box.com/s/h12u5woqkuwyzg4uk7hw6puck8jg8orh> >>>>>>>>>> which reproduce the problem. >>>>>>>>>> >>>>>>>>>> I get the following output from it: >>>>>>>>>> >>>>>>>>>> Preallocating 1 workers for a pthread spawn pool. >>>>>>>>>> Writing test file (1048576 bytes)... >>>>>>>>>> Reading file from main thread... >>>>>>>>>> Completed in 6.730000ms (~0.205156ms per read of 32768 bytes) >>>>>>>>>> Reading file from another thread... >>>>>>>>>> Completed in 3499.585000ms (~109.355312ms per read of 32768 bytes) >>>>>>>>>> Done. >>>>>>>>>> >>>>>>>>>> Please let me know if there's anything we can do to fix this >>>>>>>>>> major difference between the two. >>>>>>>>>> >>>>>>>>>> Thanks! >>>>>>>>>> >>>>>>>>>> On Wednesday, November 4, 2015 at 3:25:58 PM UTC-5, Robert Goulet >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> That is for many small/medium reads. It seems the size of the >>>>>>>>>>> read does not have any impact on performance thought. >>>>>>>>>>> >>>>>>>>>>> Essentially, the thread we create does the following: >>>>>>>>>>> >>>>>>>>>>> while (true) { >>>>>>>>>>> _queue_semaphore.wait(); >>>>>>>>>>> if (_exit_thread) >>>>>>>>>>> break; >>>>>>>>>>> process_request(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> in that case, the process_request() function takes 200ms+ to >>>>>>>>>>> execute (profiled within the process_request() function itself, so >>>>>>>>>>> that it >>>>>>>>>>> does not include locking mechanics overhead). >>>>>>>>>>> >>>>>>>>>>> If we just call process_request() right away upon adding request >>>>>>>>>>> instead of inserting the request in the thread queue (which >>>>>>>>>>> essentially >>>>>>>>>>> just bypass the thread completely), the process_request() function >>>>>>>>>>> takes >>>>>>>>>>> <0.2ms to execute. The only thing this function does is a switch >>>>>>>>>>> case >>>>>>>>>>> between fopen(), fread() and fclose(). I've narrowed it down to >>>>>>>>>>> being these >>>>>>>>>>> filesystem function who are taking a much longer time to return. >>>>>>>>>>> >>>>>>>>>>> The main thread is waiting on the thread queue to complete >>>>>>>>>>> before returning, so I don't see why it would block the thread from >>>>>>>>>>> doing >>>>>>>>>>> its work? >>>>>>>>>>> >>>>>>>>>>> On Wednesday, November 4, 2015 at 12:53:21 PM UTC-5, Alon Zakai >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> The filesystem itself resides in JS, which can only be accessed >>>>>>>>>>>> from the main thread. Workers therefore need to send messages to >>>>>>>>>>>> communicate with it. However, 200ms seems ridiculously high - is >>>>>>>>>>>> that for a >>>>>>>>>>>> single read()? Or many small reads of small amounts? If you can >>>>>>>>>>>> make a >>>>>>>>>>>> small standalone testcase showing the issue, that would be useful >>>>>>>>>>>> for >>>>>>>>>>>> benchmarking. >>>>>>>>>>>> >>>>>>>>>>>> A possibility is that the blocking is the issue, and the main >>>>>>>>>>>> thread is busy with something else. >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Nov 4, 2015 at 7:57 AM, Robert Goulet < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> we are using the new pthread support in Emscripten, and one >>>>>>>>>>>>> thing we noticed is how much slower filesystem functions are when >>>>>>>>>>>>> executed >>>>>>>>>>>>> in a thread. We saw this in the documentation: >>>>>>>>>>>>> >>>>>>>>>>>>> *Currently several of the functions in the C runtime, such as >>>>>>>>>>>>>> filesystem functions like fopen(), fread(), printf(), fprintf() >>>>>>>>>>>>>> etc. are >>>>>>>>>>>>>> not multithreaded, but instead their execution is proxied over >>>>>>>>>>>>>> to the main >>>>>>>>>>>>>> application thread.* >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I'm just trying to understand what we are dealing with. At >>>>>>>>>>>>> this point I am guessing this is the reason why it is much slower >>>>>>>>>>>>> to read a >>>>>>>>>>>>> file using fread in a thread. We are seeing 1000x slowdowns >>>>>>>>>>>>> compared to >>>>>>>>>>>>> running in the main thread directly. For example, running in main >>>>>>>>>>>>> thread, a >>>>>>>>>>>>> read request can complete in 0.2ms, while in a thread is takes >>>>>>>>>>>>> 200ms. Most >>>>>>>>>>>>> likely that's the overhead of waiting on the main thread to >>>>>>>>>>>>> process proxied >>>>>>>>>>>>> requests? >>>>>>>>>>>>> >>>>>>>>>>>>> Is there any technical blockers preventing filesystem >>>>>>>>>>>>> functions to be multithreaded so that they are no longer put in >>>>>>>>>>>>> the main >>>>>>>>>>>>> thread proxy queue? >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks! >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> You received this message because you are subscribed to the >>>>>>>>>>>>> Google Groups "emscripten-discuss" group. >>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from >>>>>>>>>>>>> it, send an email to >>>>>>>>>>>>> [email protected]. >>>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>> You received this message because you are subscribed to the >>>>>>>>>> Google Groups "emscripten-discuss" group. >>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>>>>>> send an email to [email protected]. >>>>>>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "emscripten-discuss" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>> an email to [email protected]. >>>>> For more options, visit https://groups.google.com/d/optout. >>>>> >>>> >>>> -- >>>> You received this message because you are subscribed to the Google >>>> Groups "emscripten-discuss" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to [email protected]. >>>> For more options, visit https://groups.google.com/d/optout. >>>> >>> >>> -- >> You received this message because you are subscribed to the Google Groups >> "emscripten-discuss" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected] <javascript:>. >> For more options, visit https://groups.google.com/d/optout. >> > >
-- You received this message because you are subscribed to the Google Groups "emscripten-discuss" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
