Hi there, I finally managed to find an example where parallel processing made a measurable improvement (that is, until now I had not found verbs where the overhead was smaller than the computational gain). Along the way I also wrote some cover verbs that others might find useful. Note that these leave chunking the input up to the user, as opposed to the verbs Elijah Stone recently posted.
startpool =: (0 T. ''"_)^:]   NB. starts a thread pool of y threads
peach  =: (t.'')(&>)          NB. parallel version of each; still looking for an apricot verb.
pevery =: (t.'')(&>)(>@:)     NB. parallel version of every (note @:, not @, for the speed improvement)

For instance:

   startpool 8   NB. start 8 threads (note, I have a core i5 with 2 physical cores, 4 virtual ones)
8
   in =: <"2]30 300 300 ?@$100   NB. 30 boxed matrices

NB. calculating determinants
   10 timespacex 'outseq =: -/ .* every in'    NB. normal every
1.22102 3.14651e7
   10 timespacex 'outpar =: -/ .* pevery in'   NB. parallel version
0.578477 3.1583e6
   outseq -: outpar
1
   10 timespacex 'outseq =: -/ .* each in'     NB. normal each
1.21831 3.14669e7
   10 timespacex 'outpar =: -/ .* peach in'    NB. cheating, due to asynchronous threads
0.555217 4.20666e6
   outseq -: outpar
1

NB. inverting 30 matrices
   10 timespacex 'outseq =: %. every in'
0.526015 3.90624e7
   10 timespacex 'outpar =: %. pevery in'
0.30344 4.30001e7
   outseq -: outpar
1

A few things I noted:

0. Why does "-/ .* pevery" consume 10 times less memory than "-/ .* every"? It is not the case with %. .

1. In the definition of pevery, using @: instead of @ is necessary for good performance. Otherwise it appears that the created pyxes are opened immediately: for instance, "%. pevery" generates the verb >@:(%. t.''&>), in which the part in parentheses has verb rank 0 0 0. Using >@ would execute > on each pyx individually, which in the current implementation apparently happens immediately after the pyx's creation. This causes the master to block, effectively executing all calls sequentially (if I understood correctly). I don't know whether this is considered an implementation detail not covered by the specification of parsing, but it is important to know in this case.

2. What happens to busy threads whose pyxes are no longer accessible? Should they be terminated, i.e. in the spirit of garbage collection of unreachable values?
Perhaps they could be, but only if they can be assured not to have side effects (e.g. the verb being executed contains no explicit components that could cause side effects, nor certain foreigns such as file operations; or perhaps t. could take an option letting the user "promise" that the supplied verb has no side effects).

3. Would it make sense to have threads that can return values multiple times? I am thinking of generators and iterators, which could then be written more easily, or used as in Lua's coroutines (https://www.lua.org/manual/5.3/manual.html#2.6), which, albeit not parallel, allow the user to resume a coroutine and the function to yield values, continuing after each resume where it left off. Currently this is possible only by keeping state in a locale and jumping around in the "next" function using goto., which feels clumsier than it should (for an example, see the classes "permutations" and "altflipsgen" in my fannkuch implementation here: http://www.jsoftware.com/pipermail/beta/2021-September/010048.html).

I think this could be integrated with the pyx concept: successive openings of a pyx would be equivalent to successive "resume" (or "next") calls. This would, however, make the content of a pyx a read-once value that is changed by the parallelly executing thread after it has been opened. It would also need a primitive (and/or a control word for explicit definitions) for yielding a value to the calling/opening thread. On the other hand, this way it would be impossible to pass values to the thread when resuming. Another way of implementing this, without modifying the behaviour of pyxes and while allowing new values to be passed to the thread, would be a separate "resume" verb that can be called on a thread number.
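To show what I mean by the locale-based workaround, here is a minimal sketch of my own (a trivial counter, so it needs no goto. and is much simpler than the fannkuch classes): the generator's state lives in the instance locale, and each call to next resumes from the stored state.

   NB. sketch: a stateful "generator" whose state lives in a locale
   coclass 'counter'
   create =: 3 : 'n =: y'   NB. y is the starting value
   next =: 3 : 0            NB. each call yields the current value and advances the state
   r =. n
   n =: n + 1
   r
   )
   cocurrent 'base'

   g =: 0 conew 'counter'
   next__g ''
0
   next__g ''
1

This works, but all the "resume where you left off" bookkeeping has to be done by hand in the object's state, which is exactly what a yield/resume mechanism would make unnecessary.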
Perhaps this does not even have to be integrated with true concurrency, so that it could also serve as a conceptual abstraction (as in Lua, see the link above) on systems that do not support true multi-threading (as appears to be the case for 32-bit J at the moment). Aside from allowing the implementation of generators and whatnot, it would also make it possible to dedicate specific threads to executing one specific verb repeatedly on different data, which might (or might not, I am far from an expert) be more efficient, as such a thread could keep the routine it is executing in memory.

4. Did anyone try to combine this new threading model with reading/writing globals (or, even more fun, memory-mapped files)? With the C FFI, or the wd subsystem?

Just my 5 cents.

Jan-Pieter

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
