Hi there,

I finally managed to find an example where parallel processing yields a
measurable improvement (previously, I had not found verbs where the
computational gain outweighed the threading overhead).
Along the way I also wrote some cover verbs that others might find useful.
Note that these leave chunking the input up to the user, unlike the verbs
Elijah Stone posted recently.

startpool =: (0 T. ''"_)^:]  NB. starts a thread pool of y threads
peach =: (t.'')(&>)          NB. parallel version of each; still looking for an apricot verb
pevery =: (t.'')(&>)(>@:)    NB. parallel version of every (note: @:, not @, for the speed improvement)
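
Since chunking is left to the user, non-overlapping infixes are one
convenient way to box a long list into fixed-size chunks (a sketch; the
session below instead uses <"2 to box the cells of an array):

   _3 <\ i.7   NB. negative left argument: non-overlapping infixes of size 3
+-----+-----+-+
|0 1 2|3 4 5|6|
+-----+-----+-+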

For instance:
   startpool 8 NB. start 8 threads (note: I have a Core i5 with 2 physical cores, 4 logical ones)
8
   in =: <"2]30 300 300 ?@$100 NB. 30 boxed matrices
NB. calculating determinants
   10 timespacex 'outseq =: -/ .* every in'  NB. normal every
1.22102 3.14651e7
   10 timespacex 'outpar =: -/ .* pevery in' NB. parallel version
0.578477 3.1583e6
   outseq -: outpar
1
   10 timespacex 'outseq =: -/ .* each in'  NB. normal each
1.21831 3.14669e7
   10 timespacex 'outpar =: -/ .* peach in' NB. cheating: the pyxes are left unopened, so the threads are still running asynchronously
0.555217 4.20666e6
   outseq -: outpar
1
NB. inverting 30 matrices
   10 timespacex 'outseq =: %. every in'
0.526015 3.90624e7
   10 timespacex 'outpar =: %. pevery in'
0.30344 4.30001e7
   outseq -: outpar
1

A few things I noted:
0. Why does "-/ .* pevery" consume ten times less memory than "-/ .* every"?
This is not the case with %. .

1. In the definition of pevery, using @: instead of @ is necessary for good
performance; otherwise the created pyxes appear to be opened immediately.
For instance, "%. pevery" generates the verb >@:(%. t.''&>), in which the
parenthesized part has ranks 0 0 0. >@ would inherit those ranks and so
apply > to each pyx individually, which in the current implementation
happens immediately after the pyx's creation. This causes the master thread
to block on each pyx in turn, effectively executing all calls sequentially
(if I understood correctly). I don't know whether this is considered an
implementation detail left unspecified by the parsing rules, but it's
important to know about in this case.
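
The rank difference behind this can be checked directly with b. 0 (a
sketch):

   f =: %. t.''&>
   f b. 0        NB. &> gives the tasked verb ranks 0 0 0
0 0 0
   (>@f) b. 0    NB. @ inherits f's ranks, so > applies to each pyx cell
0 0 0
   (>@:f) b. 0   NB. @: has infinite rank, so all pyxes exist before > runs
_ _ _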

2. What happens to busy threads whose pyxes are no longer accessible?
Should they be terminated, in the spirit of garbage collection of
unreachable values? Perhaps they could be, but only if they can be
guaranteed to have no side effects (e.g. the executed verb contains no
explicit components, and none of the foreigns, such as file operations,
that could cause side effects; or perhaps t. could take an option letting
the user "promise" that the supplied verb is side-effect free).

3. Would it make sense to have threads that can return values multiple
times? I am thinking of generators and iterators, which could then be
written more easily, or of something like Lua's coroutines (
https://www.lua.org/manual/5.3/manual.html#2.6): although not parallel,
they let the user resume a coroutine and let the function yield values,
continuing after each resume where it left off. At the moment this is
possible only by keeping state in a locale and jumping to the next step
with goto., which feels clumsier than it should be (for an example, see
the classes "permutations" and "altflipsgen" in my fannkuch implementation
here: http://www.jsoftware.com/pipermail/beta/2021-September/010048.html).

This could be integrated with the pyx concept: successive openings of a
pyx would be equivalent to successive "resume" (or "next") calls. That
would, however, make the content of a pyx a read-once value, replaced by
the concurrently executing thread after each opening, and it would require
a primitive (and/or a control word for explicit definitions) for yielding
a value to the opening thread. On the other hand, it would make it
impossible to pass values to the thread when resuming.

Another way of implementing this, without modifying the behaviour of pyxes
and while allowing new values to be passed to the thread, would be a
separate "resume" verb that can be called on a thread number. This would
not even have to be tied to true concurrency, so it could also serve as a
purely conceptual abstraction (as in Lua, see the link above) on systems
that do not support true multi-threading (as appears to be the case for
32-bit J at the moment).

Aside from allowing the implementation of generators and the like, it
would also allow dedicating specific threads to executing a specific verb
repeatedly on different data, which might (or might not; I am far from an
expert) be more efficient, since a thread could keep the routine it
executes in memory.
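
For comparison, the locale-based approach reduced to a minimal counter
looks like this (a sketch; the class and names are made up, and a counter
is simple enough to need no goto., only re-entry into the verb):

coclass 'gen'
create =: {{ n =: y }}      NB. state lives in the instance locale
next =: {{ n =: n + 1 }}    NB. each call re-enters and advances the state
cocurrent 'base'

   g =: 0 conew 'gen'
   next__g ''
1
   next__g ''
2

With resumable threads, next could instead yield from inside one running
computation, rather than re-entering a verb that must reconstruct its
position from stored state.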

4. Has anyone tried combining this new threading model with
reading/writing globals (or, even more fun, memory-mapped files)? Or with
the C FFI, or the wd subsystem?

Just my 5 cents.

Jan-Pieter
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
