branch: externals/el-job
commit 934327f011e139d9723bf4fcb8ba1550c5b7392d
Author: Martin Edström <meedst...@runbox.eu>
Commit: Martin Edström <meedst...@runbox.eu>
    Readme
---
 README.org  |  58 +++++++++++--
 el-job.texi | 267 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 317 insertions(+), 8 deletions(-)

diff --git a/README.org b/README.org
index 8d5d17d6a7..19b9950ee3 100644
--- a/README.org
+++ b/README.org
@@ -1,9 +1,11 @@
-# Copying and distribution of this file, with or without modification,
-# are permitted in any medium without royalty provided the copyright
-# notice and this notice are preserved. This file is offered as-is,
-# without any warranty.
-#+HTML: <a href="https://repology.org/project/emacs%3Ael-job/versions"><img src="https://repology.org/badge/vertical-allrepos/emacs%3Ael-job.svg" alt="Packaging status"></a>
-* el-job
+#+TITLE: el-job
+#+AUTHOR: Martin Edström
+#+EMAIL: meedst...@runbox.eu
+#+EXPORT_FILE_NAME: el-job
+#+TEXINFO_DIR_TITLE: El-Job: (el-job).
+#+TEXINFO_DIR_DESC: Async multicore mapcar
+#+TEXINFO_DIR_CATEGORY: Emacs
+#+HTML: <a href="https://repology.org/project/emacs%3Ael-job/versions"><img src="https://repology.org/badge/vertical-allrepos/emacs%3Ael-job.svg" alt="Packaging status" align="right"></a>
 
 Imagine you have a function you'd like to run on a long list of inputs. You could run =(mapcar #'FN INPUTS)=, but that hangs Emacs until done.
 
@@ -13,10 +15,50 @@ In the meantime, current Emacs does not hang at all.
 
 Best of all, it completes /faster/ than =(mapcar #'FN INPUTS)=, owing to the use of all CPU cores!
 
-For real-world usage, search for =el-job= in the source of [[https://github.com/meedstrom/org-mem/blob/main/org-mem.el][org-mem.el]].
+That's it in a nutshell. You can look at real-world usage by searching for "el-job" in these packages:
+- [[https://raw.githubusercontent.com/meedstrom/org-mem/refs/heads/main/org-mem.el][org-mem.el]]
+- [[https://raw.githubusercontent.com/meedstrom/org-roam-async/refs/heads/main/org-roam-async.el][org-roam-async.el]]
 
+* New (el-job-ng)
+
+Since 2.5.0, released [2025-10-07 Tue], this repo comes with a variant library "el-job-ng".
+
+I find it simpler and easier to reason about. 400 lines of code instead of 800.
+
+Some differences:
+
+- Does /not/ ever keep a process alive
+- Does /not/ merge the subprocesses' outputs in a bespoke way, just uses =append=.
+- Does /not/ look up =load-history= to try hard to find an .eln variant of your libraries, instead just shares the whole =load-path= to let =require= do its thing.
+- Removed argument =:if-busy=, you can manage this yourself with =el-job-ng-busy-p= and =el-job-ng-kill=.
+- Argument =:funcall-per-input= takes a function of two arguments, not one.
+- Added an optional argument =:eval=.
+
+Feel free to file an issue or email. The design is always changing in response to my needs, so hearing about other people's needs is instructive as well!
+
+** Future work
+
+I may write more variants. Something that came with experience is that it's best to hard-code an entire variant library for a narrow use-case rather than complicate the same library with different code flows, as when it comes to this type of library, you really want to keep it easy to reason about!
+
+Ideas as of [2025-10-06 Mon]:
+
+- File IPC ::
+  Sending input and output by writing and reading files, instead of through the pipe connection.
+
+  Theory: performance can sometimes be a lot better, with large inputs and outputs. I would guess it depends a lot on the machine.
+
+  Drawback: if the data sent is sensitive, those files probably should be encrypted, and that could negate any performance benefit.
+
+- Worker daemons ::
+  Keeping subprocesses alive forever, so they are available at beck and call -- think worker daemons.
+
+  That's basically implemented in "el-job-old", and partly why it got so hairy, but it could be remade to put this usage front-and-center, with a "happy path" UX.
+
+  At this time, I do not have a demanding use-case to experiment with, to discover what that happy path should be, how the affordances should be.
+
+* Old (el-job-old)
 ** Design rationale
 
-I want to shorten the round-trip as much as possible, *between the start of an async task and having the results*.
+I wanted to shorten the round-trip as much as possible, *between the start of an async task and having the results*.
 
 For example, say you have some lisp that collects completion candidates, and you want to run it asynchronously because the lisp you wrote isn't always fast enough to avoid the user's notice, but you'd still like it to return as soon as possible.
diff --git a/el-job.texi b/el-job.texi
new file mode 100644
index 0000000000..113d205a8b
--- /dev/null
+++ b/el-job.texi
@@ -0,0 +1,267 @@
+\input texinfo @c -*- texinfo -*-
+@c %**start of header
+@setfilename el-job.info
+@settitle el-job
+@documentencoding UTF-8
+@documentlanguage en
+@c %**end of header
+
+@dircategory Emacs
+@direntry
+* El-Job: (el-job). Async multicore mapcar.
+@end direntry
+
+@finalout
+@titlepage
+@title el-job
+@author Martin Edström
+@end titlepage
+
+@ifnottex
+@node Top
+@top el-job
+
+Imagine you have a function you'd like to run on a long list of inputs. You could run @samp{(mapcar #'FN INPUTS)}@comma{} but that hangs Emacs until done.
+
+This library lets you run the same function in many subprocesses (one per CPU core)@comma{} each with their own split of the @samp{INPUTS} list@comma{} then merge their outputs and pass it back to the current Emacs.
+
+In the meantime@comma{} current Emacs does not hang at all.
+
+Best of all@comma{} it completes @emph{faster} than @samp{(mapcar #'FN INPUTS)}@comma{} owing to the use of all CPU cores!
+
+That's it in a nutshell. You can look at real-world usage by searching for "el-job" in these packages:
+@itemize
+@item
+@uref{https://raw.githubusercontent.com/meedstrom/org-mem/refs/heads/main/org-mem.el, org-mem.el}
+@item
+@uref{https://raw.githubusercontent.com/meedstrom/org-roam-async/refs/heads/main/org-roam-async.el, org-roam-async.el}
+@end itemize
+
+@end ifnottex
+
+@menu
+* New (el-job-ng)::
+* Old (el-job-old)::
+
+@detailmenu
+--- The Detailed Node Listing ---
+
+New (el-job-ng)
+
+* Future work::
+
+Old (el-job-old)
+
+* Design rationale::
+* News 2.4: News 24.
+* News 2.3: News 23.
+* News 2.1: News 21.
+* News 2.0: News 20.
+* News 1.1: News 11.
+* News 1.0: News 10.
+* Limitations::
+
+Design rationale
+
+* Processes stay alive::
+* Emacs 30 @samp{fast-read-process-output}::
+
+@end detailmenu
+@end menu
+
+@node New (el-job-ng)
+@chapter New (el-job-ng)
+
+Since 2.5.0@comma{} released @emph{<2025-Oct-07>}@comma{} this repo comes with a variant library "el-job-ng".
+
+I find it simpler and easier to reason about. 400 lines of code instead of 800.
+
+Some differences:
+
+@itemize
+@item
+Does @emph{not} ever keep a process alive
+@item
+Does @emph{not} merge the subprocesses' outputs in a bespoke way@comma{} just uses @samp{append}.
+@item
+Does @emph{not} look up @samp{load-history} to try hard to find an .eln variant of your libraries@comma{} instead just shares the whole @samp{load-path} to let @samp{require} do its thing.
+@item
+Removed argument @samp{:if-busy}@comma{} you can manage this yourself with @samp{el-job-ng-busy-p} and @samp{el-job-ng-kill}.
+@item
+Argument @samp{:funcall-per-input} takes a function of two arguments@comma{} not one.
+@item
+Added an optional argument @samp{:eval}.
+@end itemize
+
+Feel free to file an issue or email. The design is always changing in response to my needs@comma{} so hearing about other people's needs is instructive as well!
+
+@menu
+* Future work::
+@end menu
+
+@node Future work
+@section Future work
+
+I may write more variants. Something that came with experience is that it's best to hard-code an entire variant library for a narrow use-case rather than complicate the same library with different code flows@comma{} as when it comes to this type of library@comma{} you really want to keep it easy to reason about!
+
+Ideas as of @emph{<2025-Oct-06>}:
+
+@table @asis
+@item File IPC
+Sending input and output by writing and reading files@comma{} instead of through the pipe connection.
+
+Theory: performance can sometimes be a lot better@comma{} with large inputs and outputs. I would guess it depends a lot on the machine.
+
+Drawback: if the data sent is sensitive@comma{} those files probably should be encrypted@comma{} and that could negate any performance benefit.
+
+@item Worker daemons
+Keeping subprocesses alive forever@comma{} so they are available at beck and call -- think worker daemons.
+
+That's basically implemented in "el-job-old"@comma{} and partly why it got so hairy@comma{} but it could be remade to put this usage front-and-center@comma{} with a "happy path" UX@.
+
+At this time@comma{} I do not have a demanding use-case to experiment with@comma{} to discover what that happy path should be@comma{} how the affordances should be.
+@end table
+
+@node Old (el-job-old)
+@chapter Old (el-job-old)
+
+@menu
+* Design rationale::
+* News 2.4: News 24.
+* News 2.3: News 23.
+* News 2.1: News 21.
+* News 2.0: News 20.
+* News 1.1: News 11.
+* News 1.0: News 10.
+* Limitations::
+@end menu
+
+@node Design rationale
+@section Design rationale
+
+I wanted to shorten the round-trip as much as possible@comma{} @strong{between the start of an async task and having the results}.
+
+For example@comma{} say you have some lisp that collects completion candidates@comma{} and you want to run it asynchronously because the lisp you wrote isn't always fast enough to avoid the user's notice@comma{} but you'd still like it to return as soon as possible.
+
+@menu
+* Processes stay alive::
+* Emacs 30 @samp{fast-read-process-output}::
+@end menu
+
+@node Processes stay alive
+@subsection Processes stay alive
+
+In the above example@comma{} a user might only delay a fraction of a second between opening the minibuffer and beginning to type@comma{} so there's scant room for overhead like spinning up subprocesses that load a bunch of libraries before getting to work.
+
+Thus@comma{} el-job keeps idle subprocesses for up to 30 seconds after a job finishes@comma{} awaiting more input.
+
+An aesthetic drawback is cluttering your task manager with many processes named "emacs".
+
+Users who tend to run system commands such as @samp{pkill emacs} may find that the command occasionally "does not work"@comma{} because it actually killed an el-job subprocess@comma{} instead of the Emacs they see on screen.
+
+@node Emacs 30 @samp{fast-read-process-output}
+@subsection Emacs 30 @samp{fast-read-process-output}
+
+Some other libraries@comma{} like the popular @uref{https://github.com/jwiegley/emacs-async/, async.el}@comma{} are designed around a custom process filter.
+
+Since Emacs 30@comma{} it's a good idea to instead use the @emph{built-in} process filter when performance is critical@comma{} and el-job does so. Quoting @uref{https://github.com/emacs-mirror/emacs/blob/master/etc/NEWS.30, NEWS.30}:
+
+@example
+** The default process filter was rewritten in native code.
+The round-trip through the Lisp function
+'internal-default-process-filter' is skipped when the process filter is
+the default one. It is reimplemented in native code@comma{} reducing GC churn.
+To undo this change@comma{} set 'fast-read-process-output' to nil.
+@end example
+
+@node News 24
+@section News 2.4
+
+@itemize
+@item
+Jobs must now have @samp{:inputs}. If @samp{:inputs} nil and there was nothing queued@comma{} @samp{el-job-launch} will no-op and return the symbol @samp{inputs-were-empty}.
+@end itemize
+
+@node News 23
+@section News 2.3
+
+@itemize
+@item
+Some renames to follow Elisp convention
+@itemize
+@item
+@samp{el-job:timestamps} and friends now @samp{el-job-timestamps}.
+@end itemize
+@end itemize
+
+@node News 21
+@section News 2.1
+
+@itemize
+@item
+DROP SUPPORT Emacs 28
+@itemize
+@item
+It likely has not been working for a while anyway. Maybe works on the @uref{https://github.com/meedstrom/el-job/tree/v0.3, v0.3 branch}@comma{} from 0.3.26+.
+@end itemize
+@end itemize
+
+@node News 20
+@section News 2.0
+
+@itemize
+@item
+Jobs must now have @samp{:id} (no more anonymous jobs).
+@item
+Pruned many code paths.
+@end itemize
+
+@node News 11
+@section News 1.1
+
+@itemize
+@item
+Changed internals so that all builds of Emacs can be expected to perform similarly well.
+@end itemize
+
+@node News 10
+@section News 1.0
+
+@itemize
+@item
+No longer keeps processes alive forever. All jobs are kept alive for up to 30 seconds of disuse@comma{} then reaped.
+@item
+Pruned many code paths.
+@item
+Many arguments changed@comma{} and a few were removed. Consult the docstring of @samp{el-job-launch} again.
+@end itemize
+
+@node Limitations
+@section Limitations
+
+@enumerate
+@item
+The return value from the @samp{:funcall-per-input} function must always be a list with a fixed length@comma{} where the elements are also lists.
+
+For example@comma{} org-mem passes @samp{:funcall-per-input #'org-mem-parser--parse-file} to el-job@comma{} and if you look in @uref{https://github.com/meedstrom/org-mem/blob/main/org-mem-parser.el, org-mem-parser.el} for the defun of @samp{org-mem-parser--parse-file}@comma{} it always returns a list of 5 items:
+
+@lisp
+(list (if missing-file (list missing-file)) ; List of 0 or 1 item
+      (if file-mtime (list file-mtime))     ; List of 0 or 1 item
+      found-entries                         ; List of many items
+      org-node-parser--found-links          ; List of many items
+      (if problem (list problem))))         ; List of 0 or 1 item
+@end lisp
+
+It may look clunky to return sub-lists of only one item@comma{} but you could consider it a minor expense in exchange for simpler library code.
+
+@item
+Some data types cannot be exchanged with the children: those whose printed form look like @samp{#<...>}. For example@comma{} @samp{#<buffer notes.org>}@comma{} @samp{#<obarray n=94311>}@comma{} @samp{#<marker at 3102 in README.org>}.
+
+IIUC@comma{} this sort of data only has meaning within the current process -- so even if you could send it@comma{} it would not be usable by the recipient anyway.
+
+@item
+For now@comma{} this library tends to be applicable only to a narrow set of use-cases@comma{} since you can only pass one @samp{:inputs} list which would tend to contain a single kind of thing@comma{} e.g. it could be a list of files to visit@comma{} to be split between child processes. In many potential use-cases@comma{} you'd actually want multiple input lists and split them differently@comma{} and that's not supported yet.
+@end enumerate
+
+@bye