On 11/12/19 4:35 AM, Gregory Szorc wrote:
On Mon, Nov 11, 2019 at 6:32 AM Augie Fackler <r...@durin42.com> wrote:
(+indygreg)
> On Nov 11, 2019, at 03:04, Pierre-Yves David <pierre-yves.da...@ens-lyon.org> wrote:
>
> Hi everyone,
>
> I am looking into introducing parallelism into `hg debugupgraderepo`. I already have a very useful prototype that precomputes copies information in parallel when converting to side-data storage. That prototype uses multiprocessing because it is part of the stdlib and works quite well for this use case.
>
> However, I know we have refrained from using multiprocessing in the past. I know the import and bootstrap cost was too heavy for things like `hg update`, but I am not sure whether there are other reasons to rule out the multiprocessing module in the `hg debugupgraderepo` case.
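(For concreteness, a minimal sketch of that kind of multiprocessing-based
parallel precompute; `_compute_copies` below is a placeholder standing in
for the real copy-tracing work, not the actual prototype code:)

    import multiprocessing

    def _compute_copies(rev):
        # stand-in for the real per-revision copy-tracing work; the
        # actual prototype would read changelog/filelog data here
        return rev, {'p1copies': {}, 'p2copies': {}}

    def precompute_copies(revs, numworkers=None):
        # one revision per task; imap_unordered hands results back as
        # soon as each worker finishes, in arbitrary order
        pool = multiprocessing.Pool(processes=numworkers)
        try:
            for rev, sidedata in pool.imap_unordered(_compute_copies, revs):
                yield rev, sidedata
        finally:
            pool.close()
            pool.join()

    if __name__ == '__main__':
        for rev, sidedata in sorted(precompute_copies(range(100))):
            pass  # the upgrade would write sidedata into the new revlog here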
I have basically only ever heard bad things about multiprocessing,
especially on Windows, which is the platform where you'd expect it to
be the most useful (since there's no fork()). I think Greg has more
details in his head.
That said, I guess feel free to experiment, in the knowledge that it
probably isn't significantly better than our extant worker system?
multiprocessing is a pit of despair on Python 2.7. It is a bit better on
Python 3. But I still don't trust it. I think you are better off using
`concurrent.futures.ProcessPoolExecutor`.
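(For comparison, the same shape with the executor API, reusing the
`_compute_copies` placeholder from the sketch above:)

    import concurrent.futures

    def precompute_copies(revs, numworkers=None):
        # same one-task-per-revision split, but through the higher-level
        # executor/future API instead of a raw Pool
        with concurrent.futures.ProcessPoolExecutor(max_workers=numworkers) as ex:
            futures = [ex.submit(_compute_copies, rev) for rev in revs]
            for f in concurrent.futures.as_completed(futures):
                yield f.result()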
That looks great, but it is not available in Python 2.7.
But I'm not even sure I trust ProcessPoolExecutor on Windows, especially
when `sys.executable` is `hg.exe` instead of `python.exe`: I think both
multiprocessing and concurrent.futures make assumptions about how to
invoke the "run a worker" code on a new process that are invalidated when
the main process isn't `python.exe`.
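(For reference, the knobs multiprocessing documents for the frozen-executable
case are `freeze_support()` and `set_executable()`; whether they are actually
sufficient when `sys.executable` is `hg.exe` is exactly the open question:)

    import multiprocessing

    def _work(n):
        return n * n

    if __name__ == '__main__':
        # documented requirement for frozen Windows executables: the
        # re-exec'd child jumps straight into the worker bootstrap
        # instead of re-running the main program
        multiprocessing.freeze_support()
        # in theory the module can be pointed at a real interpreter
        # rather than hg.exe, but locating/shipping one is its own
        # problem:
        # multiprocessing.set_executable(r'C:\Python27\python.exe')
        pool = multiprocessing.Pool(2)
        print(pool.map(_work, range(10)))
        pool.close()
        pool.join()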
That's unfortunate :-/ Is there any way to reliably test this and get it
fixed upstream?
So I think we may have to roll our own "start a worker" code. The
solution that's been bouncing around in my head is to add a `hg
debugworker` command (or similar) that dispatches work read from a
pipe/file descriptor/temp file to a named <module>.<function> callable.
We would then implement a custom executor conforming to the interface
that concurrent.futures wants and use that for work dispatch. One of the
hardest parts here is implementing a fair work scheduler. There are all
kinds of gnarly problems involving buffering, permissions, cross-platform
differences, etc. Even Rust doesn't have a good cross-platform library
for this type of message passing, last time I asked (a few months ago,
I was advised to use something like 0mq, which made me sad). Maybe there
is a reasonable Python library we can vendor. But I suspect we'll find
limitations in any implementation, as this is a subtly hard problem.
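A very rough sketch of the shape that could take, just to make the idea
concrete: `hg debugworker` does not exist yet, so a `python -c` stand-in
plays its part below, the pickled-tuple-over-stdin/stdout wire format is
entirely made up, and none of the hard parts (fair scheduling, buffering,
permissions, error handling) are addressed:

    import concurrent.futures
    import pickle
    import subprocess
    import sys

    class CmdExecutor(concurrent.futures.Executor):
        """Dispatch '<module>.<function>' calls to a child command.

        The child (eventually `hg debugworker`) reads a pickled
        (name, args, kwargs) tuple on stdin and writes a pickled result
        on stdout. One process per task, no pooling, no fairness.
        """

        def __init__(self, cmd, max_workers=4):
            self._cmd = list(cmd)
            # the threads only babysit child processes; the actual work
            # happens in the children, so the GIL is not a concern
            self._threads = concurrent.futures.ThreadPoolExecutor(max_workers)

        def submit(self, name, *args, **kwargs):
            return self._threads.submit(self._run, name, args, kwargs)

        def _run(self, name, args, kwargs):
            proc = subprocess.Popen(self._cmd, stdin=subprocess.PIPE,
                                    stdout=subprocess.PIPE)
            out, _ = proc.communicate(pickle.dumps((name, args, kwargs)))
            if proc.returncode:
                raise RuntimeError('worker exited with %d' % proc.returncode)
            return pickle.loads(out)

        def shutdown(self, wait=True):
            self._threads.shutdown(wait=wait)

    # stand-in for the hypothetical `hg debugworker` child (Python 3)
    _CHILD_SRC = '''
    import importlib, pickle, sys
    name, args, kwargs = pickle.load(sys.stdin.buffer)
    mod, func = name.rsplit(".", 1)
    result = getattr(importlib.import_module(mod), func)(*args, **kwargs)
    pickle.dump(result, sys.stdout.buffer)
    '''

    if __name__ == '__main__':
        ex = CmdExecutor([sys.executable, '-c', _CHILD_SRC])
        futures = [ex.submit('math.factorial', n) for n in range(10)]
        print([f.result() for f in futures])
        ex.shutdown()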
Yeah, the problem is hard enough that I would rather have an external
library deal with it.
--
Pierre-Yves David