On Fri, Nov 29, 2019, 06:45 Pierre-Yves David <pierre-yves.da...@ens-lyon.org> wrote:
> On 11/12/19 4:35 AM, Gregory Szorc wrote:
> > On Mon, Nov 11, 2019 at 6:32 AM Augie Fackler <r...@durin42.com
> > <mailto:r...@durin42.com>> wrote:
> >
> > > (+indygreg)
> > >
> > > > On Nov 11, 2019, at 03:04, Pierre-Yves David
> > > > <pierre-yves.da...@ens-lyon.org
> > > > <mailto:pierre-yves.da...@ens-lyon.org>> wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > I am looking into introducing parallelism into `hg
> > > > debugupgraderepo`. I already have a very useful prototype that
> > > > precomputes copy information in parallel when converting to
> > > > side-data storage. That prototype uses multiprocessing because it
> > > > is part of the stdlib and works quite well for this use case.
> > > >
> > > > However, I know we have refrained from using multiprocessing in
> > > > the past. I know the import and bootstrap cost was too heavy for
> > > > things like `hg update`. However, I am not sure if there are
> > > > other reasons to rule out the multiprocessing module in the
> > > > `hg debugupgraderepo` case.
> > >
> > > I have basically only ever heard bad things about multiprocessing,
> > > especially on Windows, which is the platform where you'd expect it
> > > to be the most useful (since there's no fork()). I think Greg has
> > > more details in his head.
> > >
> > > That said, I guess feel free to experiment, in the knowledge that
> > > it probably isn't significantly better than our extant worker
> > > system?
> >
> > multiprocessing is a pit of despair on Python 2.7. It is a bit
> > better on Python 3. But I still don't trust it. I think you are
> > better off using `concurrent.futures.ProcessPoolExecutor`.
>
> That looks great, but this is not available in Python 2.7.

There's a backport of the 3.x concurrent.futures available on PyPI, and
AIUI it fixes some important bugs in the package that didn't ever land
in 2.x. (A short sketch of how it would be used follows after the quoted
thread.)

> > But I'm not even sure I trust ProcessPoolExecutor on Windows,
> > especially when `sys.executable` is `hg.exe` instead of
> > `python.exe`: I think both multiprocessing and concurrent.futures
> > make assumptions about how to invoke the "run a worker" code on a
> > new process that are invalidated when the main process isn't
> > `python.exe`.
>
> That's unfortunate :-/ Any way to reliably test this and get it fixed
> upstream?
>
> > So I think we may have to roll our own "start a worker" code. The
> > solution that's been bouncing around in my head is to add a
> > `hg debugworker` command (or similar) that dispatches work read from
> > a pipe/file descriptor/temp file to a named <module>.<function>
> > callable. We then implement a custom executor conforming to the
> > interface that concurrent.futures wants and use that for work
> > dispatch. One of the hardest parts here is implementing a fair work
> > scheduler. There are all kinds of gnarly problems involving
> > buffering, permissions, cross-platform differences, etc. Even Rust
> > doesn't have a good cross-platform library for this type of message
> > passing the last time I asked (a few months ago I was advised to use
> > something like 0mq, which made me sad). Maybe there is a reasonable
> > Python library we can vendor. But I suspect we'll find limitations
> > in any implementation, as this is a subtly hard problem.
>
> Yeah, the problem is hard enough that I would rather have an external
> library dealing with it.
>
> --
> Pierre-Yves David
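For the Python 2.7 case, here is a minimal sketch of the kind of usage I
have in mind, assuming the PyPI "futures" backport is installed (it
provides the same `concurrent.futures` import path as the Python 3
stdlib). The `_precompute_copies` and `compute_all` names are just
placeholders, not real hg code:

    # Minimal sketch, assuming the PyPI "futures" backport on Python 2.7
    # (it installs under the same concurrent.futures path as the
    # Python 3 stdlib). _precompute_copies is a placeholder, not hg code.
    from concurrent.futures import ProcessPoolExecutor

    def _precompute_copies(rev):
        # stand-in for the real per-revision copy/side-data computation
        return rev, {}

    def compute_all(revs, max_workers=4):
        # Caveat from the thread: on Windows the worker processes are
        # started by re-invoking sys.executable, which may misbehave
        # when that is hg.exe rather than python.exe.
        with ProcessPoolExecutor(max_workers=max_workers) as executor:
            for rev, copies in executor.map(_precompute_copies, revs):
                yield rev, copies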
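And to make the "roll our own worker" idea a bit more concrete, a rough
sketch of a custom executor that conforms to the concurrent.futures
interface but hands each call to an `hg debugworker`-style child process
over a pipe. The command name, wire format, and the `DebugWorkerExecutor`
class are all hypothetical, and a real version would need a pool of
long-lived workers plus the fair scheduler Greg mentions rather than one
process per task:

    import pickle
    import subprocess
    import threading

    from concurrent.futures import Executor, Future

    class DebugWorkerExecutor(Executor):
        # Hypothetical: hands each submitted call to an `hg debugworker`
        # child process identified by a dotted <module>.<function> name.
        def __init__(self, hgexecutable='hg'):
            self._hg = hgexecutable

        def submit(self, fn, *args, **kwargs):
            future = Future()
            name = '%s.%s' % (fn.__module__, fn.__name__)

            def _run():
                # one short-lived process per task keeps the sketch
                # small; a real implementation would pool workers and
                # frame/validate the pickled messages properly
                proc = subprocess.Popen(
                    [self._hg, 'debugworker', name],
                    stdin=subprocess.PIPE, stdout=subprocess.PIPE)
                out, _ = proc.communicate(pickle.dumps((args, kwargs)))
                if proc.returncode == 0:
                    future.set_result(pickle.loads(out))
                else:
                    future.set_exception(RuntimeError('worker failed'))

            threading.Thread(target=_run).start()
            return future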
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel