On Mon, Nov 11, 2019 at 6:32 AM Augie Fackler <r...@durin42.com> wrote:
> (+indygreg)
>
> > On Nov 11, 2019, at 03:04, Pierre-Yves David <pierre-yves.da...@ens-lyon.org> wrote:
> >
> > Hi everyone,
> >
> > I am looking into introducing parallelism into `hg debugupgraderepo`. I already have a very useful prototype that precomputes copies information in parallel when converting to side-data storage. That prototype uses multiprocessing because it is part of the stdlib and works quite well for this use case.
> >
> > However, I know we refrained from using multiprocessing in the past. I know the import and bootstrap cost was too heavy for things like `hg update`. However, I am not sure if there are other reasons to rule out the multiprocessing module in the `hg debugupgraderepo` case.
>
> I have basically only ever heard bad things about multiprocessing, especially on Windows, which is the platform where you'd expect it to be the most useful (since there's no fork()). I think Greg has more details in his head.
>
> That said, I guess feel free to experiment, in the knowledge that it probably isn't significantly better than our extant worker system?

multiprocessing is a pit of despair on Python 2.7. It is a bit better on Python 3, but I still don't trust it. I think you are better off using `concurrent.futures.ProcessPoolExecutor`.

But I'm not even sure I trust ProcessPoolExecutor on Windows, especially when `sys.executable` is `hg.exe` instead of `python.exe`: I think both multiprocessing and concurrent.futures make assumptions about how to invoke the "run a worker" code in a new process, and those assumptions are invalidated when the main process isn't `python.exe`. So I think we may have to roll our own "start a worker" code.

The solution that's been bouncing around in my head is to add a `hg debugworker` command (or similar) that dispatches work read from a pipe/file descriptor/temp file to a named <module>.<function> callable.
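A minimal sketch of what the worker side of such a command might do, assuming a line-oriented JSON protocol over a pipe: read a request naming a `<module>.<function>`, import it with `importlib`, call it, and write the result back. No `hg debugworker` command exists; `serve`, `resolve_callable`, and the request fields (`id`, `func`, `args`) are hypothetical names for illustration, and JSON only works for JSON-serializable arguments and results.

```python
import importlib
import json


def resolve_callable(dotted_name):
    """Turn "package.module.function" into the function object (hypothetical helper)."""
    module_name, _, func_name = dotted_name.rpartition(".")
    if not module_name:
        raise ValueError("expected <module>.<function>, got %r" % dotted_name)
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


def serve(infile, outfile):
    """Hypothetical worker loop: one JSON request per input line,
    one JSON response per output line, until EOF on infile."""
    for line in infile:
        request = json.loads(line)
        try:
            func = resolve_callable(request["func"])
            result = func(*request.get("args", []))
            response = {"id": request["id"], "ok": True, "result": result}
        except Exception as exc:
            # Report the failure to the parent instead of killing the worker.
            response = {"id": request["id"], "ok": False, "error": repr(exc)}
        outfile.write(json.dumps(response) + "\n")
        outfile.flush()  # the parent is blocked waiting for this line
```

A command wrapper would call `serve(sys.stdin, sys.stdout)`. Because the callable is looked up by name, the worker can be started as `hg.exe` or `python.exe` alike; it only needs the target module to be importable.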
We then implement a custom executor conforming to the interface that concurrent.futures wants, and we use that for work dispatch.

One of the hardest parts here is implementing a fair work scheduler. There are all kinds of gnarly problems involving buffering, permissions, cross-platform differences, etc. Even Rust didn't have a good cross-platform library for this type of message passing the last time I asked (a few months ago; I was advised to use something like 0mq, which made me sad). Maybe there is a reasonable Python library we can vendor. But I suspect we'll find limitations in any implementation, as this is a subtly hard problem.

> >
> > Cheers,
> >
> > --
> > Pierre-Yves David
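A rough sketch of a custom executor built on `concurrent.futures.Executor`, assuming each worker is a long-lived subprocess speaking one JSON line per request and per response. `SubprocessExecutor` and `WORKER_SOURCE` are hypothetical; the `python -c` worker stands in for something like `[hg, "debugworker"]`. Scheduling here is a simple pull model: all tasks go into one FIFO queue, and each worker thread takes the next task only when its subprocess is idle, so a slow task never holds up work queued behind it on another worker.

```python
import concurrent.futures
import json
import queue
import subprocess
import sys
import threading

# Hypothetical stand-in for a worker command: reads JSON requests line by line,
# calls <module>.<function>(*args), and writes one JSON response per line.
WORKER_SOURCE = r'''
import importlib, json, sys
for line in sys.stdin:
    req = json.loads(line)
    mod, _, name = req["func"].rpartition(".")
    try:
        result = getattr(importlib.import_module(mod), name)(*req["args"])
        resp = {"ok": True, "result": result}
    except Exception as exc:
        resp = {"ok": False, "error": repr(exc)}
    sys.stdout.write(json.dumps(resp) + "\n")
    sys.stdout.flush()
'''


class SubprocessExecutor(concurrent.futures.Executor):
    def __init__(self, max_workers=2, command=None):
        self._command = command or [sys.executable, "-c", WORKER_SOURCE]
        self._tasks = queue.Queue()  # shared FIFO queue; idle workers pull from it
        self._lock = threading.Lock()
        self._shutdown = False
        self._threads = [threading.Thread(target=self._drive_worker, daemon=True)
                         for _ in range(max_workers)]
        for t in self._threads:
            t.start()

    def submit(self, fn, /, *args, **kwargs):
        if kwargs:
            raise TypeError("this sketch only forwards positional arguments")
        with self._lock:
            if self._shutdown:
                raise RuntimeError("cannot submit after shutdown")
            future = concurrent.futures.Future()
            # Functions travel by name, so fn must be an importable module-level callable.
            name = "%s.%s" % (fn.__module__, fn.__qualname__)
            self._tasks.put((future, name, list(args)))
        return future

    def _drive_worker(self):
        # One thread owns one subprocess and feeds it one task at a time.
        proc = subprocess.Popen(self._command, stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE, text=True)
        try:
            while True:
                item = self._tasks.get()
                if item is None:  # shutdown sentinel
                    break
                future, name, args = item
                if not future.set_running_or_notify_cancel():
                    continue  # cancelled before it started
                try:
                    proc.stdin.write(json.dumps({"func": name, "args": args}) + "\n")
                    proc.stdin.flush()
                    resp = json.loads(proc.stdout.readline())
                except Exception as exc:  # dead worker, broken pipe, bad reply
                    future.set_exception(exc)
                    continue
                if resp["ok"]:
                    future.set_result(resp["result"])
                else:
                    future.set_exception(RuntimeError(resp["error"]))
        finally:
            proc.stdin.close()  # EOF makes the worker loop exit
            proc.wait()

    def shutdown(self, wait=True):
        with self._lock:
            self._shutdown = True
        for _ in self._threads:
            self._tasks.put(None)  # one sentinel per worker thread
        if wait:
            for t in self._threads:
                t.join()
```

Usage mirrors any other executor, e.g. `with SubprocessExecutor(4) as ex: ex.map(math.gcd, xs, ys)`. This sketch ignores the harder problems raised above (large payloads, permissions, worker crashes mid-batch, Windows quirks).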
_______________________________________________ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel