> On Nov 29, 2019, at 11:46, Augie Fackler <r...@durin42.com> wrote:
>
>> On Fri, Nov 29, 2019, 06:45 Pierre-Yves David
>> <pierre-yves.da...@ens-lyon.org> wrote:
>>
>> On 11/12/19 4:35 AM, Gregory Szorc wrote:
>> > On Mon, Nov 11, 2019 at 6:32 AM Augie Fackler <r...@durin42.com
>> > <mailto:r...@durin42.com>> wrote:
>> >
>> > (+indygreg)
>> >
>> > > On Nov 11, 2019, at 03:04, Pierre-Yves David
>> > > <pierre-yves.da...@ens-lyon.org
>> > > <mailto:pierre-yves.da...@ens-lyon.org>> wrote:
>> > >
>> > > Hi everyone,
>> > >
>> > > I am looking into introducing parallelism into `hg
>> > > debugupgraderepo`. I already have a very useful prototype that
>> > > precomputes copy information in parallel when converting to
>> > > side-data storage. That prototype uses multiprocessing because it
>> > > is part of the stdlib and works quite well for this use case.
>> > >
>> > > However, I know we refrained from using multiprocessing in the
>> > > past. I know the import and bootstrap cost was too heavy for
>> > > things like `hg update`. However, I am not sure if there are
>> > > other reasons to rule out the multiprocessing module in the
>> > > `hg debugupgraderepo` case.
>> >
>> > I have basically only ever heard bad things about multiprocessing,
>> > especially on Windows, which is the platform where you'd expect it
>> > to be the most useful (since there's no fork()). I think Greg has
>> > more details in his head.
>> >
>> > That said, I guess feel free to experiment, in the knowledge that
>> > it probably isn't significantly better than our extant worker
>> > system?
>> >
>> > multiprocessing is a pit of despair on Python 2.7. It is a bit
>> > better on Python 3. But I still don't trust it. I think you are
>> > better off using `concurrent.futures.ProcessPoolExecutor`.
>>
>> That looks great, but this is not available in Python 2.7.
>
> There's a backport of the 3.x concurrent.futures available on PyPI,
> and AIUI it fixes some important bugs in the package that never
> landed in 2.x.
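[Editor's note: the `ProcessPoolExecutor` approach Greg suggests above can be sketched roughly as follows. This is a minimal, hypothetical example; `precompute_copies` is a stand-in name, not Mercurial's actual API for the side-data copy-tracing work.]

```python
# Minimal sketch of concurrent.futures.ProcessPoolExecutor, the stdlib
# alternative to multiprocessing suggested above. The worker function
# must live at module top level so that child processes can import it.
from concurrent.futures import ProcessPoolExecutor


def precompute_copies(rev):
    # Hypothetical stand-in for the real per-revision copy-tracing work.
    return rev, rev * 2


def main():
    revs = range(10)
    with ProcessPoolExecutor(max_workers=4) as executor:
        # map() preserves input order even though workers finish
        # out of order; the pool is torn down on exiting the block.
        results = dict(executor.map(precompute_copies, revs))
    print(results[3])


if __name__ == '__main__':
    main()
```

Note that on Windows (and with the `spawn` start method generally) the children re-import the main module, which is exactly where the `sys.executable`-is-`hg.exe` concern raised below comes in.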
We have it vendored :) Only used on Python 2 via a pycompat shim, IIRC.

>
>> > But I'm not even sure I trust ProcessPoolExecutor on Windows,
>> > especially when `sys.executable` is `hg.exe` instead of
>> > `python.exe`: I think both multiprocessing and concurrent.futures
>> > make assumptions about how to invoke the "run a worker" code on a
>> > new process that are invalidated when the main process isn't
>> > `python.exe`.
>>
>> That's unfortunate :-/ Any way to reliably test this and get it
>> fixed upstream?
>>
>> > So I think we may have to roll our own "start a worker" code. The
>> > solution that's been bouncing around in my head is to add a `hg
>> > debugworker` command (or similar) that dispatches work read from a
>> > pipe/file descriptor/temp file to a named <module>.<function>
>> > callable. We then implement a custom executor conforming to the
>> > interface that concurrent.futures wants and use that for work
>> > dispatch. One of the hardest parts here is implementing a fair
>> > work scheduler. There are all kinds of gnarly problems involving
>> > buffering, permissions, cross-platform differences, etc. Even Rust
>> > doesn't have a good cross-platform library for this type of
>> > message passing, the last time I asked (a few months ago I was
>> > advised to use something like 0mq, which made me sad). Maybe there
>> > is a reasonable Python library we can vendor. But I suspect we'll
>> > find limitations in any implementation, as this is a subtly hard
>> > problem.
>>
>> Yeah, the problem is hard enough that I would rather have an
>> external library dealing with it.
>>
>> --
>> Pierre-Yves David
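[Editor's note: a bare-bones sketch of the custom-executor idea quoted above, i.e. a `concurrent.futures.Executor` subclass that dispatches a named `<module>.<function>` to a fresh worker process. For portability this sketch launches `python -c` as the worker; the proposed `hg debugworker` command does not exist, and a real implementation would dispatch asynchronously rather than block in `submit()` as this one does.]

```python
# Sketch of an Executor that spawns its own worker processes instead of
# relying on multiprocessing's spawning logic. Arguments and results are
# passed as JSON over the child's stdin/stdout.
import json
import subprocess
import sys
from concurrent.futures import Executor, Future


class SubprocessExecutor(Executor):
    def submit(self, target, *args):
        # target is a "module.function" string naming a top-level
        # callable, mirroring the <module>.<function> dispatch above.
        future = Future()
        module, func = target.rsplit('.', 1)
        code = (
            'import json, sys, {mod}; '
            'print(json.dumps({mod}.{fn}(*json.load(sys.stdin))))'
        ).format(mod=module, fn=func)
        # NOTE: this blocks until the child exits, so it is a sketch of
        # the dispatch mechanism only, not of a fair work scheduler.
        proc = subprocess.run(
            [sys.executable, '-c', code],
            input=json.dumps(list(args)).encode('ascii'),
            stdout=subprocess.PIPE,
        )
        future.set_result(json.loads(proc.stdout))
        return future


executor = SubprocessExecutor()
print(executor.submit('math.pow', 2, 10).result())
```

The scheduling, buffering, and permissions problems Greg mentions are exactly what this sketch punts on: making `submit()` non-blocking while keeping the workers fairly loaded is the genuinely hard part.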
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel