Excerpts from Yuya Nishihara's message of 2017-02-04 00:31:45 +0900: > On Fri, 3 Feb 2017 09:33:32 +0000, Jun Wu wrote: > > 1) What does the worker talk to the master via IPC? > > > > Current: worker sends "possible_dirty_repo: $REPO_PATH" after runcommand > > > > It's coupled with the (pseudo) repo object. But the master could preload > > things in correct order - first changelog, then phase, obsstore, etc. > > > > What if: worker sends "possible_dirty_index: $INDEX_PATH", > > "possible_dirty_obsstore $OBSSTORE_PATH" etc. > > > > It looks more flexible at first sight. However the dependency issue > > could be tricky to solve - for example, the obsstore requires changelog > > (and revset). It's much harder to preload obsstore without a pseudo > > repo object with repo.changelog, repo.revs available. > > > > Therefore, probably just send the repo path. > > Given this, I think pipes would be simpler than using shared memory. The > master > IO can be multiplexed by select().
If it's one pipe per worker, the code change (I tried) is too much. I think a single non-blocking pipe could be an option. > (But I don't have strong opinion to disagree with using shm.) > > > 2) Preload function signature (used by master's background thread). > > > > This function is to hash and load things. It runs in the master, and is > > basically a generator (so the "loading" part could be skipped on cache > > hit, and it's easier to make content and hash_with_content consistent, if > > they share states): > > > > @preloadregistar(something) > > def preloadfunc(...): > > yield quick_hash > > yield content, hash_with_content > > Isn't it simpler if preloadfunc() returns quick_hash and a delayed function > to build content? I feel this generator is kinda abuse and makes it less clear > where the costly part starts. I have thought about this. "Generator" is used as a lightweight process that could be terminated cheaply. It shares states (local variables) for hashing and loading, making certain things easier. Compare: @preload('index') def preloadindex(repo): f = repo.svfs('changelog.i') # "f" can be shared naturally hash = fstat(f).st_size yield hash indexdata = f.read() hash = len(indexdata) yield parseindex(indexdata), hash vs. def loadindex(f): indexdata = f.read() hash = len(indexdata) return parseindex(indexdata), hash @preload('index') def preloadindex(repo): f = repo.svfs('changelog.i') hash = repo.svfs('changelog.i').st_size loadfunc = lambda: loadindex(f) # pass "f" explicitly return hash, loadfunc Generator seems to be the shortest way to express the logic. I think it's easy to say that things after the first "yield" is costly. For readibility, I think improved form mentioned at the end of the previous mail "yield preload.foo(...)" makes things obvious. > > Problem: What's "..." ? It could be "repopath" as that's exactly what the > > worker sends to us. But that will make the preload function more > > complicated to be correct - repo has "requirements", etc and the index is > > not always simply path.join(repo_path, 'store', '00changelog.i'). > > > > Similar to 1, if we decouple from the repo object, index preloading would > > be fine: > > > > @preload('index') > > def preloadindex(indexpath): > > .... > > > > But things with dependency will have trouble: > > > > @preload('obsstore') > > def preloadobsstore(obsstorepath): > > # how to access repo.revs('....') ? > > > > Therefore, the preload function needs a repo (even it's "pseudo") to be > > convenient to implement. > > For the "...", I agree repopath isn't enough. _______________________________________________ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel