Summary: after looking at things I still prefer making the copy/link discrimination in copyLocal (i.e. at the bottom level) and ensuring that [DarcsFlag] is passed on down through all paths even though this has the "splatter" effect on many functions.
If after reading my comments below anyone would still like to see an actual implementation of Eric's alternative proposal then let me know, but if the comments below suffice then I don't need to spend the time... > If I'm reading the code right, the hard-link code is used for at least > two things: > > * fetchFile/gzFetchFile - links/copy/downloads a file into a temporary > directory > > * copy_repo_patches - copying the patch files > > Now, would you agree that only the latter is important? (Anyway, > linking here seems unlikely for the former, as /tmp is usually on a > separate partition). If that is the case, maybe we could get the same > benefits by having two versions of copyFileOrUrl, one of which supports > NoLinks. As it turns out, fetchFile and gzFetchFile are both used to read the specified file into local memory, and if their argument is a file, they will do so directly. They only call copyFileOrUrl if they know it isn't a file, meaning these will never end up invoking copyLocal. Thus, these paths never involve a link v.s. copy decision. I *could* reduce the splatter a bit by not requiring [DarcsFlag] to be an argument to fetchFile/gzFetchFile (internally supplying a null array to copyFileOrUrl), but I that (a) assumes that no other [DarcsFlag] might eventually be useful down below, and (b) silently replaces actual [DarcsFlag] values with something else, which sounds like a bug hunt nightmare at some future point. So as long as there's splatter, I'd include these functions. > > The patch Kevin made is pretty straightforward stuff, just modify > copyLocal to track [DarcsFlag]. The only annoying thing is that as a > consequence, a lot of unrelated functions (see below) are getting > splattered with the extra argument by their association with copyLocal > > Perhaps having *two* copyFileOrUrl functions would be a lesser evil than > splattering all those poor little functions (*) What do you (all) think? ** There are several paths that end up at copyFileOrUrl/copyLocal. Not a lot, but having the determination made at upper levels seems to add clutter and decisions about low-level issues that don't really need to belong at those levels. And future darcs development would have to remember to re-implement this stuff as appropriate, with insidious bugs (i.e. non-obvious, corner case) resulting if it didn't. For example, HashedRepo.lhs calls copyLocal in write_either_inventory, which has a couple of paths that lead to there; write_either_inventory would need the copy/link logic and unfortunately it doesn't have [DarcsFlag] info so it would need some splatter as well. Since the copyLocal seems to work just fine, there's nothing to indicate that anything is wrong until ultimately a user notices that their inventory file(s) are links even though the --nolinks flag was used. ** Many of the functions in External.hs (which is where copyLocal exists) already have [DarcsFlag] arguments, including--ironically enough--copyLocals. Thus needing/applying [DarcsFlag] information at that level isn't unusual. ** The [DarcsFlag] is really an environment (ReaderT monad anyone?) that describes how darcs should work overall. Anything which might be affected by an environment setting will need access to to that environment. Sorry, this is sounding pedantic, but allowing the information to flow through to where it's used seems most appropriate. As an aside, it might be interesting to replace the IO monad usage within darcs with a more abstract monad (e.g. "DarcsIO") which would allow monad stacking like adding a ReaderT to hold [DarcsFlag] and maybe other things like a StateT for the main repo or WriterT for accumulating output changes. I'm not sure if this would improve things or not: it might reduce the number of arguments passed around, simplify existing structures (e.g. no need to embed [DarcsFlag] in the Repository objects), and remove the potential for future splatter, but on the other hand it would require familiarity with those monadic styles which might be more obscure instead. Comments anyone? > Moreover, I believe copyFileOrUrl has a dangerously misleading name > because it does not make clear the fact that file could potentially > be linked. ... > ... I was thinking that we could mirror the copy/clone distinction and > have a function cloneFileOrUrl. ... Unfortunately, cloneX is used quite a bit in External.hs and it has nothing to do with links, so at a minimum some other name would need to be used. Personally, I'd rather have one function whose name indicated its overall function and look in that function for the details. Having multiple functions with slightly different names and slightly different functionality but which all appear to act the same from a high-level is more confusing to me. > > * Eric, I think you are also correct in that only darcs get is actually > > affected by --nolinks. This seems to be because push/pull will read > > the patches into memory, perform various commutation analysis and > > whatnot, and then write them as new files to the output repository. > > I'm not sure what to think of this. > > I would be inclined to taking the arguments out, and adding back should > the performance enhancements be made. I appreciate the need for > consistency. I just worry that it might mislead the user in building > the wrong mental model of darcs (albeit not fundmentally wrong as darcs > could do it in principle). I'd prefer to have the user build a mental model that darcs is really good at things and then try to make the implementation match that model. :-) It turns out that even darcs get will only create links when doing an "old-style" copy; if you have a HashedRepo it will actually read every remote patch into memory and write it as a separate operation (assuming I'm reading the code correctly). My hunch is that this is a slow way to do things and that we should at least be using System.Directory.copyFile if we can't link files. This lets the implementation of System do the most optimal file copying method available and frees up darcs memory for more important stuff like patch commutation and merging and whatnot (and avoiding GC cycles). As always, I appreciate the review and feedback, and the above is simply the results of additional investigation and opinionation on my part. Let me know what does and doesn't sound good and which way you'd like me to proceed. -- Kevin Quick quick at org after sparq
pgpcF5v5vRsqD.pgp
Description: PGP signature
_______________________________________________ darcs-devel mailing list [email protected] http://lists.osuosl.org/mailman/listinfo/darcs-devel
