Sorry, Adolfo, I've been remiss in my reviewing duty.
Anyway, I've interspersed Eric-reads-the-code style comments with comments to the patch author. The latter are highlighted with REMARK Resolve issue 1599: automatically expire unused caches ====================================================== > + badSources <- getBadSourcesList > + when ( not $ null badSources ) $ reportBadSources badSources So here we're modifying the withAtExit function directly to report on the caches that we could not reach. REMARK: An alternative approach may be to register this as an exit action via atexit, which seems a bit cleaner/more modular to me. I guess the component that would register this would be the main function for darcs. On the other hand, I suppose one advantage of baking this into withAtExit is that we control the order of execution; this fires after all the actions have been run. Are there other reasons? If we don't care about the order of execution so much, I'd lean more towards the more modular approach > + reportBadSources sources = do > + hPutStderr $ "\nI could not reach the following " ++ > + englishNum (length sources) (Noun "repository") ":" > + hPutStderr $ (unlines sources) ++ "If you're not using " ++ itThem > ( length sources) ++ > + ", you should probably delete\nthe corresponding " ++ > + englishNum (length sources) (Noun "entry") " from > _darcs/prefs/sources." > + hPutStderr = hPutStrLn stderr > + itThem num = case num > 1 of > + False -> "it" > + True -> "them" REMARK: Nice touch on the pronoun. This could probably go into the English module. You could even make a Pronoun type which implements the Countable instance, which allows you to reuse englishNum. > +{- NOINLINE _badSourcesList -} > +_badSourcesList :: IORef [String] > +_badSourcesList = unsafePerformIO $ newIORef [] > + > +addBadSource :: String -> IO () > +addBadSource cache = modifyIORef _badSourcesList (cache:) > + > +getBadSourcesList :: IO [String] > +getBadSourcesList = readIORef _badSourcesList > + > +isBadSource :: IO (String -> Bool) > +isBadSource = do badSources <- getBadSourcesList > + return (`elem` badSources) Looks fine; similar for reachableSources > -data CacheLoc = Cache !CacheType !WritableOrNot !String > +data CacheLoc = Cache { cacheType:: !CacheType, writableOrNot:: > !WritableOrNot, source:: !String } REMARK: I tend to prefix names of field accessors, which makes the names uglier but reduces the chances I'll need to qualify them later on, for example. data CacheLoc = Cache { cacheType:: !CacheType , cacheWritable :: !WritableOrNot , cacheSource:: !String } copyFileUsingCache ------------------ > - sfuc cache stickItHere > + badSource <- isBadSource > + sfuc cache stickItHere badSource > - sfuc [] _ = return () > - sfuc (c:cs) out | not $ writable c = > + sfuc [] _ _ = return () > + sfuc (c:cs) out badSource | not (badSource (source c)) && not > (writable c) = > if oos == OnlySpeculate This new version of sfuc makes use of the list of unreachable sources. REMARK: Perhaps another approach you could take is to just apply a filter on the cache, maybe changing it to case filter (\c -> not (badSource c || writable c) cs of [] -> return () (c:_) -> sfuc c stickItHere Personally, I'd just drop the guards and use an if-then-else for clarity > hunk ./src/Darcs/Repository/Cache.hs 222 > - then speculateFileOrUrl (fn c) out > + then speculateFileOrUrl (fn c) out `catchNonSignal` (\e -> > + checkCacheReachability (show e) c) > - else copyFileOrUrl [] (fn c) out Cachable > - | otherwise = sfuc cs out > + else copyFileOrUrl [] (fn c) out Cachable `catchNonSignal` > (\e -> > + checkCacheReachability (show e) c) OK, so the other change being that if something should go wrong when speculating or trying to copy a file, you check to see if the cache is alright. > + | otherwise = sfuc cs out badSource REMARK: The indentation for this otherwise case is now a bit off, which makes the code harder to understand. But it would be a moot point if you went with the filter approach. checkCacheReachability ----------------------- > +-- | Checks if a given cache entry is reachable or not. > +-- It receives an error caught during execution and the cache entry. > +-- For a local cache, if the given source doesn't exist anymore, it is added. > +-- For HTTP sources if the error is timeout, it is added, if not we check > for the > +-- _darcs/hashed_inventory file, if doesn't exist it means we are pointing > to a repository > +-- which used to exist there, but had been moved. > +-- For SSH if we get an error we try to get the file _darcs/hashed_inventory > again, if it fails > +-- we add the entry to the list of sources which are no reachables. > +-- The entries which get added to the cache are no longer tried for the rest > of the command. > +checkCacheReachability :: String -> CacheLoc -> IO () > +checkCacheReachability e cache > + | isFile (source cache) = do > + reachable <- isReachableSource > + unless (reachable (source cache)) $ do > + exist <- doesDirectoryExist $ source cache > + unless exist $ addBadSource $ source cache > + when exist $ addReachableSource $ source cache If a source has already been whitelisted, we don't need to check it; otherwise, otherwise either whitelist or blacklist it depending on whether the directory exists. REMARK: I think you can share the source cache expression using a where clause which will be common to all the guards. REMARK: These last two cases look like they could be more simply expressed using an if-then-else if exist then addReachableSource (source cache) else addBadSource (source cache) > + | isUrl (source cache) = do > + reachable <- isReachableSource > + let string = case dropWhile (/='(') e of > + (_:xs) -> fst (break (==')') xs) > + _ -> e > + let cerror = case reads string ::[(HTTP.ConnectionError,String)] of > + [(ce,_)] -> Just ce > + _ -> Nothing REMARK: We're still parsing error strings here, which seems rather fragile to me. Weren't we going to capture status codes from curl or something? It's an improvement at least that the error code parsing is in one place and not scattered throughout the darcs source. > + if cerror /= Nothing > + then addBadSource $ source cache > + else > + unless (reachable (source cache)) $ > + withTemp $ \tempout -> do > + let f = source cache ++ "/" ++darcsdir ++ "/" ++ > "hashed_inventory" > + copyFileOrUrl [] f tempout Cachable > + addReachableSource $ source cache > + `catchNonSignal` (\_ -> addBadSource $ source cache) So the logic here is: If we get a connection error of some sort, then we add the cache to the blacklist, even if it was already previously whitelisted. If it's not a connection error that we got, and we have not already whitelisted the server, then we test for reachability by trying to fetch the hashed_inventory file. If the server is indeed reachable it is whitelisted so we don't bother checking again; otherwise, we add it to the blacklist so we know not to fetch from it again. REMARK: It seems like you have a situation where an item could be added to the blacklist even though it's already whitelisted. Is that OK? Could there be some sort of unpredictable behaviour when an entry is in both lists? Perhaps one way to think about this -- not necessarily the right way, mind you -- is that * If an entry is in the whitelist, it can be added to the blacklist (but then it's removed from the whitelist). This would apply to cases where a server goes down in the middle of a fetch. * If an entry is in the blacklist, and you try to add it to the whitelist, we have an error (bug in darcs...) REMARK: we're baking in the assumption that hashed repositories have a hashed_inventory file... hopefully this sort of baking it won't bite us in the long run. I guess there's not much we can do about this. REMARK: You may find the isJust function to be useful. > + | isSsh (source cache) = > + withTemp $ \tempout -> do > + let f = source cache ++ "/" ++darcsdir ++ "/" ++ "hashed_inventory" > + copyFileOrUrl [] f tempout Cachable > + addReachableSource $ source cache > + `catchNonSignal` (\_ -> addBadSource $ source cache) REMARK: It looks here like you have some duplication with the HTTP code, maybe you could refactor this into a helper function like checkRemote. REMARK: You seem to have omitted the case where the cache entry is already whitelisted. (Seems like that logic could go into checkRemote) > + | otherwise = fail $ "unknown transport protocol for: " ++ source cache fetchFileUsingCachePrivate -------------------------- REMARK: What's the difference between copyFileUsingCache and fetchFileUsingCachePrivate? I think it's useful if you know, since you're modifying these two functions. > - ffuc cache > + badSource <- isBadSource > + ffuc cache badSource > - `catchall` debugFail ("Couldn't fetch `"++f++"'\nin subdir "++(hashedDir > subdir)++ > + `catchall` debugFail ("Couldn't fetch `"++f++"'\nin subdir "++ hashedDir > subdir ++ REMARK: This should be in a separate patch. (Don't worry too much about this, just being super-picky about minimal patches) > hunk ./src/Darcs/Repository/Cache.hs 286 > - where ffuc (c:cs) > - | not (writable c) && (Anywhere == fromWhere || isFile (fn c)) = > + where ffuc (c:cs) badSource > + | not (writable c) && (Anywhere == fromWhere || isFile (fn c)) && > not (badSource (source c)) = REMARK: Similar remark as with copyFileUsingCache. Maybe we could just filter out the bad sources and leave the ffuc logic intact. > do debugMessage $ "In fetchFileUsingCachePrivate I'm going > manually" > debugMessage $ " getting "++f > debugMessage $ " from " ++ fn c > hunk ./src/Darcs/Repository/Cache.hs 299 > fail $ "Hash failure in " ++ fn c > return (fn c, x') > else return (fn c, x) -- FIXME: create links in caches > - `catchall` ffuc cs > + `catchNonSignal` (\e -> do > + checkCacheReachability (show e) c > + foo <- isBadSource OK, so if something goes wrong during fetchFileUsingCachePrivate, we need to checkCacheReachability again. This can modify the white and black lists so we need to pass an updated isBadSource. REMARK: wait a second, why does do we need to care about the updated isBadSource? It's not like sources reappear in the list, do they? REMARK: Actually naming your variables 'foo' is probably a bad idea (particularly since foo tends to be used casually to refer to some hypothetical variable, not an actual one) > - `catchall` do (fname,x) <- ffuc cs > - do createCache c subdir > - createLink fname (fn c) > - return (fn c, x) > - `catchall` > - do gzWriteFilePS (fn c) x `catchall` return () > - return (fname,x) > - | otherwise = ffuc cs > + `catchNonSignal` (\ e -> > + do > + checkCacheReachability (show e) c > + foo <- isBadSource > + (fname,x) <- ffuc cs foo > + do createCache c subdir > + createLink fname (fn c) > + return (fn c, x) > + `catchall` > + do gzWriteFilePS (fn c) x `catchall` > return () > + return (fname,x)) Just a similar change; if something goes wrong we update the black/white lists accordingly and try from the subsequent cache. REMARK: I'm still worried about this moving from catchall to catchNonSignal... Are we relying on any of these handlers doing something clever when the user kills darcs ? > - where handler' se = > + where handler' se = do REMARK: this noise does not belong in the patch :-) HTTP module ----------- > +data ConnectionError = CouldntResolveHost | > + CouldntConnectToServer | > + OperationTimeout > + deriving (Eq, Read, Show) REMARK: I would avoid the contraction here and call it CouldNotResolveHost, etc. I can't really say why; I guess some part of me feels it's clearer that way, or some part of me wants ADTs and haddocks to be fairly formal. > -waitNextUrl :: IO (String, String) > +waitNextUrl :: IO (String, String, Maybe ConnectionError) REMARK: It may be worth thinking about just replacing that second String with Maybe ConnectionError... (not sure how wise this is in practice) > #ifdef HAVE_HTTP > > hunk ./src/HTTP.hs 96 > waitNextUrl = do > (u, f) <- readIORef requestedUrl > if null u > - then return ("", "No URL requested") > + then return ("", "No URL requested", Nothing) > else do writeIORef requestedUrl ("", "") > e <- (fetchUrl u >>= \s -> B.writeFile f (BC.pack s) >> return > "") `catch` h > hunk ./src/HTTP.hs 99 > - return (u, e) > + let ce = case e of > + "timeout" -> Just OperationTimeout > + _ -> Nothing It looks like this e string is always an error. REMARK: I guess there's not much more than trust the strings here. Have you tested this with the HTTP module? URL module ---------- Phew, this seems a bit confusing. I think the URL module contains both high-level code and curl-specific code which is #ifdef'ed out. Seems like the low-level Curl code should be moved to its own module. > + > + REMARK: another irrelevant change, grumble :-P > hunk ./src/URL.hs 223 > let l = pipeLength st > when (l > 0) $ do > dbg "URL.waitNextUrl start" > - (u, e) <- liftIO $ waitNextUrl' > + (u, e, ce) <- liftIO $ waitNextUrl' > let p = inProgress st > new_st = st { inProgress = Map.delete u p > , pipeLength = l - 1 } > hunk ./src/URL.hs 238 > else case Map.lookup u p of > Just (f, _, _) -> do > removeFileMayNotExist > (f++"-new_"++randomJunk st) > - downloadComplete u e > + case ce of > + Just httpError -> downloadComplete u > (show httpError) > + Nothing -> downloadComplete u e > debugMessage $ "URL.waitNextUrl failed: "++ > u++" "++f++" "++e > Nothing -> bug $ "Another possible bug in > URL.waitNextUrl: "++u++" "++e This is the high-level waitNextUrl, extended to make use of high-level errors or fall back to string type errors in our reporting. REMARK: does this mean the two cannot co-exist? If not, the type signature of waitNextUrl' could probably reflect somehow, maybe using Either or another type like HttpStatus = HttpOK | HttpConnectionError ConnectionError | HttpOtherError String > - unless (null e) (debugFail $ "Failed to download URL > "++u++": "++e) > + unless (null e) $ do > + debugMessage $ "Failed to download URL "++u++": > "++e > + fail e REMARK: irrelevant change (I understand this may be a bit of a pain, but any effort you could put into making your patches easy to understand pays off) darcs revert and darcs record are interactive... (although if you find them a bit of a pain to use, there could a UI discussion there...) > -waitNextUrl' :: IO (String, String) > +waitNextUrl' :: IO (String, String, Maybe ConnectionError) This looks like the curl version of waitNextUrl' > waitNextUrl' = do > e <- curl_wait_next_url >>= peekCString > + ce <- if not (null e) > + then do > + errorNum <- curl_last_error_num > + case errorNum of > + 6 -> return $ Just CouldntResolveHost > + 7 -> return $ Just CouldntConnectToServer > + 29 -> return $ Just OperationTimeout > + _ -> return Nothing > + else > + return Nothing Hmm, I'll take this on faith. REMARK: potential minor refactor return $ case errorNum of 6 -> Just CouldntResolveHost 7 -> Just CouldntConnectToServer 29 -> Just OperationTimeout _ -> return Nothing > +foreign import ccall "hscurl.h curl_last_error_num" > + curl_last_error_num :: IO CInt Exciting; I keep avoiding the FFI for some reason. hscurl.c -------- > +static CURLcode errornum = 0; > > const char *curl_wait_next_url() > { > + errornum = -1; > + > > - if (result != CURLE_OK) > + if (result != CURLE_OK){ > + errornum = result; > return curl_easy_strerror(result); > + } REMARK: Hmm, this sort of stateful way of doing things tends to make me nervous (also, do we need to worry about what happens when you call wait_next_url several times in parallel?). Could we not return a tuple instead? I guess we'd have to find a way to deal with structs in the FFI? -- Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow> For a faster response, please try +44 (0)1273 64 2905.
pgpZgeY0t7kkz.pgp
Description: PGP signature
_______________________________________________ darcs-users mailing list darcs-users@darcs.net http://lists.osuosl.org/mailman/listinfo/darcs-users