Hi Eric, > > OK, so after running all of the atExitActions, we print out our > accumulated warnings about caches (should we worry about what > happens if one of the atexit actions fails)? > Yes, we should care about it, I will fix it.
> >> +type BadCacheList = [String] >> +{- NOINLINE _badCacheList -} >> +_badCacheList :: IORef BadCacheList >> +_badCacheList = unsafePerformIO $ newIORef [] >> + >> +addBadCache :: String -> IO () >> +addBadCache cache = modifyIORef _badCacheList (cache:) >> + >> +getBadCacheList :: IO [String] >> +getBadCacheList = readIORef _badCacheList >> + >> +{- NOINLINE isBadCache -} >> +isBadCache :: String -> Bool >> +isBadCache cache = unsafePerformIO $ do badCaches <- getBadCacheList >> + return (cache `elem` badCaches) > > Is it really necessary for this to be String -> Bool? > Why not String -> IO Bool? > > Alternatively, why not IO (String -> Bool)? > I used (String -> Bool) basically to escape the IO so that I could use it easily in the guards. > I'm also a bit sceptical about the need for a type synonym. > Maybe just passing around [String] is clearer here. > The synonym was to give a better insight of what we were getting, but indeed, it can be inferred from the function name. >> +type TrustedHttps = [String] >> +{- NOINLINE _trustedHttpList -} >> +_trustedHttpList :: IORef TrustedHttps >> +_trustedHttpList = unsafePerformIO $ newIORef [] > > So you maintain a list of URLs which you managed to reach the > first time you contacted them this session. > > 'Trust' is probably not the right notion to use here (since I'd > be thinking more in terms of security or something like that). > How about "reachable" instead? > "reachable" or "reliable" could be the words to express what I wanted. >> +addTrustedHttp :: String -> IO () >> +addTrustedHttp http = modifyIORef _trustedHttpList (http:) >> + >> +getTrustedHttps :: IO TrustedHttps >> +getTrustedHttps = readIORef _trustedHttpList >> + >> +{- NOINLINE isTrustedHttp -} >> +isTrustedHttp :: String -> Bool >> +isTrustedHttp http = unsafePerformIO $ do trustedHttps <- getTrustedHttps >> + return (http `elem` trustedHttps) > > Same complaint about possibly unnecessary unsafePerformIO. I mean, we > certainly use it to achieve tracking of some global information in > Darcs, but it's something we should be be somewhat reluctant to do lest > Darcs surprise us in the future. I totally agree that I abused of unsafePerformIO, in this case I used for the same reason I gave before. >> +-- | Returns the sources of a given cache. >> +cacheSource :: CacheLoc -> String >> +cacheSource (Cache _ _ s) = s > > Potentially useful helper function. Seems like we could alternatively > modify the definition of Cache to use the curly brace syntax. > I would totally prefer curly brace syntax, do we have any reason for not using it ? >> fetchFileUsingCache :: Cache -> HashedDir -> String -> IO (String, >> B.ByteString) >> -fetchFileUsingCache = fetchFileUsingCachePrivate Anywhere >> +fetchFileUsingCache c h s = do result <- fetchFileUsingCachePrivate >> Anywhere c h s >> + return result > > This change does not seem necessary. What was wrong with the > eta-reduced version of this code? > There is nothing wrong with it, what happened was that I was using something else in the middle, and then took it away, I forgot to rewrite it. >> sfuc [] _ = return () >> - sfuc (c:cs) out | not $ writable c = >> + sfuc (c:cs) out | (not $ badCache c) && (not $ writable c) = > >> - then speculateFileOrUrl (fn c) out >> + then speculateFileOrUrl (fn c) out `catchNonSignal` (\e -> >> do >> + addToBadCaches (show e) c) > >> - else copyFileOrUrl [] (fn c) out Cachable >> + else copyFileOrUrl [] (fn c) out Cachable `catchNonSignal` >> (\e -> do >> + addToBadCaches (show e) c) > > OK, so if something goes wrong when speculating on or copying a file, > (but it's not just the user hitting ^-c), we note the cache down as > being problematic. > If we determinate that the cause of the error is one that we think is problematic, we added. > > It seems like addToBadCaches is may be a bit of misnomer first this > function primarily tests if a cache is reachable, and second because it > adds the cache entry to either the white or the blacklist. But I'm not > sure; maybe I misunderstand the code. After all, you do only call it > when something goes wrong... > I use the whitelist basically to don't test again if the repo is reachable or not. ( Given the fact that I can get an error just because certain file is not in that repo, but the repo does exist ) > Style point: > The code below is very heavily indented, which makes it harder to read > and review without a wide screen or window. It's a tricky trade-off > between too much and not enough, so I don't really have any answers. > Perhaps this style could help here > > addToBadCaches e cache > | condition 1 = > ... > | condition 2 = > ... > Just a stupid I idea I had in my mind that I would have to do things in this way addToBadCaches e cache | condition 1 = ... | condition 2 = .... > > 2. Do we not have any exception types to work with here? Matching > on the string seems really fragile. > I agree with you, I tried to work with the exception, unfortunately "fromException" was giving nothing to this error ( I will look at it again though). > 3. Case in point, you're matching "Foo" against (map toLower "foo") > which means that this code path is never visited. I used specifically for "timeout" since the error could show "Timeout" or "timeout". >> + | isUrl (cacheSource cache) = case "timeout" `isInfixOf` (map toLower >> e) of >> + True -> addBadCache . cacheSource $ >> cache >> + False -> case isTrustedHttp (cacheSource >> cache) of >> + True -> return () >> + False -> do >> + let url = if last >> (cacheSource cache) == '/' >> + then >> cacheSource cache >> + else >> (cacheSource cache) ++ "/" >> + rsp <- simpleHTTP >> (getRequest url ) >> + do case rsp of >> + Left _ -> return >> () >> + Right response -> >> if rspCode response /= (2,0,0) >> + >> then addBadCache . cacheSource $ cache >> + >> else addTrustedHttp (cacheSource cache) >> return () > > OK earlier we see that this function is called if we receive some > exception when fetching over HTTP. > > If the exception is not a timeout, we test if the URL is reachable and > whitelist it if so (the whitelist avoids future testing). But why do we > do this? And doesn't that confuse Darcs into thinking it successfully > fetched a file when it did not? Have you tested the actual case where > you get an exception but not a timeout? > I did test it, and the reasons I add it to the white list, is to avoid inspecting again ( as Petr said). > Also, are there not web server configurations where > http://example.com/foo gives you a 404, but > http://example.com/foo/_darcs/foo gives you a result? What happens in > those cases? In other words, does the test of trying to fetch > http://example.com/foo/ actually make sense? > > Also, why is it important to normalise the URL to have a trailing slash? > Is it just to avoid duplicate entries in the whitelist? > I added the trailing slash basically because of the reason Petr said, cause some servers treats foo/ and foo differently, but as Petr suggest what I should test is if foo/_darcs/hashed_inventory exist. > Code tidiness: > > 1. It could be useful to look into reducing the amount of indentation > you use. It's a tricky trade-off between too much and not enough, > so I don't really have any answers. Perhaps this style could help here > > addToBadCaches e cache > | condition 1 = > ... > | condition 2 = > ... > > It's all about making the code as easy to read as possible. > > 2. It seems like you should just name a source = cacheSource cache, > which I think will make the code a lot simpler looking > I'll rewrite it thanks :). > > In this particular case, you've chosen to break checkSshError into > a separate function, but not the HTTP stuff. Why? > I did it just because it was getting too long ( yes, I tried to keep the columns <= 80, but I had that clumsy idea I mentioned before) > >> +-- | Given a SSH source it returns the server address appended with HTTP:// >> +-- i.e: getHttpUrl "u...@darcs.net:repo" returns "http://darcs.net" >> +getHttpUrl :: String -> String >> +getHttpUrl source = "http://" ++ ((\(_:xs) -> takeWhile (/=':') xs) . >> dropWhile (/='@') $ source) > > > Why does checkSshError take both the source and url argument? Is there > a case where url cannot be systematically derived from the source? > none, if fact the url is infere from the source. > What happens to the Either type that you return in your handler? > Doesn't it get discarded? > Yes is discarded, but in fact that's wrong, I wasn't very happy with it and I realized that it is wrong. > Why are we going about things in this roundabout way of checking the SSH > path by converting the source name to an HTTP URI and fetching it over > HTTP? What about cases where people have a working SSH server but not > necessarily an HTTP server on a given machine? > You are right here, and I realized today that this is not the way I should test that the ssh is reachable. ( I was just assuming that every server will give a 200 when fetching over HTTP, I forgot about cases like the one you mentioned). I didn't used the error message in this case, because with ssh we don't get an error like the one with http (i.e "timeout reached"), I will look better how can I handle that error, also in this patch checking for ssh I "wrongly" do it to see if it was a timeout error, but I didn't check if it was because the repo wasn't there any longer, I guess I would have to do something like "existRemoteFileSsh" and check for foo/_darcs/hashed_inventory > > Do you really use this? >> import Data.Maybe ( isJust, catMaybes ) >> import Control.Monad ( msum ) > That's not mine. I just didn't clean that up, as I think that kind of things should go in a different patch. > > (timeOut,dff) <- do r <- fetchFilePS (repo ++ "/" ++ df) Cachable > return (False,r) > `catchNonSignal` (\ e -> return (isTimeOut e,B.empty)) > > The same sort of worry as above. What happens if your exception is not > a timeout? I'm concerned that you succeed here, where what you really > want to do maybe is re-throw the exception. > If the exception is not timeout that means that It could be a possibility that "repo++"/"++darcsdir++"/inventory" exist, if not, we throw the final error. Now I have the doubt of how to go with the ssh server, I mean, how could I check that the SSH server is reachable or that the _darcs/hashed_inventory exist, a first idea come to my mind maybe with ssh, but I'm not sure of portability. Thanks ! -- Adolfo _______________________________________________ darcs-users mailing list darcs-users@darcs.net http://lists.osuosl.org/mailman/listinfo/darcs-users