Re: [darcs-users] [patch310] Resolve issue 1599: automatically expire unused caches

Adolfo Builes Thu, 22 Jul 2010 18:39:08 -0700

Hi Eric,

>
> OK, so after running all of the atExitActions, we print out our
> accumulated warnings about caches (should we worry about what
> happens if one of the atexit actions fails)?
>
Yes, we should care about it, I will fix it.


>
>> +type BadCacheList = [String]
>> +{- NOINLINE _badCacheList -}
>> +_badCacheList :: IORef BadCacheList
>> +_badCacheList = unsafePerformIO $ newIORef []
>> +
>> +addBadCache :: String -> IO ()
>> +addBadCache cache = modifyIORef _badCacheList (cache:)
>> +
>> +getBadCacheList :: IO [String]
>> +getBadCacheList = readIORef _badCacheList
>> +
>> +{- NOINLINE isBadCache -}
>> +isBadCache :: String -> Bool
>> +isBadCache cache = unsafePerformIO $ do badCaches <- getBadCacheList
>> +                                        return (cache `elem` badCaches)
>
> Is it really necessary for this to be String -> Bool?
> Why not String -> IO Bool?
>
> Alternatively, why not IO (String -> Bool)?
>
I used  (String -> Bool) basically to escape the IO so that I could
use it easily in the guards.

> I'm also a bit sceptical about the need for a type synonym.
> Maybe just passing around [String] is clearer here.
>
The synonym was to give a better insight of what we were getting, but
indeed, it can be inferred from the function name.

>> +type TrustedHttps = [String]
>> +{- NOINLINE _trustedHttpList -}
>> +_trustedHttpList :: IORef TrustedHttps
>> +_trustedHttpList = unsafePerformIO $ newIORef []
>
> So you maintain a list of URLs which you managed to reach the
> first time you contacted them this session.
>
> 'Trust' is probably not the right notion to use here (since I'd
> be thinking more in terms of security or something like that).
> How about "reachable" instead?
>
"reachable" or "reliable" could be the words to express what I wanted.


>> +addTrustedHttp :: String -> IO ()
>> +addTrustedHttp http = modifyIORef _trustedHttpList (http:)
>> +
>> +getTrustedHttps :: IO TrustedHttps
>> +getTrustedHttps = readIORef _trustedHttpList
>> +
>> +{- NOINLINE  isTrustedHttp -}
>> +isTrustedHttp :: String -> Bool
>> +isTrustedHttp http = unsafePerformIO $ do trustedHttps <- getTrustedHttps
>> +                                          return (http `elem` trustedHttps)
>
> Same complaint about possibly unnecessary unsafePerformIO.  I mean, we
> certainly use it to achieve tracking of some global information in
> Darcs, but it's something we should be be somewhat reluctant to do lest
> Darcs surprise us in the future.
I totally agree that I abused of unsafePerformIO, in this case I used
for the same reason I gave before.

>> +-- | Returns the sources of a given cache.
>> +cacheSource :: CacheLoc -> String
>> +cacheSource (Cache _ _ s) = s
>
> Potentially useful helper function.  Seems like we could alternatively
> modify the definition of Cache to use the curly brace syntax.
>
I would totally prefer curly brace syntax, do we have any reason for
not using it ?

>>  fetchFileUsingCache :: Cache -> HashedDir -> String -> IO (String, 
>> B.ByteString)
>> -fetchFileUsingCache = fetchFileUsingCachePrivate Anywhere
>> +fetchFileUsingCache c h s = do result <- fetchFileUsingCachePrivate 
>> Anywhere c h s
>> +                               return result
>
> This change does not seem necessary.  What was wrong with the
> eta-reduced version of this code?
>
There is nothing wrong with it, what happened was that I was using
something else in the middle, and then took it away, I forgot to
rewrite it.


>>            sfuc [] _ = return ()
>> -          sfuc (c:cs) out | not $ writable c =
>> +          sfuc (c:cs) out | (not $ badCache c) && (not $ writable c) =
>
>> -                 then speculateFileOrUrl (fn c) out
>> +                 then speculateFileOrUrl (fn c) out `catchNonSignal` (\e -> 
>> do
>> +                                       addToBadCaches (show e) c)
>
>> -                 else copyFileOrUrl [] (fn c) out Cachable
>> +                 else copyFileOrUrl [] (fn c) out Cachable `catchNonSignal` 
>> (\e -> do
>> +                                       addToBadCaches (show e) c)
>
> OK, so if something goes wrong when speculating on or copying a file,
> (but it's not just the user hitting ^-c), we note the cache down as
> being problematic.
>
If we determinate that the cause of the error is one that we think is
problematic, we added.

>
> It seems like addToBadCaches is may be a bit of misnomer first this
> function primarily tests if a cache is reachable, and second because it
> adds the cache entry to either the white or the blacklist.  But I'm not
> sure; maybe I misunderstand the code.  After all, you do only call it
> when something goes wrong...
>
I use the whitelist basically to don't test again if the repo is
reachable or not. ( Given the fact that I can get an error just
because certain file is not  in that repo, but the repo does exist )

> Style point:
> The code below is very heavily indented, which makes it harder to read
> and review without a wide screen or window.  It's a tricky trade-off
> between too much and not enough, so I don't really have any answers.
> Perhaps this style could help here
>
>   addToBadCaches e cache
>    | condition 1 =
>        ...
>    | condition 2 =
>        ...
>
Just a stupid I idea I had in my mind that I would have to do things in this way
addToBadCaches e cache
                                          | condition 1 =
                                                   ...
                                          | condition 2 =
                                                  ....


>
> 2. Do we not have any exception types to work with here?  Matching
>   on the string seems really fragile.
>
I agree with you, I tried to work with the exception, unfortunately
"fromException" was giving nothing to this error ( I will look at it
again though).

> 3. Case in point, you're matching "Foo" against (map toLower "foo")
>   which means that this code path is never visited.
I used specifically for "timeout" since the error could show "Timeout"
or "timeout".

>> +  | isUrl  (cacheSource cache) = case "timeout" `isInfixOf`  (map toLower 
>> e) of
>> +                                   True  -> addBadCache . cacheSource $ 
>> cache
>> +                                   False -> case isTrustedHttp (cacheSource 
>> cache) of
>> +                                            True  -> return  ()
>> +                                            False -> do
>> +                                                      let url = if last 
>> (cacheSource cache) == '/'
>> +                                                                 then 
>> cacheSource cache
>> +                                                                 else 
>> (cacheSource cache) ++ "/"
>> +                                                      rsp <- simpleHTTP 
>> (getRequest url )
>> +                                                      do case rsp of
>> +                                                          Left _ ->  return 
>> ()
>> +                                                          Right response -> 
>> if rspCode response /= (2,0,0)
>> +                                                                            
>> then addBadCache . cacheSource $ cache
>> +                                                                            
>> else addTrustedHttp (cacheSource cache) >> return ()
>
> OK earlier we see that this function is called if we receive some
> exception when fetching over HTTP.
>
> If the exception is not a timeout, we test if the URL is reachable and
> whitelist it if so (the whitelist avoids future testing).  But why do we
> do this?  And doesn't that confuse Darcs into thinking it successfully
> fetched a file when it did not?  Have you tested the actual case where
> you get an exception but not a timeout?
>
I did test it, and the reasons I add it to the white list, is to avoid
inspecting again ( as Petr said).

> Also, are there not web server configurations where
> http://example.com/foo gives you a 404, but
> http://example.com/foo/_darcs/foo gives you a result?  What happens in
> those cases?  In other words, does the test of trying to fetch
> http://example.com/foo/ actually make sense?
>
> Also, why is it important to normalise the URL to have a trailing slash?
> Is it just to avoid duplicate entries in the whitelist?
>
I added the trailing slash basically because of the reason Petr said,
cause some servers treats foo/ and foo differently, but as Petr
suggest what I should test is if foo/_darcs/hashed_inventory exist.


> Code tidiness:
>
> 1. It could be useful to look into reducing the amount of indentation
>   you use.  It's a tricky trade-off between too much and not enough,
>   so I don't really have any answers.  Perhaps this style could help here
>
>   addToBadCaches e cache
>    | condition 1 =
>        ...
>    | condition 2 =
>        ...
>
>   It's all about making the code as easy to read as possible.
>
> 2. It seems like you should just name a source = cacheSource cache,
>   which I think will make the code a lot simpler looking
>
I'll rewrite it thanks :).


>
> In this particular case, you've chosen to break checkSshError into
> a separate function, but not the HTTP stuff.  Why?
>
I did it just because it was getting too long ( yes, I tried to keep
the columns <= 80, but I had that clumsy idea I mentioned before)

>
>> +-- | Given a SSH source it returns the server address appended with HTTP://
>> +-- i.e: getHttpUrl "u...@darcs.net:repo" returns "http://darcs.net";
>> +getHttpUrl :: String -> String
>> +getHttpUrl source = "http://"; ++ ((\(_:xs) -> takeWhile (/=':') xs) . 
>> dropWhile (/='@') $ source)
>
>
> Why does checkSshError take both the source and url argument?  Is there
> a case where url cannot be systematically derived from the source?
>
none, if fact the url is infere from the source.

> What happens to the Either type that you return in your handler?
> Doesn't it get discarded?
>
Yes is discarded, but in fact that's wrong, I wasn't very happy with
it and I realized that it is wrong.

> Why are we going about things in this roundabout way of checking the SSH
> path by converting the source name to an HTTP URI and fetching it over
> HTTP?  What about cases where people have a working SSH server but not
> necessarily an HTTP server on a given machine?
>
You are right here, and I realized today that this is not the way I
should test that the ssh is reachable. ( I was just assuming that
every server will give a 200 when fetching over HTTP, I forgot about
cases like the one you mentioned).  I didn't used the error message in
this case, because with ssh we don't get an error like the one with
http (i.e "timeout reached"), I will look better how can I handle that
error, also in this patch checking for ssh I "wrongly" do it to see if
it was a timeout error, but I didn't check if it was because the repo
wasn't there any longer, I guess I would have to do something like
"existRemoteFileSsh" and check for foo/_darcs/hashed_inventory

>
> Do you really use this?
>>  import Data.Maybe ( isJust, catMaybes )
>>  import Control.Monad ( msum )
>
That's not mine. I just didn't clean that up, as I think that kind of
things should go in a different patch.

>
>     (timeOut,dff) <- do r <- fetchFilePS (repo ++ "/" ++ df) Cachable
>                         return (False,r)
>                      `catchNonSignal` (\ e -> return (isTimeOut e,B.empty))
>
> The same sort of worry as above.  What happens if your exception is not
> a timeout?  I'm concerned that you succeed here, where what you really
> want to do maybe is re-throw the exception.
>

If the exception is not timeout that means that It could be a
possibility that "repo++"/"++darcsdir++"/inventory" exist, if not, we
throw the final error.

Now I have the doubt of how to go with the ssh server, I mean, how
could I check that the SSH server is reachable or that the
_darcs/hashed_inventory exist, a first idea come to my mind maybe with
ssh, but I'm not sure of portability.

Thanks !

--
Adolfo
_______________________________________________
darcs-users mailing list
darcs-users@darcs.net
http://lists.osuosl.org/mailman/listinfo/darcs-users

Re: [darcs-users] [patch310] Resolve issue 1599: automatically expire unused caches

Reply via email to