[ https://issues.apache.org/jira/browse/DROIDS-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935487#action_12935487 ]

Javier Puerto commented on DROIDS-105:
--------------------------------------

Sorry Paul, I'm getting an unresolved dependency after applying the patch. It 
seems you forgot to add commons-collections to the pom.xml.

So I reviewed your patch, and I think it would be better for the cache 
implementation to extend the default client. What do you think?

You can always call super to fetch the content to be cached, and it also 
allows us to implement other ways of caching, for example (based on your 
patch):

public class MemCacheLoader extends HttpClientContentLoader {

  // Simple unbounded in-memory cache, keyed by request URI.
  private final Map<URI, byte[]> contentCache = new HashMap<URI, byte[]>();

  @Override
  public InputStream load(URI uri) throws IOException {
    if (!contentCache.containsKey(uri)) {
      // Cache miss: load through the default client and keep a copy of the bytes.
      InputStream toBeCached = super.load(uri);
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      byte[] chunk = new byte[4096];
      for (int n; (n = toBeCached.read(chunk)) != -1;) {
        buffer.write(chunk, 0, n);
      }
      toBeCached.close();
      contentCache.put(uri, buffer.toByteArray());
    }
    return new ByteArrayInputStream(contentCache.get(uri));
  }
}
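
One of those other ways could be a bounded cache, so the map cannot grow 
forever while crawling. Just a sketch: since the patch already pulls in 
commons-collections, its LRUMap (org.apache.commons.collections.map.LRUMap) 
could back the same loader by swapping only the field; the size of 100 is an 
arbitrary example:

  // Same loader as above, but the map evicts the least recently used
  // entry once it holds 100 URIs (LRUMap is a raw pre-generics type,
  // hence the suppression).
  @SuppressWarnings("unchecked")
  private final Map<URI, byte[]> contentCache = new LRUMap(100);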

> missing caching for robots.txt
> ------------------------------
>
>                 Key: DROIDS-105
>                 URL: https://issues.apache.org/jira/browse/DROIDS-105
>             Project: Droids
>          Issue Type: Improvement
>          Components: core
>            Reporter: Paul Rogalinski
>         Attachments: Caching-Support-and-Robots_txt-fix.patch, 
> CachingContentLoader.java
>
>
> The current implementation of the HttpClient will not cache any requests to 
> the robots.txt file. While using the CrawlingWorker, this results in 2 
> requests to robots.txt (HEAD + GET) per crawled URL, so when crawling 3 
> URLs the target server would get 6 requests for robots.txt.
> Unfortunately the contentLoader is made final in HttpProtocol, so there is 
> no way to replace it with a caching ContentLoader like the one you'll find 
> in the attachment.
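
For illustration only, the HttpProtocol side of such a change might look like 
the sketch below; the field and type names are assumed from the description 
above, and the actual change is whatever the attached patch does:

  // In HttpProtocol: drop the final modifier so a different loader
  // (e.g. a caching one) can be injected.
  private ContentLoader contentLoader = new HttpClientContentLoader();

  public void setContentLoader(ContentLoader contentLoader) {
    this.contentLoader = contentLoader;
  }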

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
