[ 
https://issues.apache.org/jira/browse/HTTPCLIENT-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756612#comment-13756612
 ] 

Jon Moore commented on HTTPCLIENT-1395:
---------------------------------------

Hi Nikola,

I agree that minimizing the number of calls to cache storage would be a useful 
improvement. I did want to note, however, that the current code does not expect 
zero latency to the cache storage layer. In fact, the memcached storage 
implementation expects the cache to be located across the network and the 
ehcache implementation expects that cache entries might be spilled to disk.

The caching module makes multiple calls to the cache storage layer precisely 
*because* processing a request can take a while, and the cache may have been 
updated since we last checked it. This matters most on a cache miss, where 
some *other* request may have filled in the cache while we were waiting for 
our origin request to complete. The caching module does no synchronization 
between requests; any synchronization happens in the cache storage 
implementation, which is external. This allows multiple application servers 
(for example) to share a common cache storage (e.g. a memcached farm) while 
maintaining proper HTTP caching semantics.
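
To illustrate the race described above, here is a minimal sketch. It does not 
use the httpclient-cache classes: NewerEntrySketch, Entry, and 
storeUnlessNewerExists are hypothetical names, and a plain Map stands in for 
shared storage such as memcached. The get-then-put is deliberately not atomic, 
mirroring the lack of synchronization between requests.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for illustration; NOT the httpclient-cache types.
final class NewerEntrySketch {

    /** A cached response, reduced to its origin date (epoch seconds) and body. */
    static final class Entry {
        final long dateSeconds;
        final String body;

        Entry(long dateSeconds, String body) {
            this.dateSeconds = dateSeconds;
            this.body = body;
        }
    }

    /**
     * Called when an origin response arrives after a cache miss. It looks at
     * storage again, because another request (possibly on another server) may
     * have stored a more recent entry while we were waiting.
     */
    static Entry storeUnlessNewerExists(Map<String, Entry> storage, String key, Entry fromOrigin) {
        Entry existing = storage.get(key); // the extra lookup after the origin call
        if (existing != null && existing.dateSeconds > fromOrigin.dateSeconds) {
            return existing; // someone else cached a newer response; serve that
        }
        storage.put(key, fromOrigin);
        return fromOrigin;
    }

    public static void main(String[] args) {
        Map<String, Entry> shared = new ConcurrentHashMap<>();
        // Request A misses and starts a slow origin fetch (response dated t=100).
        Entry slowResponse = new Entry(100L, "old");
        // Meanwhile request B completes and stores a response dated t=105.
        storeUnlessNewerExists(shared, "/resource", new Entry(105L, "new"));
        // Request A's response finally arrives; the re-check keeps B's entry.
        Entry served = storeUnlessNewerExists(shared, "/resource", slowResponse);
        System.out.println(served.body); // prints "new"
    }
}
```

Without the lookup inside storeUnlessNewerExists, request A would overwrite 
the newer entry with its older response.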

The cache *does* make an assumption that access to the cache storage layer is 
an order of magnitude (or more) faster than making a request to the origin. 
Remember that HTTP is designed to operate in a WAN environment. It sounds like 
in your case making 3 calls to the cache storage layer is *slower* than calling 
the origin--is that right?

In any event, I do think there are some opportunities for improvement here. 
Looking through the code again, I need to refresh my memory as to why, on a 
cache miss, we re-check whether variants are present before calling the 
backend. I believe that may be the only cache lookup we can avoid; the later 
one, which checks whether a more recent entry exists after we get the backend 
response, is necessary for proper cache behavior. If there's a patch to be had 
here, it should certainly be agnostic to the storage implementation, as Oleg 
suggests.
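
As a rough way to picture the lookups discussed above, the sketch below counts 
storage reads on a simplified miss path. Everything here is hypothetical: 
MissPathSketch, Storage, CountingStorage, and handleMiss are illustrative 
names, not the real HttpCacheStorage or CachingHttpClient code, and the three 
steps are only my reading of the flow, not a copy of it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simplification; NOT the actual CachingHttpClient logic.
final class MissPathSketch {

    /** Minimal storage interface assumed for this sketch only. */
    interface Storage {
        String getEntry(String key);
        void putEntry(String key, String entry);
    }

    /** In-memory storage that counts lookups so a miss can be measured. */
    static final class CountingStorage implements Storage {
        private final Map<String, String> entries = new HashMap<>();
        final AtomicInteger lookups = new AtomicInteger();

        @Override
        public String getEntry(String key) {
            lookups.incrementAndGet();
            return entries.get(key);
        }

        @Override
        public void putEntry(String key, String entry) {
            entries.put(key, entry);
        }
    }

    /** Serves a request, optionally skipping the pre-backend variant re-check. */
    static String handleRequest(Storage storage, String key, boolean recheckVariants) {
        String hit = storage.getEntry(key);     // initial cache check
        if (hit != null) {
            return hit;
        }
        if (recheckVariants) {
            storage.getEntry(key);              // variant re-check (candidate to drop)
        }
        String fromOrigin = "origin-response";  // stand-in for the backend call
        String newer = storage.getEntry(key);   // newer-entry check (must stay)
        if (newer != null) {
            return newer;
        }
        storage.putEntry(key, fromOrigin);
        return fromOrigin;
    }

    public static void main(String[] args) {
        CountingStorage withRecheck = new CountingStorage();
        handleRequest(withRecheck, "/resource", true);
        System.out.println(withRecheck.lookups.get()); // prints 3

        CountingStorage withoutRecheck = new CountingStorage();
        handleRequest(withoutRecheck, "/resource", false);
        System.out.println(withoutRecheck.lookups.get()); // prints 2
    }
}
```

Wrapping a real storage backend in a similar counter would be one 
storage-agnostic way to confirm how many lookups a miss actually costs.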

Jon
                
> Call the storage implementation only once on a cache miss
> ---------------------------------------------------------
>
>                 Key: HTTPCLIENT-1395
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1395
>             Project: HttpComponents HttpClient
>          Issue Type: Improvement
>          Components: HttpCache
>    Affects Versions: 4.2.5
>            Reporter: Nikola Petrov
>            Priority: Minor
>             Fix For: 4.3.1
>
>         Attachments: call-storage-implementation-once-4.2-branch.patch, 
> call-storage-implementation-once.patch, 
> call-storage-implementation-once-trunk.patch
>
>
> I am trying to use the httpclient-cache component with a Cassandra backend. 
> Everything seems good except that HttpCacheStorage#getEntry is called 3 
> times on the first request (a cache miss), resulting in a performance 
> bottleneck. There might be a way to handle this in the Storage 
> implementation by caching the recently queried values, but I think that a 
> better place is in the CachingHttpClient class. The current code expects 
> zero latency to the storage backend (the current implementations are all 
> memory based), but here is a patch that fixes the problem. Some notes:
> * I am using the code from the 4.2.5 release (but can port the code to the 
> current trunk)
> * test is provided in org.apache.http.impl.client.cache.TestCachingHttpClient
> * BasicHttpCache is patched to expose methods that check if the key is found 
> or if a proper variant is found - without this there is no way to say if 
> there was a real cache miss or the specific variant is missing
> * CachingHttpClient is checking if the current HttpCache implementation is 
> BasicHttpCache so it can use the new methods - I didn't want to change the 
> interface because this will add breaking changes to the API
> * This exposes the alreadyHaveNewerCacheEntry method so implementations can 
> control if the client should check for a more recent version in the cache

