[jira] [Commented] (HTTPCLIENT-1347) gzip responses doubly cached

Jon Moore (JIRA) Mon, 13 Jan 2014 18:57:36 -0800

    [ 
https://issues.apache.org/jira/browse/HTTPCLIENT-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870335#comment-13870335
 ]


Jon Moore commented on HTTPCLIENT-1347:
---------------------------------------

Hi Adam,

Sorry this has been confusing; it has clearly been confusing for us, too 
(it has been almost 3 years since we touched the variant stuff). I do 
think this part of the implementation is probably in need of a rewrite.

Before I dive into the response, I want to highlight one thing I 
noticed in your test code: you have your clients wrapped as 
CachingHttpClient around DecompressingHttpClient around DefaultHttpClient. 
This results in caching unzipped responses, which you probably don't 
want. You should instead wrap DecompressingHttpClient around CachingHttpClient 
around DefaultHttpClient. This has been changed in 4.3, where the processing 
stack gets set up in the right order for you.
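
To make the ordering point concrete, here is a small toy model of the two layerings. It does not use the HttpClient classes at all: Fetcher, GzipOrigin, CachingLayer and DecompressingLayer are hypothetical stand-ins invented for this sketch, and the only thing it tries to show is which bytes end up in the cache for each wrapping order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Toy model of client layering; none of these types are HttpClient classes. */
final class LayerOrderSketch {
    static final int BODY_SIZE = 10_000;

    /** Hypothetical stand-in for an HTTP client: maps a URL to a response body. */
    interface Fetcher {
        byte[] fetch(String url);
    }

    /** Stand-in for the origin server: always answers with a gzip-compressed body. */
    static final class GzipOrigin implements Fetcher {
        @Override
        public byte[] fetch(String url) {
            byte[] raw = new byte[BODY_SIZE];
            Arrays.fill(raw, (byte) 'a');
            return gzip(raw);
        }
    }

    /** Caching layer: stores whatever bytes the inner layer hands back. */
    static final class CachingLayer implements Fetcher {
        final Map<String, byte[]> store = new HashMap<>();
        private final Fetcher inner;

        CachingLayer(Fetcher inner) { this.inner = inner; }

        @Override
        public byte[] fetch(String url) {
            return store.computeIfAbsent(url, inner::fetch);
        }
    }

    /** Decompressing layer: gunzips whatever the inner layer hands back. */
    static final class DecompressingLayer implements Fetcher {
        private final Fetcher inner;

        DecompressingLayer(Fetcher inner) { this.inner = inner; }

        @Override
        public byte[] fetch(String url) { return gunzip(inner.fetch(url)); }
    }

    static byte[] gzip(byte[] raw) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(raw);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] gunzip(byte[] packed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Size of the cached body when decompression sits outside (true) or inside (false) the cache. */
    static int cachedBytes(boolean decompressOutsideCache) {
        String url = "http://example.test/1";
        CachingLayer cache;
        Fetcher client;
        if (decompressOutsideCache) {
            cache = new CachingLayer(new GzipOrigin());
            client = new DecompressingLayer(cache);   // recommended order
        } else {
            cache = new CachingLayer(new DecompressingLayer(new GzipOrigin()));
            client = cache;                           // order from the test code
        }
        client.fetch(url);
        return cache.store.get(url).length;
    }
}
```

In this model, cachedBytes(false) reports the full unzipped size, while cachedBytes(true) reports the much smaller gzip size, because the cache only ever sees what the layer beneath it returns.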

In your case, I believe you want to implement both a ResourceFactory (for the 
bodies) and an HttpCacheStorage (for the headers). If you look 
at BasicHttpCache, you will see that the ResourceFactory.copy method gets 
called when storing a variant entry; you could implement this as a lightweight 
clone operation (reusing a filename, or a soft link on the file system) so the 
body would only be stored once.
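
As a rough sketch of the "lightweight clone" idea, the model below hands out a second handle on the same stored body instead of duplicating the bytes, and uses a reference count so the body is only released when the last handle is disposed. It is an in-memory illustration only: Body, Handle, generate and copy are hypothetical names for this sketch and do not reproduce the actual ResourceFactory or Resource signatures. A file-backed version would presumably share a filename or link rather than a byte array.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of copy-by-reference body storage; not the HttpClient ResourceFactory API. */
final class SharedBodySketch {

    /** One stored body plus a count of the handles that point at it. */
    static final class Body {
        final byte[] bytes;
        final AtomicInteger refs = new AtomicInteger(0);

        Body(byte[] bytes) { this.bytes = bytes; }
    }

    /** Handle held by a cache entry; several handles may share one Body. */
    static final class Handle {
        private Body body;

        Handle(Body body) {
            this.body = body;
            body.refs.incrementAndGet();
        }

        /** Returns the shared Body, or fails if this handle was already disposed. */
        Body body() {
            if (body == null) {
                throw new IllegalStateException("handle already disposed");
            }
            return body;
        }

        byte[] bytes() { return body().bytes; }

        /** Releases this handle; returns true only if it was the last reference. */
        boolean dispose() {
            if (body == null) {
                return false;
            }
            boolean last = body.refs.decrementAndGet() == 0;
            body = null;
            return last;
        }
    }

    /** Stores a new body and returns the first handle on it. */
    static Handle generate(byte[] bytes) {
        return new Handle(new Body(bytes.clone()));
    }

    /** Lightweight copy: a new handle on the same Body, with no bytes duplicated. */
    static Handle copy(Handle original) {
        return new Handle(original.body());
    }
}
```

With this shape, the root entry and the variant entry can each hold their own handle while the body exists once; disposing one entry leaves the other readable.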

You are right, though, that the cache entry (headers) gets stored twice. The 
reason for this is a bit of a historical accident: at the time we added 
support for processing variants, we weren't able to store request headers with 
the entries without breaking backwards compatibility on the HttpCacheEntry 
interface. This meant that when you retrieved an entry using the URL as the 
cache key, you couldn't tell if you could return it or not if it had a Vary 
header, because you didn't know what request headers had been used to fetch 
that entry in the first place, or whether they were the same as the current 
request's headers. We ended up storing variants using the relevant headers and 
values along with the URL as the variant cache key so that we could tell if we 
had a matching variant or not. [Note that the contract for the HttpCacheStorage 
is pretty much that of a key-value store, so the storage should not really care 
whether the keys are URLs or not.]
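
To illustrate what a variant cache key of this kind might look like, here is a minimal sketch that combines the URL with the request headers named in the response's Vary header. The variantKey helper and the "{name=value&...}url" layout are assumptions made for this example, not necessarily the format the caching module uses, and a real implementation would also need to escape header values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy variant-key builder; the key layout is illustrative, not HttpClient's. */
final class VariantKeySketch {

    /**
     * Builds a storage key from the URL plus the request headers listed in Vary.
     * Header names are lower-cased and sorted so equivalent requests map to one key.
     */
    static String variantKey(String url, String varyHeader, Map<String, String> requestHeaders) {
        // Header names are case-insensitive, so normalize them before lookup.
        Map<String, String> normalized = new TreeMap<>();
        for (Map.Entry<String, String> e : requestHeaders.entrySet()) {
            normalized.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue().trim());
        }

        // Collect the header names the response varies on.
        List<String> names = new ArrayList<>();
        for (String name : varyHeader.split(",")) {
            String n = name.trim().toLowerCase(Locale.ROOT);
            if (!n.isEmpty()) {
                names.add(n);
            }
        }
        Collections.sort(names); // the order of names in Vary must not change the key

        StringBuilder key = new StringBuilder("{");
        for (int i = 0; i < names.size(); i++) {
            if (i > 0) {
                key.append('&');
            }
            String value = normalized.get(names.get(i));
            // A header absent from the request contributes an empty value.
            key.append(names.get(i)).append('=').append(value == null ? "" : value);
        }
        return key.append('}').append(url).toString();
    }
}
```

Because the storage only sees opaque string keys, an entry written under a key like this sits alongside the one written under the plain URL, which would be consistent with the two entries per request in the cache dump.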

So, long story short: the storage API of the caching module is probably due for 
a backwards-incompatible overhaul. In the meantime, you may be able to get most 
of what you want by treating the headers and bodies separately. Hope that helps.

> gzip responses doubly cached
> ----------------------------
>
>                 Key: HTTPCLIENT-1347
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1347
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpCache
>    Affects Versions: 4.2.5
>         Environment: ARCH Linux kernel 3.8.8-1
> node.js 0.8.22
>            Reporter: Adam Patacchiola
>             Fix For: 4.4 Final
>
>         Attachments: Screen Shot 2014-01-11 at 7.11.36 PM.png, Screen Shot 
> 2014-01-13 at 3.56.19 PM.png, Showing_entry_pointer.png, 
> httpClientCacheTest.tar.gz, httpClientTestServer.js
>
>
> Compressed responses are cached twice.
> Run the attached server (node.js 0.8.22) and client tests. Create an "assets" 
> directory under where you are running the server and add two files named 1 
> and 2 (< 1000000 bytes). You will see that after the test is run the cache 
> dump output displays 2 sets of entries for each request, each containing the 
> full content length of the file.
> Changing the implementation of HttpCacheStorage updateEntry to not update 
> nonexistent entries (as I believe the correct implementation should do) throws 
> exceptions.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

