[
https://issues.apache.org/jira/browse/HTTPCLIENT-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870335#comment-13870335
]
Jon Moore commented on HTTPCLIENT-1347:
---------------------------------------
Hi Adam,
Sorry this has been confusing. It has clearly been confusing for us, too
(it's probably been almost 3 years since we touched the variant code). I do
think this part of the implementation is probably in need of a rewrite.
Before I dive into the response, I did just want to highlight one thing I
noticed in your test code: you have your clients wrapped as
CachingHttpClient around DecompressingHttpClient around DefaultHttpClient, but
this will result in the cache storing unzipped responses, which you probably
don't want. You should do DecompressingHttpClient around CachingHttpClient
around DefaultHttpClient. This has been changed in 4.3, where the processing
stack gets set up in the right order for you.
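
To make the ordering point concrete, here is a minimal sketch that models
the three layers as plain functions. Everything in it (WrappingOrderSketch,
backend, decompressing, caching, and the "gzip(...)" string standing in for a
compressed body) is a hypothetical stand-in and not HttpClient API; it only
shows which form of the body ends up in the cache for each nesting.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of the client stack; none of these are HttpClient classes.
final class WrappingOrderSketch {

    // Stand-in for DefaultHttpClient: always returns a "compressed" body.
    static Function<String, String> backend() {
        return url -> "gzip(" + url + ")";
    }

    // Stand-in for DecompressingHttpClient: unwraps the "compressed" body.
    static Function<String, String> decompressing(Function<String, String> inner) {
        return url -> {
            String body = inner.apply(url);
            boolean zipped = body.startsWith("gzip(") && body.endsWith(")");
            return zipped ? body.substring(5, body.length() - 1) : body;
        };
    }

    // Stand-in for CachingHttpClient: stores whatever the inner layer returns.
    static Function<String, String> caching(Function<String, String> inner,
                                            Map<String, String> store) {
        return url -> store.computeIfAbsent(url, inner);
    }
}
```

With caching(decompressing(backend()), store) the store ends up holding the
unzipped body; with decompressing(caching(backend(), store)) it holds the
compressed body and the caller still receives the unzipped one. With the 4.2
classes, the second nesting would read roughly
new DecompressingHttpClient(new CachingHttpClient(new DefaultHttpClient())).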
In your case, I believe you want to implement a ResourceFactory (for the
bodies) together with an HttpCacheStorage (for the headers). If you look
at BasicHttpCache, you will see that the ResourceFactory.copy method gets
called when storing a variant entry; you could implement this as a lightweight
clone operation (reusing a filename, or creating a soft link on the file
system) so that the body is only stored once.
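
As a rough illustration of the lightweight-copy idea, the sketch below shares
one on-disk body between two entries. LinkedBodyStore and its generate and
copy methods are hypothetical names that only mirror the role of a
ResourceFactory; the class does not implement the real interface. It also
swaps in a hard link (Files.createLink) instead of the soft link mentioned
above, on the assumption that the file system supports hard links, so that
deleting one entry's file does not leave the other dangling.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not an HttpClient class: models a ResourceFactory-style
// copy that shares a single on-disk body between several cache entries.
final class LinkedBodyStore {
    private final Path dir;

    LinkedBodyStore(Path dir) {
        this.dir = dir;
    }

    // Writes a response body to a new file and returns its path.
    // Assumes entryId is already safe to use as a file name.
    Path generate(String entryId, byte[] body) throws IOException {
        return Files.write(dir.resolve(entryId), body);
    }

    // "Copies" an existing body by hard-linking to it, so the bytes are
    // stored once and both paths refer to the same file.
    Path copy(String newEntryId, Path existing) throws IOException {
        return Files.createLink(dir.resolve(newEntryId), existing);
    }
}
```

A real implementation would also need a policy for when the underlying file
may be deleted, which this sketch leaves to the file system's link counting.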
You are right, though, that the cache entry (headers) gets stored twice. The
reason for this is a bit of a historical accident: at the time we added
support for processing variants, we weren't able to store request headers with
the entries without breaking backwards compatibility on the HttpCacheEntry
interface. This meant that when you retrieved an entry using the URL as the
cache key, you couldn't tell if you could return it or not if it had a Vary
header, because you didn't know what request headers had been used to fetch
that entry in the first place, or whether they were the same as the current
request's header. We ended up storing variants using the relevant headers and
values along with the URL as the variant cache key so that we could tell if we
had a matching variant or not. [Note that the contract for the HttpCacheStorage
is pretty much a key-value store, so the storage should not really be caring
whether the keys are URLs or not.]
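
For illustration, a variant cache key of this kind could be built along the
following lines. VariantKeySketch, variantKey, and the "{name=value}url"
layout are assumptions made for this sketch and are not necessarily the
format the caching module actually uses; header values are also not escaped
here, which a real implementation would need to handle.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a variant key: the URL plus the request headers
// named by the response's Vary header, with their values.
final class VariantKeySketch {

    static String variantKey(String url, List<String> varyNames,
                             Map<String, String> requestHeaders) {
        // Header names are case-insensitive, so index the request by lower-case name.
        Map<String, String> byLowerName = new HashMap<>();
        for (Map.Entry<String, String> h : requestHeaders.entrySet()) {
            byLowerName.put(h.getKey().toLowerCase(Locale.ROOT), h.getValue());
        }
        // Sort the varying names so the key does not depend on their order in Vary.
        Map<String, String> selected = new TreeMap<>();
        for (String name : varyNames) {
            String lower = name.trim().toLowerCase(Locale.ROOT);
            // A header missing from the request is recorded with an empty value.
            selected.put(lower, byLowerName.getOrDefault(lower, ""));
        }
        StringBuilder key = new StringBuilder("{");
        for (Map.Entry<String, String> e : selected.entrySet()) {
            if (key.length() > 1) {
                key.append('&');
            }
            key.append(e.getKey()).append('=').append(e.getValue());
        }
        return key.append('}').append(url).toString();
    }
}
```

Two requests for the same URL that differ in a header named by Vary produce
different keys, so a plain key-value store can hold both variants without
knowing anything about URLs or headers.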
So, long story short: the storage API of the caching module is probably due for
a backwards-incompatible overhaul. In the meantime, you may be able to get most
of what you want by treating the headers and bodies separately. Hope that helps.
> gzip responses doubly cached
> ----------------------------
>
> Key: HTTPCLIENT-1347
> URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1347
> Project: HttpComponents HttpClient
> Issue Type: Bug
> Components: HttpCache
> Affects Versions: 4.2.5
> Environment: ARCH Linux kernel 3.8.8-1
> node.js 0.8.22
> Reporter: Adam Patacchiola
> Fix For: 4.4 Final
>
> Attachments: Screen Shot 2014-01-11 at 7.11.36 PM.png, Screen Shot
> 2014-01-13 at 3.56.19 PM.png, Showing_entry_pointer.png,
> httpClientCacheTest.tar.gz, httpClientTestServer.js
>
>
> Compressed responses are cached twice.
> Run the attached server (node.js 0.8.22) and client tests. Create an "assets"
> directory under where you are running the server and add two files named 1
> and 2 (< 1000000 bytes). You will see that after the test is run, the cache
> dump output displays 2 sets of entries for each request, each containing the
> full content length of the file.
> Changing the implementation of HttpCacheStorage updateEntry to not update
> nonexistent entries (as I believe a correct implementation should) throws
> exceptions.