Re: PDF Download problem tomcat >= 7.0.27

Rainer Jung Wed, 01 Aug 2012 10:12:39 -0700

On 01.08.2012 09:54, André Warnier wrote:

Konstantin Kolinko wrote:

2012/8/1 Jose María Zaragoza <demablo...@gmail.com>:

The Content-Length header in the above 206 response is not from Tomcat.


Tomcat's DefaultServlet does not calculate the whole size of the parts
and does not set content-length, and the file size is much more than
fits into the buffer.
So it would use  Transfer-Encoding: chunked  in its response and not
the one that you cited.
There must be some proxy in the way that buffers the data and sends
them as one response instead of chunks. HTTPD? Was there some option
in it that disables chunked encoding when interacting with IE?
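For readers unfamiliar with the framing being discussed, here is a minimal sketch of what a chunked body looks like on the wire. The function name `chunk_encode` and the tiny chunk size are hypothetical and chosen only for illustration; this is not how Tomcat implements it.

```python
def chunk_encode(body: bytes, chunk_size: int = 8) -> bytes:
    """Frame `body` as an HTTP/1.1 chunked message body (no trailers)."""
    out = bytearray()
    for i in range(0, len(body), chunk_size):
        piece = body[i:i + chunk_size]
        # Each chunk: size in hex, CRLF, the data, CRLF.
        out += b"%x\r\n" % len(piece) + piece + b"\r\n"
    # A zero-length chunk terminates the body.
    out += b"0\r\n\r\n"
    return bytes(out)


if __name__ == "__main__":
    print(chunk_encode(b"hello world"))
```

The sender never needs to know the total size up front, which is why a chunked response carries no Content-Length header.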


Well, I don't know much about this, but that has nothing to do with
chunked encoding; it is about Partial Content support, right?
And partial content is requested by the client (IE) if the
Content-Length is very big (I guess...).
Maybe IE requests a PDF file (GET) and, if it sees a very big
Content-Length, aborts the download and re-sends a GET request with a
range of bytes.

Chrome seems to behave in a similar way.


1. I suspect that the content is requested not by IE, but by the Adobe
Acrobat plugin.

The "User-Agent" header says that it was IE6, but it is hard to
imagine why the browser by itself would request such a strange byte
range, asking for the tail of the file first. So there is something
else that uses the browser to perform the request.

+1
Talking about PDF files, there is a possible good reason for such a
behaviour.

A PDF file is not just a sequential text-like file. It is more like an
indexed file containing tables of pointers that point to more or less
randomly organised chunks of data inside the file. And, as per the Adobe
PDF 1.7 reference:

3.4.4 File Trailer
The trailer of a PDF file enables an application reading the file to
quickly find the cross-reference table and certain special objects.
Applications should read a PDF file from its end. The last line of the
file contains only the end-of-file marker, %%EOF. (See implementation
note 18 in Appendix H.) The two preceding lines contain the keyword
startxref and the byte offset from the beginning of the file to the
beginning of the xref keyword in the last cross-reference section.
etc..
...
And Note 18 in Appendix H essentially says that Acrobat reader is
"tolerant" with respect to the above, and accepts a PDF if the %%EOF
marker is located within the last 1024 bytes of the file.
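As a rough illustration of the trailer lookup described above, here is a minimal sketch that takes the last bytes of a PDF and extracts the startxref offset. The function name `find_startxref` and the sample bytes are made up for the example; a real reader does considerably more validation.

```python
import re


def find_startxref(tail: bytes) -> int:
    """Return the offset that follows the last 'startxref' keyword.

    `tail` is the final chunk of a PDF file, for example its last
    1024 bytes as fetched with a suffix byte range.
    """
    eof = tail.rfind(b"%%EOF")
    if eof == -1:
        raise ValueError("no %%EOF marker in the supplied tail")
    # Only look before the end-of-file marker and keep the last match.
    matches = re.findall(rb"startxref\s+(\d+)", tail[:eof])
    if not matches:
        raise ValueError("no startxref keyword before %%EOF")
    return int(matches[-1])


if __name__ == "__main__":
    sample = b"trailer\n<< /Size 22 /Root 1 0 R >>\nstartxref\n18799\n%%EOF\n"
    print(find_startxref(sample))
```

With that offset in hand, a client knows where the cross-reference table starts and can request just that part of the file.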

So, it is not beyond belief to imagine that a smart browser PDF plugin
would first request the last chunk of the file, in order to retrieve
pointers to the contents of the first page of the PDF, so that it could
quickly retrieve the range of bytes corresponding to this first page, so
that it could quickly display this first page into the browser window,
while later retrieving the rest on-demand (as the user scrolls). (*)

And if this is not the real explanation for the behaviour we are seeing,
at least it is a clever one.

Now how this all works in conjunction with the behaviour of HTTP
proxies/gateways with respect to Range requests and buffering, is left
as an exercise for the reader.
(Who can start by trying to understand
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35)
But that there would exist a couple of obscure bugs somewhere in there,
which show up only in very specific circumstances, is not beyond belief
either.


(*) The attentive reader will have noticed that there is a possible flaw
in this explanation : in the case at hand, the browser/plugin requests 2
chunks of bytes in the Range request : the end-of-file chunk, but also a
chunk in the middle.  How does it already know which second Range to
request ?

Adobe calls the range requests in the context of Acrobat "fast webview". When you generate a PDF you can choose whether you want to support it or not. I guess that at least there will be a byte range index giving the byte ranges for each page at the beginning of the document. Usually Acrobat then just gets the first page plus the index. If you switch to a different page, then it only loads the byte range needed for that page.

How does it know the second Range? Perhaps it already did another request beforehand to collect all the index data it needs.
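A minimal sketch of how a client could detect such a file from its first bytes. It assumes, from my recollection of the "Linearized PDF" appendix of the PDF reference rather than from anything in this thread, that the first object carries a /Linearized key together with /L (file length) and /H (offset and length of the hint stream) within the first 1024 bytes. The function name `linearization_hints` and the sample bytes are made up.

```python
import re
from typing import Dict, Optional


def linearization_hints(head: bytes) -> Optional[Dict[str, int]]:
    """Return file length and hint-stream location if `head` looks linearized."""
    window = head[:1024]  # assumption: the dictionary sits in this window
    if b"/Linearized" not in window:
        return None
    length = re.search(rb"/L\s+(\d+)", window)
    hints = re.search(rb"/H\s*\[\s*(\d+)\s+(\d+)", window)
    if length is None or hints is None:
        return None
    return {
        "file_length": int(length.group(1)),
        "hint_offset": int(hints.group(1)),
        "hint_length": int(hints.group(2)),
    }


if __name__ == "__main__":
    sample = (b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 54321 "
              b"/H [ 528 158 ] /O 5 /E 3000 /N 4 /T 54000 >>\nendobj\n")
    print(linearization_hints(sample))
```

If that guess is right, one small request for the head of the file would be enough for the plugin to work out which further ranges to ask for.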


Regards,

Rainer

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
