Hello Julia ...

On 19 Sep 2008 at 17:43, Julia Lomonosova wrote:
> anybody can explain me why for this request I get back only empty pdf-page?
> GETPDF http://www.fd.ru/rubrika/5/1109.html
> I just want to get this page http://www.fd.ru/rubrika/5/1109.html  with
> images but I don't know how do it.
> is here any commands on www4mail to retrieve web-pages with images?

In general, GETPDF results are satisfactory only in the
following circumstances ...

* The page does not require client-side processing.
  If page display depends on JavaScript or CSS, you may see no
  result at all.

* The page markup (HTML or XHTML) is 100% syntactically correct;
  parsing may fail on even the smallest error.

There is a reason for this. A PDF page dump is a visual image of
the page. To create the image, the parser must do exactly
what a web browser does -- examine the markup, display text
according to the markup tags, retrieve and insert images at the
correct places. Advanced browsers such as Internet Explorer and
Firefox are compiled from code which contains a large proportion
of error handling and guesswork -- simply because bad markup is
so common.

Our www4mail server relies on a relatively simple markup parser.
If the markup is bad, the output is bad.
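
To illustrate the difference, here is a minimal Python sketch. It
is NOT the www4mail parser (I am not showing its code here); the
function names and the two sample strings are made up for the
example. It only contrasts a strict parser, which rejects markup
on the first error, with a lenient one, which carries on and
salvages what it can, roughly as a browser does.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def is_well_formed(markup: str) -> bool:
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # A strict parser gives up at the first syntax error.
        return False


class TextCollector(HTMLParser):
    """Lenient parser: collects text and tolerates unbalanced tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def lenient_text(markup: str) -> list:
    """Return the text fragments a forgiving parser can recover."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return parser.chunks


# Hypothetical samples: the second one never closes its <b> tag.
good = "<html><body><p>Hello</p></body></html>"
bad = "<html><body><p>Hello<b>world</p></body></html>"

if __name__ == "__main__":
    for name, markup in (("good", good), ("bad", bad)):
        print(name, is_well_formed(markup), lenient_text(markup))
```

The strict check fails on the second sample because of one
unclosed tag, while the lenient parser still recovers both words.
A simple server-side converter behaves more like the strict case.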

Page http://www.fd.ru/rubrika/5/1109.html seems to have numerous
syntax errors. I'm not surprised it failed.


::: SZS :::

----------------------------------------------------------------------
To contribute to the discussion, email to accmail@listserv.aol.com
To unsubscribe, email to the *admin* address [EMAIL PROTECTED]
with UNSUBSCRIBE ACCMAIL as the message body.
WWW: http://emailonly.szs.net/
----------------------------------------------------------------------
