Hi folks,

I have linked below to a PDF (~2 MB) whose extracted text output contains unprintable characters. These characters seem to be associated with the first two pages of the document.
http://www.yourphp.org.uk/media/pdf/g/4/Annual_Report_0809.pdf

I believe the problem is caused by at least one of the fonts embedded in the document. My debugging has shown that the strange characters are associated with Identity-H encoding and/or Type 1 (CID) fonts, and possibly also with the Mistral font (KWTOGC+Mistral?). Fonts that display correctly seem to be associated with the WinAnsi encoding.

I have not been able to debug further because of the large number of deeply nested PDF objects (and I don't really know anything about PDF!).

I hope this is the right place to report this; if not, please let me know.

Regards,
Ian Smith
