0.95: Metadata is not a valid XML document even if no metadata are defined in the FO source

Stepan RYBAR Wed, 05 Nov 2008 07:29:32 -0800

Hello,

I have a problem with the metadata part of PDFs created by Apache FOP 0.95 (using 
Sun Java SE 1.6.0_10 on MS Windows 2000). Although the resulting PDF looks OK in 
Acrobat Reader, I am not able to process it with iText 2.1.3: it throws the 
exception shown below. After several e-mails among the iText developers (and 
me), it looks like FOP 0.95 produces a PDF with a compressed metadata section 
that, according to the iText developers, is not valid XML, even if I do not 
specify any metadata in the FO source. If I specify metadata according to your example at


http://xmlgraphics.apache.org/fop/0.95/metadata.html#xmp-example

the same exception is thrown. Could you please verify whether the problem is 
really caused by FOP and, if so, repair it? Thank you very much in advance.

Stepan Rybar

> ------------ Original message ------------
> From: Paulo Soares <>
> Subject: Re: [iText-questions] iText 2.1.3: Invalid byte 1 of 1-byte UTF-8
> sequence Exception while processing PDF from Apache FOP 0.95
> Date: 05.11.2008 15:44:00
> ----------------------------------------
> The metadata can be compressed; the problem is that the metadata is not a valid
> XML document (or at least it wasn't in the PDF I looked at). iText should
> ignore the error and carry on, but it currently doesn't. The problem starts in
> FOP in any case.
>
> Paulo
>
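Paulo's suggestion that iText should "ignore the error and carry on" could look roughly like the sketch below. `LenientXmp` and `tryParseXmp` are hypothetical names, not an iText API. The sketch parses the XMP bytes with the JDK's DOM parser and returns an empty `Optional` instead of aborting the whole stamping operation. It assumes a namespace-aware parser and Java 8 or later, unlike the Java 6 setup in the report.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class LenientXmp {
    // Hypothetical helper (not part of iText): parse an XMP packet, or return
    // Optional.empty() instead of propagating the parser's exception.
    static Optional<Document> tryParseXmp(byte[] xmp) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // assumption: XMP is parsed namespace-aware
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return Optional.of(builder.parse(new ByteArrayInputStream(xmp)));
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // Broken metadata: let the caller skip or regenerate it and carry on.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        byte[] valid = "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\"/>".getBytes(StandardCharsets.UTF_8);
        // Raw, still-compressed bytes are not XML at all.
        byte[] rawCompressed = {0x78, (byte) 0x9C, 0x01};
        System.out.println(tryParseXmp(valid).isPresent());         // true
        System.out.println(tryParseXmp(rawCompressed).isPresent()); // false
    }
}
```

With a helper like this, a stamper could, for example, write the output without the broken `/Metadata` entry or with regenerated metadata, instead of failing in `close()`. Which of those options is preferable is a design decision this sketch does not make.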
> ________________________________________
> From: 1T3XT info []
> Sent: Wednesday, November 05, 2008 2:08 PM
> To: Post all your questions about iText here
> Subject: Re: [iText-questions] iText 2.1.3: Invalid byte 1 of 1-byte UTF-8
> sequence Exception while processing PDF from Apache FOP 0.95
>
> Stepan RYBAR wrote:
> > Hello,
> >
> > thank you for your answer. But after some tests I guess that this is not caused by
> > missing metadata.
>
> Think again.
>
> > Even if there are no metadata in "source.fo"
>
> Look at your PDF document.
> In the root dictionary, there's a reference to XMP data.
> /Metadata 7 0 R
> If we have a look at object 7, we see:
> 7 0 obj
> <<
>    /Type /Metadata
>    /Subtype /XML
>    /Length 12 0 R
>    /Filter /FlateDecode
>  >>
> stream
> [binary, Flate-compressed stream data]
>
> Oops... What's this?
> This is a compressed XMP stream.
> That's not allowed in PDF!
>
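`/Filter /FlateDecode` means that the stream body is zlib-format data (RFC 1950). As Paulo notes, compressed metadata is not the problem in itself, but a reader has to inflate the stream before it can treat the contents as XML. The sketch below uses only `java.util.zip`. The class and method names are illustrative, and the sketch deliberately ignores the rest of PDF parsing (locating the object, resolving `/Length`, handling other filters).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateXmp {
    // Compresses bytes the way a /FlateDecode stream stores them (zlib wrapper).
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater(); // default level, zlib header included
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Undoes /FlateDecode; this must happen before the bytes reach an XML parser.
    static byte[] inflate(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or dictionary-based stream");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String xmp = "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\"/>";
        byte[] stream = deflate(xmp.getBytes(StandardCharsets.UTF_8));
        // A zlib stream at the default level typically starts with 78 9C.
        System.out.printf("first bytes: %02X %02X%n", stream[0], stream[1]);
        System.out.println(new String(inflate(stream), StandardCharsets.UTF_8));
    }
}
```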


# ------------ Original message ------------
# From: Stepan RYBAR <>
# Subject: Re: [iText-questions] iText 2.1.3: Invalid byte 1 of 1-byte UTF-8
# sequence Exception while processing PDF from Apache FOP 0.95
# Date: 05.11.2008 13:52:34
# ----------------------------------------
# Hello,
#
# thank you for your answer. But after some tests I guess that this is not caused
# by missing metadata. Even if there are no metadata in "source.fo", I can use
# iText to print the total number of pages to stdout, so iText can read
# "source.pdf" without throwing any exception. See the attached "source.fo"
# (metadata commented out), "source.pdf" and the Java code for iText (creation of
# the output commented out), which runs without problems.
#
# As soon as I try to create the output file and save it without any modification
# (to rule out a mistake in the font encoding settings), an exception is thrown:
#
# ExceptionConverter:
# com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException:
# Invalid byte 1 of 1-byte UTF-8 sequence.
#
# Stepan Rybar
#
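The message "Invalid byte 1 of 1-byte UTF-8 sequence" indicates that the XML parser found a byte that cannot start a UTF-8 sequence. In Xerces this appears to correspond to a continuation byte (bit pattern `10xxxxxx`) in lead position. The stdlib-only sketch below locates such a byte with `java.nio.charset`. `Utf8Check` and `firstInvalidUtf8Offset` are hypothetical names, not part of iText, FOP, or Xerces.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Returns the offset of the first byte that breaks UTF-8 decoding,
    // or -1 if the whole array is valid UTF-8.
    static int firstInvalidUtf8Offset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // UTF-8 never yields more chars than it has bytes, so this buffer cannot overflow.
        CharBuffer out = CharBuffer.allocate(data.length);
        CoderResult result = decoder.decode(in, out, true);
        if (result.isError()) {
            return in.position(); // the malformed sequence starts at the current position
        }
        decoder.flush(out);
        return -1;
    }

    public static void main(String[] args) {
        byte[] utf8 = "<a>\u00e9</a>".getBytes(StandardCharsets.UTF_8);
        // 0x78 0x9C is a common zlib (Flate) header; 0x9C is a continuation byte.
        byte[] flateHeader = {0x78, (byte) 0x9C};
        System.out.println(firstInvalidUtf8Offset(utf8));        // -1
        System.out.println(firstInvalidUtf8Offset(flateHeader)); // 1
    }
}
```

This only shows what kind of input triggers the message. It does not establish which bytes iText actually passed to the parser in this report.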
# > ------------ Original message ------------
# > From: Paulo Soares <>
# > Subject: Re: [iText-questions] iText 2.1.3: Invalid byte 1 of 1-byte UTF-8
# > sequence Exception while processing PDF from Apache FOP 0.95
# > Date: 04.11.2008 17:23:03
# > ----------------------------------------
# > The metadata generated by FOP is broken. They have:
# >
# > <x:xmpmeta>
# >
# > instead of:
# >
# > <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="3.1-702">
# >
# > The namespace is not defined and the parser complains about it.
# >
# > Paulo
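Paulo's diagnosis can be reproduced with the JDK's own parser. This sketch assumes that the XMP packet is parsed namespace-aware, because XMP relies on namespaces. Whether iText's `XmpReader` configures its parser exactly this way is an assumption. `XmpNamespaceCheck` and `namespaceError` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmpNamespaceCheck {
    // Returns null if the packet parses in namespace-aware mode,
    // otherwise the parser's error message.
    static String namespaceError(String xmp) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // every prefix must be bound to a namespace
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // fatal errors are thrown, not printed
            builder.parse(new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String rdf = "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"/>";
        // What FOP 0.95 wrote, according to Paulo: x: is never declared.
        String broken = "<x:xmpmeta>" + rdf + "</x:xmpmeta>";
        // The form Paulo proposes.
        String fixed = "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\" x:xmptk=\"3.1-702\">" + rdf + "</x:xmpmeta>";
        System.out.println("broken: " + namespaceError(broken)); // message about the unbound prefix
        System.out.println("fixed:  " + namespaceError(fixed));  // null
    }
}
```

With `setNamespaceAware(false)`, which is the `DocumentBuilderFactory` default, `x:xmpmeta` is accepted as an ordinary element name. The missing declaration is therefore only reported in namespace-aware mode.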
# > ________________________________________
# > From: Stepan RYBAR []
# > Sent: Tuesday, November 04, 2008 4:11 PM
# > To: [EMAIL PROTECTED]
# > Subject: [iText-questions] iText 2.1.3: Invalid byte 1 of 1-byte UTF-8 sequence
# > Exception while processing PDF from Apache FOP 0.95
# >
# > Hello,
# >
# > I am trying to use iText to add page numbers to an already existing PDF
# > using the PdfStamper class, and I have a problem: PDFs that I create with
# > Apache FOP 0.95 cause the exception shown below, although they look OK in
# > Adobe Acrobat Reader 8.1.2. I guess that this is a problem of a missing or
# > wrong encoding somewhere (but where?). Attached to this e-mail are the FO
# > source, the resulting PDF and the Java code for iText. I am using Sun Java SE
# > 1.6.0_10 on MS Windows 2000. Could you please point out where I am making a
# > mistake?
# >
# > Thank you. Stepan
# >
# > L:\Documents\Capitol\vypisyKv1\_other>L:\RunFiles\jdk\jre\bin\java -cp ".;iText-2.1.3.jar" AddPageNumbersToExistingPageNumberPDF
# > ExceptionConverter: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
# >         at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source)
# >         at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
# >         at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
# >         at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.arrangeCapacity(Unknown Source)
# >         at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipString(Unknown Source)
# >         at com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
# >         at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
# >         at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
# >         at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
# >         at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
# >         at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
# >         at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
# >         at com.lowagie.text.xml.xmp.XmpReader.<init>(Unknown Source)
# >         at com.lowagie.text.pdf.PdfStamperImp.close(Unknown Source)
# >         at com.lowagie.text.pdf.PdfStamper.close(Unknown Source)
# >         at AddPageNumbersToExistingPageNumberPDF.main(AddPageNumbersToExistingPageNumberPDF.java:30)
# >

<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="A4-portrait" page-height="29.7cm" page-width="21.0cm" margin="2cm">
      <fo:region-body margin="2cm" />
    </fo:simple-page-master>
  </fo:layout-master-set>
  <!--
  <fo:declarations>
    <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="3.1-702">
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Document title</dc:title>
          <dc:creator>Document author</dc:creator>
          <dc:description>Document subject</dc:description>
        </rdf:Description>
        <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/">
          <xmp:CreatorTool>Tool used to make the PDF</xmp:CreatorTool>
        </rdf:Description>
      </rdf:RDF>
    </x:xmpmeta>
  </fo:declarations>
  -->
  <fo:page-sequence master-reference="A4-portrait" initial-page-number="1" force-page-count="no-force">
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-family="Arial Narrow">příliš žluťoučký kůň úpěl ďábelské ódy</fo:block>
      <fo:block font-family="Arial Narrow">PŘÍLIŠ ŽLUŤOUČKÝ KŮŇ ÚPĚL ĎÁBELSKÉ ÓDY</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>

[Attachment: source.pdf (Adobe PDF document)]

import java.io.FileOutputStream;

import com.lowagie.text.Element;
import com.lowagie.text.pdf.BaseFont;
import com.lowagie.text.pdf.PdfContentByte;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfStamper;

public class AddPageNumbersToExistingPageNumberPDF {
  public static void main(String[] args) {
    try {
      PdfReader reader = new PdfReader("source.pdf");
      int n = reader.getNumberOfPages();
      System.out.println("Total number of pages is \"" + n + "\".");
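      /*
      // Enabling this block makes stamp.close() throw ExceptionConverter
      // (MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence).
      PdfStamper stamp = new PdfStamper(reader, new FileOutputStream("result.pdf"));
      stamp.close();
      */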
    } catch (Exception de) {
      de.printStackTrace();
    }
  }
}
