[iText-questions] Page resize with iText
Hi,

I am trying to resize pages so that I can add a header and a footer. However, when I use PdfWriter with GetImportedPage and AddTemplate, I lose all annotations and interactive content. If I use PdfCopy, I don't see a way to specify a new page size (new Document(size) has no effect).

So, my question is: can you help me resolve this?

Thank you,
Valentin

--
The ultimate all-in-one performance toolkit: Intel(R) Parallel Studio XE: Pinpoint memory and threading errors before they happen. Find and fix more than 250 security defects in the development cycle. Locate bottlenecks in serial and parallel code that limit performance. http://p.sf.net/sfu/intel-dev2devfeb
___
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions
Many questions posted to this list can (and will) be answered with a reference to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: http://itextpdf.com/themes/keywords.php
[iText-questions] PDFReader getPageContent() method returning weird escape codes
I have a PDF whose contents I read with iText. I didn't create this PDF, and in the text I get the following two escape sequences: \222 and \036.

\222 seems to be the single quote (') and \036 seems to have something to do with the letter f. These codes appear in several places, yet Acrobat Reader displays the text correctly. Here are some partial examples.

Example 1:

    ... /Span /MCID 947 BDC /T1_1 1 Tf ( )Tj EMC /Span /MCID 948 BDC -13.716 -1.6 Td [(complex of)-14(fers eight scaled-down ball \036elds replicated from famous )]TJ EMC /Span /MCID 949 BDC T* ..

Example 2:

    .. /Span /MCID 950 BDC T* [(Y)110(ankee Stadium. And if you\222re interested in \036nding ice in the middl\ ..

The line above reads: "Yankee Stadium. And if you're interested in finding ice in the middl"

I thought these were supposed to be ASCII octal codes, but they don't match ASCII. Is there a different way of decoding them?

Here is the code I use to read the page:

    PdfReader reader = new PdfReader(filein);
    byte[] streamBytes = reader.getPageContent(1);
    StringBuffer buf = new StringBuffer();
    String contentStream = new String(streamBytes);

Any idea what this is? Do I need to post the whole PDF?

Thanks
Re: [iText-questions] Page resize with iText
Hello,

you have posted a mail to itext-questions@lists.sourceforge.net, but you weren't subscribed. You are receiving this answer because I've added your mail address in Bcc. I will do this only once!

Further answers will be sent to the mailing list only (you won't receive them if you don't subscribe). Further questions to the mailing list will be rejected unless you subscribe. Further questions sent to the 1t3xt address will be ignored.

Please understand that, as long as you don't subscribe, somebody has to MANUALLY approve your mail among a huge load of SPAM. You can help us avoid this boring administrative job by following the rules: http://itextpdf.com/support.php

Valentin Boiadjiev wrote:
> Hi, I am trying to resize pages so that I can add a header and a footer. However, when I use PdfWriter with GetImportedPage and AddTemplate, I lose all annotations and interactive content. If I use PdfCopy, I don't see a way to specify a new page size (new Document(size) has no effect). So, my question is: can you help me resolve this?

Your problems are explained in chapter 6 of iText in Action - Second Edition. That chapter can be downloaded for free from the following page: http://affiliate.manning.com/idevaffiliate.php?id=223_212 (see the right column, under the title Downloads).

If you want to use PdfWriter + PdfImportedPage, you need to copy all annotations separately and scale all the dimensions. This isn't impossible, but it's plenty of work. I would advise against it.

If you want to add a header and a footer, why don't you just change the MediaBox (and the CropBox, if any)? That way, you don't have to scale the content; you just provide more space.
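The MediaBox suggestion comes down to simple rectangle arithmetic: the existing content and annotations keep their coordinates, and only the page boundary moves outward. Below is a minimal sketch of that arithmetic in plain Java, with no iText calls. The class `PageBoxes`, the method `extend`, and the 36-point margins are hypothetical names and values chosen for illustration. Writing the resulting numbers back as the page's /MediaBox (and /CropBox, if present) is left out, since that step depends on the iText version in use.

```java
import java.util.Arrays;

// Hypothetical helper, not part of iText: computes an enlarged page box.
class PageBoxes {
    /**
     * box is [llx, lly, urx, ury] in PDF user units (1/72 inch by default).
     * Returns a new box with extra room below (footer) and above (header).
     * The original content keeps its coordinates, so nothing has to be scaled.
     */
    static float[] extend(float[] box, float footer, float header) {
        return new float[] { box[0], box[1] - footer, box[2], box[3] + header };
    }

    public static void main(String[] args) {
        float[] a4 = { 0f, 0f, 595f, 842f };      // assumed A4-sized MediaBox
        float[] bigger = extend(a4, 36f, 36f);    // half an inch for each band
        System.out.println(Arrays.toString(bigger));
    }
}
```

A lower-left y below zero is a legal value for a page box, so the footer band can simply extend downward from the old page edge.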
Re: [iText-questions] PDFReader getPageContent() method returning weird escape codes
Wyatt Biker wrote:
> I have a PDF whose contents I read with iText. I didn't create this PDF, and in the text I get the following two escape sequences: \222 and \036

Those are indeed octals.

> \222 seems to be the single quote (') and \036 seems to have something to do with the letter f

That's possible, although the actual glyphs depend on the encoding.

> These codes appear in several places, yet Acrobat Reader displays the text correctly. Here are some partial examples.

OK, so there's no problem.

> I thought these were supposed to be ASCII octal codes, but they don't match ASCII. Is there a different way of decoding them?

In your code snippet, I see:

    /T1_1 1 Tf

/T1_1 is a reference to a font dictionary. You can find the object number of that font in the /Resources of the /Page dictionary. If you look at the font dictionary, you'll find the encoding that is needed, for example MacRomanEncoding, MacExpertEncoding, WinAnsiEncoding,...

> Here is the code I use to read the page:
>
>     PdfReader reader = new PdfReader(filein);
>     byte[] streamBytes = reader.getPageContent(1);
>     StringBuffer buf = new StringBuffer();
>     String contentStream = new String(streamBytes);

Are you going to parse the PDF syntax yourself? If so, how come you don't know about font dictionaries? Did you try the com.itextpdf.text.pdf.parser classes? If so, did they generate the correct output?
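To make the point about encodings concrete, here is a minimal sketch in plain Java (no iText) that resolves the octal escapes of a PDF literal string into raw bytes and then decodes them with one candidate encoding. The class `PdfStringBytes` and its methods are hypothetical, and choosing windows-1252 as a stand-in for WinAnsiEncoding is an assumption: the real encoding has to be read from the font dictionary behind /T1_1.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical helper, not part of iText.
class PdfStringBytes {
    /** Turns the body of a PDF literal string into raw bytes, resolving \ddd octal escapes. */
    static byte[] unescape(String literal) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < literal.length()) {
            char c = literal.charAt(i);
            if (c == '\\' && i + 1 < literal.length() && isOctal(literal.charAt(i + 1))) {
                int value = 0;
                int digits = 0;
                i++; // skip the backslash
                while (digits < 3 && i < literal.length() && isOctal(literal.charAt(i))) {
                    value = value * 8 + (literal.charAt(i) - '0');
                    i++;
                    digits++;
                }
                out.write(value & 0xFF);
            } else {
                out.write(c); // other escapes such as \n or \( are not handled in this sketch
                i++;
            }
        }
        return out.toByteArray();
    }

    static boolean isOctal(char ch) {
        return ch >= '0' && ch <= '7';
    }

    public static void main(String[] args) {
        byte[] raw = unescape("you\\222re");
        // Assumption: the font uses a WinAnsi-like encoding, approximated by windows-1252.
        System.out.println(new String(raw, Charset.forName("windows-1252")));
    }
}
```

Under that assumption, \222 (byte 0x92) decodes to the right single quotation mark U+2019, which matches "you're". \036 (byte 0x1E) is a control code in the standard encodings, so the quoted excerpts ("\036elds", "\036nding") suggest that this font maps it to the "fi" ligature through its own encoding. That is an inference from the sample text, not something the code can determine.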
Re: [iText-questions] Question on inline image parse exception
On 12/02/2011 1:39, Bharathi Kongara wrote:
> Hi guys, I'm fairly new to iText and have looked for options to get around the following error, but couldn't find much help. All I'm trying to do is extract text from the first page of a PDF (a valid one). I tried both the PdfContentStreamProcessor.processContent method and the PdfTextExtractor.getTextFromPage method, but it looks like the latter uses the former underneath anyway. Would appreciate any help!
>
> ExceptionConverter: com.itextpdf.text.pdf.parser.InlineImageUtils$InlineImageParseException: EI not found after end of image data

Normally, images are added to a page using an external object: an Image XObject. This reduces the file size: the bytes of an image that is used on different pages are added to the file only once. In the case of inline images, the bytes are added in the content stream of a page. So if an image is added to the content stream of two different pages, its bytes are in the PDF twice.

You can recognize inline images in the content stream because they sit between two operators: BI (begin image) and EI (end image). The exception is telling you that the parser found a BI, but not an EI.

There could be several reasons for this, because parsing inline images is error-prone. Maybe you aren't using the latest iText with the fixes that catch some of these errors. Maybe you've found a PDF that reveals a bug in iText. You're not telling us which iText version you're using, nor are you providing us with the PDF, so we can't help you any further.
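As a rough illustration of the BI/EI pairing described above, the following plain-Java sketch counts BI operators that have no closing EI in a content stream that has already been converted to text. It is deliberately naive and is not how iText parses inline images: the class `InlineImageCheck` and its method are hypothetical, and splitting on whitespace ignores that real image data is binary and may itself contain whitespace or the bytes "EI", which is one reason this kind of parsing is error-prone.

```java
// Hypothetical diagnostic helper, not part of iText.
class InlineImageCheck {
    /**
     * Returns the number of BI operators without a later EI operator.
     * Tokens are split on whitespace only, so binary image data can
     * produce false matches; treat the result as a hint, nothing more.
     */
    static int unterminated(String content) {
        int open = 0;
        for (String token : content.trim().split("\\s+")) {
            if (token.equals("BI")) {
                open++;
            } else if (token.equals("EI") && open > 0) {
                open--;
            }
        }
        return open;
    }

    public static void main(String[] args) {
        // Made-up miniature content streams for illustration.
        System.out.println(unterminated("q BI /W 1 /H 1 /BPC 8 /CS /G ID x EI Q"));
        System.out.println(unterminated("q BI /W 1 /H 1 /BPC 8 /CS /G ID x Q"));
    }
}
```

A non-zero result on a page's content stream would be consistent with the "EI not found" exception, but it cannot tell a damaged file apart from a parser limitation.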
Re: [iText-questions] suggestion about Paragraph and Phrase
On 11/02/2011 18:05, Michael Niedermair wrote:
> Hi Bruno, I have a suggestion about Paragraph and Phrase. It would be nice if there were a new constructor that sets the initial size of the ArrayList.

This is a suggestion that should be posted to the mailing list. Personally, I don't see the value of such an extra constructor. When I use Phrase or Paragraph, there's absolutely no way to know the size of the List in advance. If you have a controlled environment where you do, you could extend the classes and use that specific subclass.

> e.g.
>
>     public Paragraph(int initialCapacity) { super(initialCapacity); }
>     public Phrase(int initialCapacity) { super(initialCapacity); }
>     // change
>     public Phrase() { this(16.0f); }
>
> The problem is that the default initial size is 10. Each time I reach the limit, the size is increased:
>
>     newCapacity = (oldCapacity * 3)/2 + 1
>     elementData = Arrays.copyOf(elementData, newCapacity);
>
> 10 .. copy array, 16 .. copy array, 25 .. copy array, 38 .. copy array, 58 .. copy array, and so on. If I add a lot of text (e.g. from a file, line by line), the ArrayList copies the array again and again. An initial size set by the user would avoid this array copying.
>
> Bye, Michael
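The capacity sequence quoted in the message can be reproduced with a few lines of plain Java. The sketch below applies the growth rule exactly as written in the mail and also shows the standard `ArrayList(int initialCapacity)` constructor, which is what a presizing subclass would delegate to. The class `GrowthDemo` and its method are hypothetical names. The rule itself is taken from the message; other JDK versions may grow the array differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for the growth rule quoted in the mail.
class GrowthDemo {
    /** Capacities reached after each reallocation, starting from `initial`. */
    static List<Integer> capacities(int initial, int reallocations) {
        List<Integer> result = new ArrayList<>(reallocations + 1);
        int capacity = initial;
        result.add(capacity);
        for (int i = 0; i < reallocations; i++) {
            capacity = (capacity * 3) / 2 + 1; // rule as quoted in the message
            result.add(capacity);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(capacities(10, 4)); // prints [10, 16, 25, 38, 58]

        // Presizing avoids the intermediate copies when the size is known up front.
        List<String> lines = new ArrayList<>(10_000);
        lines.add("first line");
        System.out.println(lines.size());
    }
}
```

Each reallocation copies every element once, so the total copying work stays proportional to the final size. Presizing mainly saves the intermediate allocations, which supports the view that it only pays off when the element count is known in advance.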
Re: [iText-questions] Question on inline image parse exception
First, try out the very latest code from SVN and see if it fixes your problem. A week ago I added code to work around improperly implemented inline images in files generated by a large financial institution.

If you still have problems after that, I'd suggest that you open a ticket and attach a *small* PDF that demonstrates the problem (i.e. a single-page PDF).

--
View this message in context: http://itext-general.2136553.n4.nabble.com/Question-on-inline-image-parse-exception-tp3302271p3302728.html
Re: [iText-questions] PDFReader getPageContent() method returning weird escape codes
I am planning to do something conceptually simple: find the ? and ? enclosing tags in a user-designed PDF and replace them with input from a database while keeping kerning and tracking intact. Whatever is inside ? ? will be in a single font with a single tracking value.

Can this be done easily with the parser classes? I ordered the book. I hope it has some good examples.

Your help is appreciated.

On Sat, Feb 12, 2011 at 4:25 AM, 1T3XT BVBA i...@1t3xt.info wrote:
> Wyatt Biker wrote:
>> I have a PDF whose contents I read with iText. I didn't create this PDF, and in the text I get the following two escape sequences: \222 and \036
>
> Those are indeed octals.
>
>> \222 seems to be the single quote (') and \036 seems to have something to do with the letter f
>
> That's possible, although the actual glyphs depend on the encoding.
>
>> These codes appear in several places, yet Acrobat Reader displays the text correctly. Here are some partial examples.
>
> OK, so there's no problem.
>
>> I thought these were supposed to be ASCII octal codes, but they don't match ASCII. Is there a different way of decoding them?
>
> In your code snippet, I see: /T1_1 1 Tf
> /T1_1 is a reference to a font dictionary. You can find the object number of that font in the /Resources of the /Page dictionary. If you look at the font dictionary, you'll find the encoding that is needed, for example MacRomanEncoding, MacExpertEncoding, WinAnsiEncoding,...
>
>> Here is the code I use to read the page:
>>
>>     PdfReader reader = new PdfReader(filein);
>>     byte[] streamBytes = reader.getPageContent(1);
>>     StringBuffer buf = new StringBuffer();
>>     String contentStream = new String(streamBytes);
>
> Are you going to parse the PDF syntax yourself? If so, how come you don't know about font dictionaries? Did you try the com.itextpdf.text.pdf.parser classes? If so, did they generate the correct output?
Re: [iText-questions] PDFReader getPageContent() method returning weird escape codes
Wyatt,

Wyatt Biker wrote:
> I am planning to do something conceptually simple: find the ? and ? enclosing tags in a user-designed PDF and replace them with input from a database while keeping kerning and tracking intact. Whatever is inside ? ? will be in a single font with a single tracking value.

Considering that PDF is essentially a format that describes where individual glyphs or small groups of glyphs shall appear on screen or on paper, I don't consider that simple.

If the replacements from your database aren't guaranteed to have the same length as your placeholders, you're out of luck. If they are, a generic solution is merely difficult.

That you seem unaware of ligatures doesn't really help.

Regards,
Michael.

--
View this message in context: http://itext-general.2136553.n4.nabble.com/PDFReader-getPageContent-method-returning-weird-escape-codes-tp3302481p3303248.html
Re: [iText-questions] Change producer metadata
1T3XT BVBA info at 1t3xt.info writes:
> On 10/02/2011 23:40, qplace wrote:
>> I am trying to change producer information in existing pdf using the example ... What is the right way to change producer info? I have commercial iText license.
>
> Please use the mail address you've obtained when buying the commercial license.

I am using ad...@tradeplatform.us to post here, and it is also the email address used in communications with Mr. Bradbury. I received all license-related confirmations at that address.