Added: websites/staging/pdfbox/trunk/content/1.8/cookbook/workingwithattachments.html ============================================================================== --- websites/staging/pdfbox/trunk/content/1.8/cookbook/workingwithattachments.html (added) +++ websites/staging/pdfbox/trunk/content/1.8/cookbook/workingwithattachments.html Mon Jan 5 20:30:08 2015 @@ -0,0 +1,176 @@ +<!DOCTYPE html> +<html lang="en"> + +<!-- + + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE- 2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + --> + +<head> + <meta charset="utf-8"> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + + <title>Apache PDFBox | Cookbook - Working with Attachments</title> + + <link href="/bootstrap/css/bootstrap.min.css" rel="stylesheet"> + <link href="/FontAwesome/css/font-awesome.css" rel="stylesheet"> + <link href="/Iconic/iconic fill/iconic_fill.css" rel="stylesheet"> + <link href="/css/pygments-github.css" rel="stylesheet"> + + <link href="/css/site.css" rel="stylesheet"> + + + + + + + <!-- Twitter Bootstrap and jQuery after this line. 
--> + <script src="//code.jquery.com/jquery-latest.js"></script> + <script src="/bootstrap/js/bootstrap.min.js"></script> +</head> + +<body> + <nav class="navbar navbar-default navbar-top"> + <div class="container"> + <div class="navbar-header"> + <a href="/index.html"> + <img class="logo" src="/images/logo-head.gif"> + </a> + </div> + </div> + </nav> + + <div class="container"> + + <div class="row"> + <div class="col-xs-3"> + + <ul class="sidebar"> + <li class="sidebar-header">Apache PDFBox</li> + <li><a href="/index.cgi">Overview</a></li> + <li><a href="/download.cgi">Downloads</a></li> + + <li class="sidebar-header">Community</li> + <li><a href="/support.html">Support</a></li> + <li><a href="/mailinglists.html">Mailing Lists</a></li> + <li><a href="/team.html">Project Team</a></li> + + <li class="sidebar-header">Documentation</li> + <li class="sidebar-node"> + <a href="#">Trunk</a> + <ul> + <li><a href="/docs/2.0.0-SNAPSHOT/javadocs/">API Docs</a></li> + </ul> + </li> + <li class="sidebar-node"> + <a href="#">1.8.8</a> + <ul> + <li><a href="/1.8/architecture.html">Architecture</a></li> + <li><a href="/1.8/dependencies.html">Dependencies</a></li> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#"> + Cookbook <b class="caret"></b> + </a> + <ul class="dropdown-menu"> + <li><a href="/1.8/cookbook/documentcreation.html">Document Creation</a></li> + <li><a href="/1.8/cookbook/textextraction.html">Text Extraction</a></li> + <li><a href="/1.8/cookbook/pdfavalidation.html">PDF/A Validation</a></li> + <li><a href="/1.8/cookbook/workingwithfonts.html">Working with Fonts</a></li> + <li><a href="/1.8/cookbook/workingwithmetadata.html">Working with Metadata</a></li> + <li><a href="/1.8/cookbook/workingwithattachments.html">Working with Attachments</a></li> + <li><a href="/1.8/cookbook/pdfacreation.html">Creating a PDF/A document</a></li> + </ul> + </li> + <li><a href="/1.8/commandline.html">Command Line Tools</a></li> + <li><a 
href="/docs/1.8.8/javadocs/">API Docs</a></li> + <li><a href="/1.8/userguide/faq.html">FAQ</a></li> + </ul> + </li> + + <li class="sidebar-header">Development</li> + <li><a href="/codingconventions.html">Coding Conventions</a></li> + <li><a href="/building.html">Building</a></li> + <li><a href="/ideas.html">Ideas</a></li> + <li><a href="/references.html">References</a></li> + + <li class="sidebar-header">Apache Software Foundation</li> + <li><a href="http://www.apache.org/">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">ASF Sponsors</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + </ul> + </div> + <div class="col-xs-9"> + <h1 id="working-with-attachments">Working with Attachments</h1> +<h2 id="the-pdf-file-specification">The PDF File Specification</h2> +<p>See package: org.apache.pdfbox.pdmodel.common.filespecification<br /> +See example: EmbeddedFiles </p> +<p>A PDF can contain references to external files via the file system or a URL to a remote +location. It is also possible to embed a binary file into a PDF document.</p> +<p>There are two classes that can be used when referencing a file. PDSimpleFileSpecification +is a simple string reference to a file (e.g. "./movies/BigMovie.avi"). The simple file +specification does not allow for any parameters to be set. </p> +<p>The PDComplexFileSpecification is more feature-rich and allows for advanced settings on +the file reference.</p> +<p>It is also possible to embed a file directly into a PDF. Instead of setting the file +attribute of the PDComplexFileSpecification, set its EmbeddedFile attribute.</p> +<h2 id="adding-a-file-attachment">Adding a File Attachment</h2> +<p>PDF documents can contain file attachments that are accessed from the Document->File Attachments +menu. PDFBox allows attachments to be added to and extracted from PDF documents. 
+Attachments are part of the named tree that is attached to the document catalog.</p> +<div class="codehilite"><pre><span class="n">PDEmbeddedFilesNameTreeNode</span> <span class="n">efTree</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PDEmbeddedFilesNameTreeNode</span><span class="o">();</span> + +<span class="c1">//first create the file specification, which holds the embedded file</span> +<span class="n">PDComplexFileSpecification</span> <span class="n">fs</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PDComplexFileSpecification</span><span class="o">();</span> +<span class="n">fs</span><span class="o">.</span><span class="na">setFile</span><span class="o">(</span> <span class="s">"Test.txt"</span> <span class="o">);</span> +<span class="n">InputStream</span> <span class="n">is</span> <span class="o">=</span> <span class="o">...;</span> +<span class="n">PDEmbeddedFile</span> <span class="n">ef</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PDEmbeddedFile</span><span class="o">(</span><span class="n">doc</span><span class="o">,</span> <span class="n">is</span> <span class="o">);</span> +<span class="c1">//set some of the attributes of the embedded file</span> +<span class="n">ef</span><span class="o">.</span><span class="na">setSubtype</span><span class="o">(</span> <span class="s">"text/plain"</span> <span class="o">);</span> +<span class="n">ef</span><span class="o">.</span><span class="na">setSize</span><span class="o">(</span> <span class="n">data</span><span class="o">.</span><span class="na">length</span> <span class="o">);</span> +<span class="n">ef</span><span class="o">.</span><span class="na">setCreationDate</span><span class="o">(</span> <span class="k">new</span> <span class="n">GregorianCalendar</span><span class="o">()</span> <span class="o">);</span> +<span class="n">fs</span><span class="o">.</span><span class="na">setEmbeddedFile</span><span class="o">(</span> 
<span class="n">ef</span> <span class="o">);</span> + +<span class="c1">//now add the entry to the embedded file tree and set in the document.</span> +<span class="n">Map</span> <span class="n">efMap</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o">();</span> +<span class="n">efMap</span><span class="o">.</span><span class="na">put</span><span class="o">(</span> <span class="s">"My first attachment"</span><span class="o">,</span> <span class="n">fs</span> <span class="o">);</span> +<span class="n">efTree</span><span class="o">.</span><span class="na">setNames</span><span class="o">(</span> <span class="n">efMap</span> <span class="o">);</span> +<span class="c1">//attachments are stored as part of the "names" dictionary in the document catalog</span> +<span class="n">PDDocumentNameDictionary</span> <span class="n">names</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PDDocumentNameDictionary</span><span class="o">(</span> <span class="n">doc</span><span class="o">.</span><span class="na">getDocumentCatalog</span><span class="o">()</span> <span class="o">);</span> +<span class="n">names</span><span class="o">.</span><span class="na">setEmbeddedFiles</span><span class="o">(</span> <span class="n">efTree</span> <span class="o">);</span> +<span class="n">doc</span><span class="o">.</span><span class="na">getDocumentCatalog</span><span class="o">().</span><span class="na">setNames</span><span class="o">(</span> <span class="n">names</span> <span class="o">);</span> +</pre></div> + </div> + </div> + </div> + + <footer class="footer"> + <div class="container"> + <div class="row"> + <div class="span3"> + <!-- nothing in here on purpose --> + </div> + <div class="span9"> + <p>Copyright © 2009–2015 <a href="http://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. 
+ <br/>Apache PDFBox, PDFBox, Apache, the Apache feather logo and the Apache PDFBox project logos are trademarks of The Apache Software Foundation.</p> + </div> + </div> + </div> + </footer> + +</body> + +</html>
Added: websites/staging/pdfbox/trunk/content/1.8/cookbook/workingwithfonts.html ============================================================================== --- websites/staging/pdfbox/trunk/content/1.8/cookbook/workingwithfonts.html (added) +++ websites/staging/pdfbox/trunk/content/1.8/cookbook/workingwithfonts.html Mon Jan 5 20:30:08 2015 @@ -0,0 +1,297 @@ +<!DOCTYPE html> +<html lang="en"> + +<!-- + + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE- 2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + --> + +<head> + <meta charset="utf-8"> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + + <title>Apache PDFBox | Cookbook - Working with Fonts</title> + + <link href="/bootstrap/css/bootstrap.min.css" rel="stylesheet"> + <link href="/FontAwesome/css/font-awesome.css" rel="stylesheet"> + <link href="/Iconic/iconic fill/iconic_fill.css" rel="stylesheet"> + <link href="/css/pygments-github.css" rel="stylesheet"> + + <link href="/css/site.css" rel="stylesheet"> + + + + + + + <!-- Twitter Bootstrap and jQuery after this line. 
--> + <script src="//code.jquery.com/jquery-latest.js"></script> + <script src="/bootstrap/js/bootstrap.min.js"></script> +</head> + +<body> + <nav class="navbar navbar-default navbar-top"> + <div class="container"> + <div class="navbar-header"> + <a href="/index.html"> + <img class="logo" src="/images/logo-head.gif"> + </a> + </div> + </div> + </nav> + + <div class="container"> + + <div class="row"> + <div class="col-xs-3"> + + <ul class="sidebar"> + <li class="sidebar-header">Apache PDFBox</li> + <li><a href="/index.cgi">Overview</a></li> + <li><a href="/download.cgi">Downloads</a></li> + + <li class="sidebar-header">Community</li> + <li><a href="/support.html">Support</a></li> + <li><a href="/mailinglists.html">Mailing Lists</a></li> + <li><a href="/team.html">Project Team</a></li> + + <li class="sidebar-header">Documentation</li> + <li class="sidebar-node"> + <a href="#">Trunk</a> + <ul> + <li><a href="/docs/2.0.0-SNAPSHOT/javadocs/">API Docs</a></li> + </ul> + </li> + <li class="sidebar-node"> + <a href="#">1.8.8</a> + <ul> + <li><a href="/1.8/architecture.html">Architecture</a></li> + <li><a href="/1.8/dependencies.html">Dependencies</a></li> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#"> + Cookbook <b class="caret"></b> + </a> + <ul class="dropdown-menu"> + <li><a href="/1.8/cookbook/documentcreation.html">Document Creation</a></li> + <li><a href="/1.8/cookbook/textextraction.html">Text Extraction</a></li> + <li><a href="/1.8/cookbook/pdfavalidation.html">PDF/A Validation</a></li> + <li><a href="/1.8/cookbook/workingwithfonts.html">Working with Fonts</a></li> + <li><a href="/1.8/cookbook/workingwithmetadata.html">Working with Metadata</a></li> + <li><a href="/1.8/cookbook/workingwithattachments.html">Working with Attachments</a></li> + <li><a href="/1.8/cookbook/pdfacreation.html">Creating a PDF/A document</a></li> + </ul> + </li> + <li><a href="/1.8/commandline.html">Command Line Tools</a></li> + <li><a 
href="/docs/1.8.8/javadocs/">API Docs</a></li> + <li><a href="/1.8/userguide/faq.html">FAQ</a></li> + </ul> + </li> + + <li class="sidebar-header">Development</li> + <li><a href="/codingconventions.html">Coding Conventions</a></li> + <li><a href="/building.html">Building</a></li> + <li><a href="/ideas.html">Ideas</a></li> + <li><a href="/references.html">References</a></li> + + <li class="sidebar-header">Apache Software Foundation</li> + <li><a href="http://www.apache.org/">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">ASF Sponsors</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + </ul> + </div> + <div class="col-xs-9"> + <h1 id="working-with-fonts">Working with Fonts</h1> +<h2 id="standard-14-fonts">Standard 14 Fonts</h2> +<p>The PDF specification states that a standard set of 14 fonts will always be available when consuming PDF documents. In PDFBox these are defined as constants in the PDType1Font class.</p> +<table> +<thead> +<tr> +<th>Standard Font</th> +<th>Description</th> +</tr> +</thead> +<tbody> +<tr> +<td>PDType1Font.TIMES_ROMAN</td> +<td>Times regular</td> +</tr> +<tr> +<td>PDType1Font.TIMES_BOLD</td> +<td>Times bold</td> +</tr> +<tr> +<td>PDType1Font.TIMES_ITALIC</td> +<td>Times italic</td> +</tr> +<tr> +<td>PDType1Font.TIMES_BOLD_ITALIC</td> +<td>Times bold italic</td> +</tr> +<tr> +<td>PDType1Font.HELVETICA</td> +<td>Helvetica regular</td> +</tr> +<tr> +<td>PDType1Font.HELVETICA_BOLD</td> +<td>Helvetica bold</td> +</tr> +<tr> +<td>PDType1Font.HELVETICA_OBLIQUE</td> +<td>Helvetica italic</td> +</tr> +<tr> +<td>PDType1Font.HELVETICA_BOLD_OBLIQUE</td> +<td>Helvetica bold italic</td> +</tr> +<tr> +<td>PDType1Font.COURIER</td> +<td>Courier</td> +</tr> +<tr> +<td>PDType1Font.COURIER_BOLD</td> +<td>Courier bold</td> +</tr> +<tr> +<td>PDType1Font.COURIER_OBLIQUE</td> +<td>Courier italic</td> +</tr> +<tr> +<td>PDType1Font.COURIER_BOLD_OBLIQUE</td> +<td>Courier bold italic</td> +</tr> 
+<tr> +<td>PDType1Font.SYMBOL</td> +<td>Symbol Set</td> +</tr> +<tr> +<td>PDType1Font.ZAPF_DINGBATS</td> +<td>Dingbat Typeface</td> +</tr> +</tbody> +</table> +<h2 id="hello-world-using-a-pdf-base-font">Hello World using a PDF base font</h2> +<p>This small sample shows how to create a new document and print the text "Hello World" using one of the PDF base fonts.</p> +<div class="codehilite"><pre><span class="c1">// Create a document and add a page to it</span> +<span class="n">PDDocument</span> <span class="n">document</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PDDocument</span><span class="o">();</span> +<span class="n">PDPage</span> <span class="n">page</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PDPage</span><span class="o">();</span> +<span class="n">document</span><span class="o">.</span><span class="na">addPage</span><span class="o">(</span> <span class="n">page</span> <span class="o">);</span> + +<span class="c1">// Create a new font object selecting one of the PDF base fonts</span> +<span class="n">PDFont</span> <span class="n">font</span> <span class="o">=</span> <span class="n">PDType1Font</span><span class="o">.</span><span class="na">HELVETICA_BOLD</span><span class="o">;</span> + +<span class="c1">// Start a new content stream which will "hold" the to be created content</span> +<span class="n">PDPageContentStream</span> <span class="n">contentStream</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PDPageContentStream</span><span class="o">(</span><span class="n">document</span><span class="o">,</span> <span class="n">page</span><span class="o">);</span> + +<span class="c1">// Define a text content stream using the selected font, moving the cursor and drawing the text "Hello World"</span> +<span class="n">contentStream</span><span class="o">.</span><span class="na">beginText</span><span class="o">();</span> +<span class="n">contentStream</span><span 
class="o">.</span><span class="na">setFont</span><span class="o">(</span> <span class="n">font</span><span class="o">,</span> <span class="mi">12</span> <span class="o">);</span> +<span class="n">contentStream</span><span class="o">.</span><span class="na">moveTextPositionByAmount</span><span class="o">(</span> <span class="mi">100</span><span class="o">,</span> <span class="mi">700</span> <span class="o">);</span> +<span class="n">contentStream</span><span class="o">.</span><span class="na">drawString</span><span class="o">(</span> <span class="s">"Hello World"</span> <span class="o">);</span> +<span class="n">contentStream</span><span class="o">.</span><span class="na">endText</span><span class="o">();</span> + +<span class="c1">// Make sure that the content stream is closed:</span> +<span class="n">contentStream</span><span class="o">.</span><span class="na">close</span><span class="o">();</span> + +<span class="c1">// Save the results and ensure that the document is properly closed:</span> +<span class="n">document</span><span class="o">.</span><span class="na">save</span><span class="o">(</span> <span class="s">"Hello World.pdf"</span><span class="o">);</span> +<span class="n">document</span><span class="o">.</span><span class="na">close</span><span class="o">();</span> +</pre></div> + + +<h2 id="hello-world-using-a-truetype-font">Hello World using a TrueType font</h2> +<p>This small sample shows how to create a new document and print the text "Hello World" using a TrueType font.</p> +<div class="codehilite"><pre><span class="c1">// Create a document and add a page to it</span> +<span class="n">PDDocument</span> <span class="n">document</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PDDocument</span><span class="o">();</span> +<span class="n">PDPage</span> <span class="n">page</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PDPage</span><span class="o">();</span> +<span class="n">document</span><span 
class="o">.</span><span class="na">addPage</span><span class="o">(</span> <span class="n">page</span> <span class="o">);</span> + +<span class="c1">// Create a new font object by loading a TrueType font into the document</span> +<span class="n">PDFont</span> <span class="n">font</span> <span class="o">=</span> <span class="n">PDTrueTypeFont</span><span class="o">.</span><span class="na">loadTTF</span><span class="o">(</span><span class="n">document</span><span class="o">,</span> <span class="s">"Arial.ttf"</span><span class="o">);</span> + +<span class="c1">// Start a new content stream which will "hold" the to be created content</span> +<span class="n">PDPageContentStream</span> <span class="n">contentStream</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PDPageContentStream</span><span class="o">(</span><span class="n">document</span><span class="o">,</span> <span class="n">page</span><span class="o">);</span> + +<span class="c1">// Define a text content stream using the selected font, moving the cursor and drawing the text "Hello World"</span> +<span class="n">contentStream</span><span class="o">.</span><span class="na">beginText</span><span class="o">();</span> +<span class="n">contentStream</span><span class="o">.</span><span class="na">setFont</span><span class="o">(</span> <span class="n">font</span><span class="o">,</span> <span class="mi">12</span> <span class="o">);</span> +<span class="n">contentStream</span><span class="o">.</span><span class="na">moveTextPositionByAmount</span><span class="o">(</span> <span class="mi">100</span><span class="o">,</span> <span class="mi">700</span> <span class="o">);</span> +<span class="n">contentStream</span><span class="o">.</span><span class="na">drawString</span><span class="o">(</span> <span class="s">"Hello World"</span> <span class="o">);</span> +<span class="n">contentStream</span><span class="o">.</span><span class="na">endText</span><span class="o">();</span> + +<span class="c1">// 
Make sure that the content stream is closed:</span> +<span class="n">contentStream</span><span class="o">.</span><span class="na">close</span><span class="o">();</span> + +<span class="c1">// Save the results and ensure that the document is properly closed:</span> +<span class="n">document</span><span class="o">.</span><span class="na">save</span><span class="o">(</span> <span class="s">"Hello World.pdf"</span><span class="o">);</span> +<span class="n">document</span><span class="o">.</span><span class="na">close</span><span class="o">();</span> +</pre></div> + + +<p>While embedding all fonts is recommended for greatest portability, not all PDF producer +applications do this. When displaying a PDF whose fonts are not embedded, an external font must be found. +PDFBox looks for a mapping file to use when substituting fonts.</p> +<p>PDFBox loads Resources/PDFBox_External_Fonts.properties from the classpath to map font +names to TTF font files. The UNKNOWN_FONT property in that file tells PDFBox which font to +use when no mapping exists. 
</p> +<h2 id="hello-world-using-a-postscript-type1-font">Hello World using a PostScript Type 1 font</h2> +<p>This small sample shows how to create a new document and print the text "Hello World" using a PostScript Type 1 font.</p> +<div class="codehilite"><pre><span class="c1">// Create a document and add a page to it</span> +<span class="n">PDDocument</span> <span class="n">document</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PDDocument</span><span class="o">();</span> +<span class="n">PDPage</span> <span class="n">page</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PDPage</span><span class="o">();</span> +<span class="n">document</span><span class="o">.</span><span class="na">addPage</span><span class="o">(</span> <span class="n">page</span> <span class="o">);</span> + +<span class="c1">// Create a new font object by loading a PostScript Type 1 font into the document</span> +<span class="n">PDFont</span> <span class="n">font</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PDType1AfmPfbFont</span><span class="o">(</span><span class="n">document</span><span class="o">,</span><span class="s">"cfm.afm"</span><span class="o">);</span> + +<span class="c1">// Start a new content stream which will "hold" the to be created content</span> +<span class="n">PDPageContentStream</span> <span class="n">contentStream</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PDPageContentStream</span><span class="o">(</span><span class="n">document</span><span class="o">,</span> <span class="n">page</span><span class="o">);</span> + +<span class="c1">// Define a text content stream using the selected font, moving the cursor and drawing the text "Hello World"</span> +<span class="n">contentStream</span><span class="o">.</span><span class="na">beginText</span><span class="o">();</span> +<span class="n">contentStream</span><span class="o">.</span><span class="na">setFont</span><span 
class="o">(</span> <span class="n">font</span><span class="o">,</span> <span class="mi">12</span> <span class="o">);</span> +<span class="n">contentStream</span><span class="o">.</span><span class="na">moveTextPositionByAmount</span><span class="o">(</span> <span class="mi">100</span><span class="o">,</span> <span class="mi">700</span> <span class="o">);</span> +<span class="n">contentStream</span><span class="o">.</span><span class="na">drawString</span><span class="o">(</span> <span class="s">"Hello World"</span> <span class="o">);</span> +<span class="n">contentStream</span><span class="o">.</span><span class="na">endText</span><span class="o">();</span> + +<span class="c1">// Make sure that the content stream is closed:</span> +<span class="n">contentStream</span><span class="o">.</span><span class="na">close</span><span class="o">();</span> + +<span class="c1">// Save the results and ensure that the document is properly closed:</span> +<span class="n">document</span><span class="o">.</span><span class="na">save</span><span class="o">(</span> <span class="s">"Hello World.pdf"</span><span class="o">);</span> +<span class="n">document</span><span class="o">.</span><span class="na">close</span><span class="o">();</span> +</pre></div> + </div> + </div> + </div> + + <footer class="footer"> + <div class="container"> + <div class="row"> + <div class="span3"> + <!-- nothing in here on purpose --> + </div> + <div class="span9"> + <p>Copyright © 2009–2015 <a href="http://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. 
+ <br/>Apache PDFBox, PDFBox, Apache, the Apache feather logo and the Apache PDFBox project logos are trademarks of The Apache Software Foundation.</p> + </div> + </div> + </div> + </footer> + +</body> + +</html> Added: websites/staging/pdfbox/trunk/content/1.8/cookbook/workingwithmetadata.html ============================================================================== --- websites/staging/pdfbox/trunk/content/1.8/cookbook/workingwithmetadata.html (added) +++ websites/staging/pdfbox/trunk/content/1.8/cookbook/workingwithmetadata.html Mon Jan 5 20:30:08 2015 @@ -0,0 +1,190 @@ +<!DOCTYPE html> +<html lang="en"> + +<!-- + + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE- 2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+ --> + +<head> + <meta charset="utf-8"> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + + <title>Apache PDFBox | Cookbook - Working with Metadata</title> + + <link href="/bootstrap/css/bootstrap.min.css" rel="stylesheet"> + <link href="/FontAwesome/css/font-awesome.css" rel="stylesheet"> + <link href="/Iconic/iconic fill/iconic_fill.css" rel="stylesheet"> + <link href="/css/pygments-github.css" rel="stylesheet"> + + <link href="/css/site.css" rel="stylesheet"> + + + + + + + <!-- Twitter Bootstrap and jQuery after this line. --> + <script src="//code.jquery.com/jquery-latest.js"></script> + <script src="/bootstrap/js/bootstrap.min.js"></script> +</head> + +<body> + <nav class="navbar navbar-default navbar-top"> + <div class="container"> + <div class="navbar-header"> + <a href="/index.html"> + <img class="logo" src="/images/logo-head.gif"> + </a> + </div> + </div> + </nav> + + <div class="container"> + + <div class="row"> + <div class="col-xs-3"> + + <ul class="sidebar"> + <li class="sidebar-header">Apache PDFBox</li> + <li><a href="/index.cgi">Overview</a></li> + <li><a href="/download.cgi">Downloads</a></li> + + <li class="sidebar-header">Community</li> + <li><a href="/support.html">Support</a></li> + <li><a href="/mailinglists.html">Mailing Lists</a></li> + <li><a href="/team.html">Project Team</a></li> + + <li class="sidebar-header">Documentation</li> + <li class="sidebar-node"> + <a href="#">Trunk</a> + <ul> + <li><a href="/docs/2.0.0-SNAPSHOT/javadocs/">API Docs</a></li> + </ul> + </li> + <li class="sidebar-node"> + <a href="#">1.8.8</a> + <ul> + <li><a href="/1.8/architecture.html">Architecture</a></li> + <li><a href="/1.8/dependencies.html">Dependencies</a></li> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#"> + Cookbook <b class="caret"></b> + </a> + <ul class="dropdown-menu"> + <li><a href="/1.8/cookbook/documentcreation.html">Document 
Creation</a></li> + <li><a href="/1.8/cookbook/textextraction.html">Text Extraction</a></li> + <li><a href="/1.8/cookbook/pdfavalidation.html">PDF/A Validation</a></li> + <li><a href="/1.8/cookbook/workingwithfonts.html">Working with Fonts</a></li> + <li><a href="/1.8/cookbook/workingwithmetadata.html">Working with Metadata</a></li> + <li><a href="/1.8/cookbook/workingwithattachments.html">Working with Attachments</a></li> + <li><a href="/1.8/cookbook/pdfacreation.html">Creating a PDF/A document</a></li> + </ul> + </li> + <li><a href="/1.8/commandline.html">Command Line Tools</a></li> + <li><a href="/docs/1.8.8/javadocs/">API Docs</a></li> + <li><a href="/1.8/userguide/faq.html">FAQ</a></li> + </ul> + </li> + + <li class="sidebar-header">Development</li> + <li><a href="/codingconventions.html">Coding Conventions</a></li> + <li><a href="/building.html">Building</a></li> + <li><a href="/ideas.html">Ideas</a></li> + <li><a href="/references.html">References</a></li> + + <li class="sidebar-header">Apache Software Foundation</li> + <li><a href="http://www.apache.org/">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">ASF Sponsors</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + </ul> + </div> + <div class="col-xs-9"> + <h1 id="working-with-metadata">Working with Metadata</h1> +<h2 id="introduction">Introduction</h2> +<p>PDF documents can contain information describing the document itself or certain objects +within the document, such as the author of the document or its creation date. +Basic information can be set and retrieved using the PDDocumentInformation object.</p> +<p>In addition, more metadata can be retrieved using the XML metadata as described below. 
+</p><h2 id="getting-basic-metadata">Getting basic Metadata</h2> +<p>To set or retrieve basic information about the document, the PDDocumentInformation object +provides a high-level API:</p> +<div class="codehilite"><pre><span class="n">PDDocumentInformation</span> <span class="n">info</span> <span class="o">=</span> <span class="n">document</span><span class="o">.</span><span class="na">getDocumentInformation</span><span class="o">();</span> +<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span> <span class="s">"Page Count="</span> <span class="o">+</span> <span class="n">document</span><span class="o">.</span><span class="na">getNumberOfPages</span><span class="o">()</span> <span class="o">);</span> +<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span> <span class="s">"Title="</span> <span class="o">+</span> <span class="n">info</span><span class="o">.</span><span class="na">getTitle</span><span class="o">()</span> <span class="o">);</span> +<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span> <span class="s">"Author="</span> <span class="o">+</span> <span class="n">info</span><span class="o">.</span><span class="na">getAuthor</span><span class="o">()</span> <span class="o">);</span> +<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span> <span class="s">"Subject="</span> <span class="o">+</span> <span class="n">info</span><span class="o">.</span><span class="na">getSubject</span><span class="o">()</span> <span class="o">);</span> +<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span 
class="na">println</span><span class="o">(</span> <span class="s">"Keywords="</span> <span class="o">+</span> <span class="n">info</span><span class="o">.</span><span class="na">getKeywords</span><span class="o">()</span> <span class="o">);</span> +<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span> <span class="s">"Creator="</span> <span class="o">+</span> <span class="n">info</span><span class="o">.</span><span class="na">getCreator</span><span class="o">()</span> <span class="o">);</span> +<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span> <span class="s">"Producer="</span> <span class="o">+</span> <span class="n">info</span><span class="o">.</span><span class="na">getProducer</span><span class="o">()</span> <span class="o">);</span> +<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span> <span class="s">"Creation Date="</span> <span class="o">+</span> <span class="n">info</span><span class="o">.</span><span class="na">getCreationDate</span><span class="o">()</span> <span class="o">);</span> +<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span> <span class="s">"Modification Date="</span> <span class="o">+</span> <span class="n">info</span><span class="o">.</span><span class="na">getModificationDate</span><span class="o">());</span> +<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span> <span class="s">"Trapped="</span> <span class="o">+</span> <span class="n">info</span><span class="o">.</span><span class="na">getTrapped</span><span 
class="o">()</span> <span class="o">);</span> +</pre></div> + + +<h2 id="accessing-pdf-metadata">Accessing PDF Metadata</h2> +<p>See class:org.apache.pdfbox.pdmodel.common.PDMetadata<br /> +See example:AddMetadataFromDocInfo<br /> +See Adobe Documentation:XMP Specification </p> +<p>PDF documents can have XML metadata associated with certain objects within a PDF document. +For example, the following PD Model objects have the ability to contain metadata:</p> +<div class="codehilite"><pre><span class="n">PDDocumentCatalog</span> +<span class="n">PDPage</span> +<span class="n">PDXObject</span> +<span class="n">PDICCBased</span> +<span class="n">PDStream</span> +</pre></div> + + +<p>The metadata that is stored in PDF objects conforms to the XMP specification, it is +recommended that you review that specification. Currently there is no high level API for +managing the XML metadata, PDFBox uses standard java InputStream/OutputStream to retrieve +or set the XML metadata.</p> +<div class="codehilite"><pre><span class="n">PDDocument</span> <span class="n">doc</span> <span class="o">=</span> <span class="n">PDDocument</span><span class="o">.</span><span class="na">load</span><span class="o">(</span> <span class="o">...</span> <span class="o">);</span> +<span class="n">PDDocumentCatalog</span> <span class="n">catalog</span> <span class="o">=</span> <span class="n">doc</span><span class="o">.</span><span class="na">getDocumentCatalog</span><span class="o">();</span> +<span class="n">PDMetadata</span> <span class="n">metadata</span> <span class="o">=</span> <span class="n">catalog</span><span class="o">.</span><span class="na">getMetadata</span><span class="o">();</span> + +<span class="c1">//to read the XML metadata</span> +<span class="n">InputStream</span> <span class="n">xmlInputStream</span> <span class="o">=</span> <span class="n">metadata</span><span class="o">.</span><span class="na">createInputStream</span><span class="o">();</span> + +<span class="c1">//or to write new 
XML metadata</span> +<span class="n">InputStream</span> <span class="n">newXMPData</span> <span class="o">=</span> <span class="o">...;</span> +<span class="n">PDMetadata</span> <span class="n">newMetadata</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PDMetadata</span><span class="o">(</span><span class="n">doc</span><span class="o">,</span> <span class="n">newXMPData</span><span class="o">,</span> <span class="kc">false</span> <span class="o">);</span> +<span class="n">catalog</span><span class="o">.</span><span class="na">setMetadata</span><span class="o">(</span> <span class="n">newMetadata</span> <span class="o">);</span> +</pre></div> + </div> + </div> + </div> + + <footer class="footer"> + <div class="container"> + <div class="row"> + <div class="span3"> + <!-- nothing in here on purpose --> + </div> + <div class="span9"> + <p>Copyright © 2009–2015 <a href="http://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/>Apache PDFBox, PDFBox, Apache, the Apache feather logo and the Apache PDFBox project logos are trademarks of The Apache Software Foundation.</p> + </div> + </div> + </div> + </footer> + +</body> + +</html> Added: websites/staging/pdfbox/trunk/content/1.8/dependencies.html ============================================================================== --- websites/staging/pdfbox/trunk/content/1.8/dependencies.html (added) +++ websites/staging/pdfbox/trunk/content/1.8/dependencies.html Mon Jan 5 20:30:08 2015 @@ -0,0 +1,224 @@ +<!DOCTYPE html> +<html lang="en"> + +<!-- + + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. 
+ The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + --> + +<head> + <meta charset="utf-8"> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + + <title>Apache PDFBox | Dependencies</title> + + <link href="/bootstrap/css/bootstrap.min.css" rel="stylesheet"> + <link href="/FontAwesome/css/font-awesome.css" rel="stylesheet"> + <link href="/Iconic/iconic fill/iconic_fill.css" rel="stylesheet"> + <link href="/css/pygments-github.css" rel="stylesheet"> + + <link href="/css/site.css" rel="stylesheet"> + + + + + + + <!-- Twitter Bootstrap and jQuery after this line. 
--> + <script src="//code.jquery.com/jquery-latest.js"></script> + <script src="/bootstrap/js/bootstrap.min.js"></script> +</head> + +<body> + <nav class="navbar navbar-default navbar-top"> + <div class="container"> + <div class="navbar-header"> + <a href="/index.html"> + <img class="logo" src="/images/logo-head.gif"> + </a> + </div> + </div> + </nav> + + <div class="container"> + + <div class="row"> + <div class="col-xs-3"> + + <ul class="sidebar"> + <li class="sidebar-header">Apache PDFBox</li> + <li><a href="/index.cgi">Overview</a></li> + <li><a href="/download.cgi">Downloads</a></li> + + <li class="sidebar-header">Community</li> + <li><a href="/support.html">Support</a></li> + <li><a href="/mailinglists.html">Mailing Lists</a></li> + <li><a href="/team.html">Project Team</a></li> + + <li class="sidebar-header">Documentation</li> + <li class="sidebar-node"> + <a href="#">Trunk</a> + <ul> + <li><a href="/docs/2.0.0-SNAPSHOT/javadocs/">API Docs</a></li> + </ul> + </li> + <li class="sidebar-node"> + <a href="#">1.8.8</a> + <ul> + <li><a href="/1.8/architecture.html">Architecture</a></li> + <li><a href="/1.8/dependencies.html">Dependencies</a></li> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#"> + Cookbook <b class="caret"></b> + </a> + <ul class="dropdown-menu"> + <li><a href="/1.8/cookbook/documentcreation.html">Document Creation</a></li> + <li><a href="/1.8/cookbook/textextraction.html">Text Extraction</a></li> + <li><a href="/1.8/cookbook/pdfavalidation.html">PDF/A Validation</a></li> + <li><a href="/1.8/cookbook/workingwithfonts.html">Working with Fonts</a></li> + <li><a href="/1.8/cookbook/workingwithmetadata.html">Working with Metadata</a></li> + <li><a href="/1.8/cookbook/workingwithattachments.html">Working with Attachments</a></li> + <li><a href="/1.8/cookbook/pdfacreation.html">Creating a PDF/A document</a></li> + </ul> + </li> + <li><a href="/1.8/commandline.html">Command Line Tools</a></li> + <li><a 
href="/docs/1.8.8/javadocs/">API Docs</a></li> + <li><a href="/1.8/userguide/faq.html">FAQ</a></li> + </ul> + </li> + + <li class="sidebar-header">Development</li> + <li><a href="/codingconventions.html">Coding Conventions</a></li> + <li><a href="/building.html">Building</a></li> + <li><a href="/ideas.html">Ideas</a></li> + <li><a href="/references.html">References</a></li> + + <li class="sidebar-header">Apache Software Foundation</li> + <li><a href="http://www.apache.org/">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">ASF Sponsors</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + </ul> + </div> + <div class="col-xs-9"> + <h1 id="dependencies">Dependencies</h1> +<p>PDFBox consists of three related components and depends on a few external libraries. This page describes what these libraries are and how to include them in your application.</p> +<p class="alert alert-info">This information is for the current stable 1.8.x branch. The 2.0 development branch will have different dependencies.</p> + +<h2 id="core-components">Core components</h2> +<p class="alert alert-info">These components are needed during runtime, development and testing, depending on the details below.</p> + +<p>The three PDFBox components are named <code>pdfbox</code>, <code>fontbox</code> and <code>jempbox</code>. The Maven groupId of all PDFBox components is org.apache.pdfbox.</p> +<h3 id="minimum-requirement">Minimum Requirement</h3> +<p>The main PDFBox component, pdfbox, has a hard dependency on the <a href="http://commons.apache.org/logging/">commons-logging</a> library. 
+Commons Logging is a generic wrapper around different logging frameworks, so you'll either need to also use a logging library like <a href="http://logging.apache.org/log4j/">log4j</a> +or let commons-logging fall back to the standard <a href="http://java.sun.com/j2se/1.4.2/docs/guide/util/logging/overview.html">java.util.logging API</a> +included in the Java platform.</p> +<h3 id="font-handling">Font Handling</h3> +<p>For font handling the fontbox component is needed.</p> +<h3 id="xmp-metadata">XMP Metadata</h3> +<p>To support XMP metadata the jempbox component is needed.</p> +<p>To add the pdfbox, fontbox, jempbox and commons-logging jars to your application, the easiest thing is to declare the Maven dependency shown below. This gives you the main +pdfbox library directly and the other required jars as transitive dependencies.</p> +<div class="codehilite"><pre><span class="nt"><dependency></span> + <span class="nt"><groupId></span>org.apache.pdfbox<span class="nt"></groupId></span> + <span class="nt"><artifactId></span>pdfbox<span class="nt"></artifactId></span> + <span class="nt"><version></span>...<span class="nt"></version></span> +<span class="nt"></dependency></span> +</pre></div> + + +<p>Set the version field to the latest stable PDFBox version.</p> +<h2 id="optional-dependencies">Optional dependencies</h2> +<p>Some features in PDFBox depend on optional external libraries. You can enable these features simply by including the required libraries in the classpath of your application.</p> +<h3 id="extented-image-format-support">Extended Image Format Support</h3> +<p>To support JBIG2 and to write TIFF images, additional libraries are needed. </p> +<p class="alert alert-warning">The image plugins described below are not part of the PDFBox distribution because of incompatible licensing terms. 
Please make sure to check if the licensing terms are compatible with your usage.</p> + +<p>For <strong>JBIG2</strong> support a Java ImageIO Plugin such as the <a href="https://github.com/levigo/jbig2-imageio">Levigo Plugin</a> or <a href="https://github.com/Borisvl/JBIG2-Image-Decoder">JBIG2-Image-Decoder +</a> will be needed. </p> +<p>To write <strong>TIFF</strong> images a JAI ImageIO Core library will be needed. </p> +<h4 id="pdf-encryption-and-signing">PDF Encryption and Signing</h4> +<p>The most notable such optional feature is support for PDF encryption. Instead of implementing its own encryption algorithms, PDFBox uses libraries from the +<a href="http://www.bouncycastle.org/">Legion of the Bouncy Castle</a>. Both the bcprov and bcmail libraries are needed and can be included using the Maven dependencies shown below.</p> +<div class="codehilite"><pre><span class="nt"><dependency></span> + <span class="nt"><groupId></span>org.bouncycastle<span class="nt"></groupId></span> + <span class="nt"><artifactId></span>bcprov-jdk15<span class="nt"></artifactId></span> + <span class="nt"><version></span>1.44<span class="nt"></version></span> +<span class="nt"></dependency></span> +<span class="nt"><dependency></span> + <span class="nt"><groupId></span>org.bouncycastle<span class="nt"></groupId></span> + <span class="nt"><artifactId></span>bcmail-jdk15<span class="nt"></artifactId></span> + <span class="nt"><version></span>1.44<span class="nt"></version></span> +<span class="nt"></dependency></span> +</pre></div> + + +<p><br/></p> +<p class="alert alert-warning">New for PDFBox 2.0.0 (this version is still in development).</p> + +<p>Since PDFBOX-2460, building PDFBox now requires a JDK with "unlimited strength" cryptography. 
This requires extra files to be installed.</p> +<p>For JDK 7: <a href="http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html">Java Cryptography Extension (JCE)</a></p> +<p>Users without these files will see the message:</p> +<div class="codehilite"><pre><span class="n">Failed</span> <span class="n">tests</span><span class="p">:</span> +<span class="n">TestPublicKeyEncryption</span><span class="p">.</span><span class="n">setUp</span><span class="p">:</span>70 <span class="n">JCE</span> <span class="n">unlimited</span> <span class="n">strength</span> <span class="n">jurisdiction</span> <span class="n">policy</span> <span class="n">files</span> <span class="n">are</span> <span class="n">not</span> <span class="n">installed</span> +<span class="n">TestPublicKeyEncryption</span><span class="p">.</span><span class="n">setUp</span><span class="p">:</span>70 <span class="n">JCE</span> <span class="n">unlimited</span> <span class="n">strength</span> <span class="n">jurisdiction</span> <span class="n">policy</span> <span class="n">files</span> <span class="n">are</span> <span class="n">not</span> <span class="n">installed</span> +<span class="n">TestPublicKeyEncryption</span><span class="p">.</span><span class="n">setUp</span><span class="p">:</span>70 <span class="n">JCE</span> <span class="n">unlimited</span> <span class="n">strength</span> <span class="n">jurisdiction</span> <span class="n">policy</span> <span class="n">files</span> <span class="n">are</span> <span class="n">not</span> <span class="n">installed</span> +<span class="n">TestSymmetricKeyEncryption</span><span class="p">.</span><span class="n">setUp</span><span class="p">:</span>80 <span class="n">JCE</span> <span class="n">unlimited</span> <span class="n">strength</span> <span class="n">jurisdiction</span> <span class="n">policy</span> <span class="n">files</span> <span class="n">are</span> <span class="n">not</span> <span class="n">installed</span> +</pre></div> + + +<h4 
id="support-for-bidirectional-languages">Support for bidirectional languages</h4> +<p>Another important optional feature is support for bidirectional languages like Arabic. PDFBox uses the ICU4J library from the +<a href="http://site.icu-project.org/">International Components for Unicode</a> (ICU) project to support such languages in PDF documents. To add the ICU4J jar to your project, +use the following Maven dependency.</p> +<div class="codehilite"><pre><span class="nt"><dependency></span> + <span class="nt"><groupId></span>com.ibm.icu<span class="nt"></groupId></span> + <span class="nt"><artifactId></span>icu4j<span class="nt"></artifactId></span> + <span class="nt"><version></span>3.8<span class="nt"></version></span> +<span class="nt"></dependency></span> +</pre></div> + + +<p>PDFBox also contains extra support for use with the <a href="http://lucene.apache.org/">Lucene</a> and <a href="http://ant.apache.org/">Ant</a> projects. Since in these cases PDFBox is just an +add-on feature to these projects, you should first set up your application to use Lucene or Ant and then add PDFBox support as described on this page.</p> +<h2 id="dependencies-for-ant-builds">Dependencies for Ant builds</h2> +<p>The above instructions expect that you're using <a href="http://maven.apache.org/">Maven</a> or another build tool like <a href="http://ant.apache.org/ivy/">Ivy</a> that supports Maven dependencies. +If you instead use tools like <a href="http://ant.apache.org/">Ant</a> where you need to explicitly include all the required library jars in your application, you'll need to do +something different.</p> +<p>The easiest approach is to run <code>mvn dependency:copy-dependencies</code> inside the pdfbox directory of the latest PDFBox source release. This will copy all the required and optional +libraries discussed above into the pdfbox/target/dependencies directory. 
You can then simply copy all the libraries you need from this directory to your application.</p> + </div> + </div> + </div> + + <footer class="footer"> + <div class="container"> + <div class="row"> + <div class="span3"> + <!-- nothing in here on purpose --> + </div> + <div class="span9"> + <p>Copyright © 2009–2015 <a href="http://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/>Apache PDFBox, PDFBox, Apache, the Apache feather logo and the Apache PDFBox project logos are trademarks of The Apache Software Foundation.</p> + </div> + </div> + </div> + </footer> + +</body> + +</html> Added: websites/staging/pdfbox/trunk/content/1.8/faq.html ============================================================================== --- websites/staging/pdfbox/trunk/content/1.8/faq.html (added) +++ websites/staging/pdfbox/trunk/content/1.8/faq.html Mon Jan 5 20:30:08 2015 @@ -0,0 +1,242 @@ +<!DOCTYPE html> +<html lang="en"> + +<!-- + + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+ --> + +<head> + <meta charset="utf-8"> + <meta http-equiv="X-UA-Compatible" content="IE=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + + <title>Apache PDFBox | Frequently Asked Questions (FAQ)</title> + + <link href="/bootstrap/css/bootstrap.min.css" rel="stylesheet"> + <link href="/FontAwesome/css/font-awesome.css" rel="stylesheet"> + <link href="/Iconic/iconic fill/iconic_fill.css" rel="stylesheet"> + <link href="/css/pygments-github.css" rel="stylesheet"> + + <link href="/css/site.css" rel="stylesheet"> + + + + + + + <!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at . http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. --> + + <!-- Twitter Bootstrap and jQuery after this line. 
--> + <script src="//code.jquery.com/jquery-latest.js"></script> + <script src="/bootstrap/js/bootstrap.min.js"></script> +</head> + +<body> + <nav class="navbar navbar-default navbar-top"> + <div class="container"> + <div class="navbar-header"> + <a href="/index.html"> + <img class="logo" src="/images/logo-head.gif"> + </a> + </div> + </div> + </nav> + + <div class="container"> + + <div class="row"> + <div class="col-xs-3"> + + <ul class="sidebar"> + <li class="sidebar-header">Apache PDFBox</li> + <li><a href="/index.cgi">Overview</a></li> + <li><a href="/download.cgi">Downloads</a></li> + + <li class="sidebar-header">Community</li> + <li><a href="/support.html">Support</a></li> + <li><a href="/mailinglists.html">Mailing Lists</a></li> + <li><a href="/team.html">Project Team</a></li> + + <li class="sidebar-header">Documentation</li> + <li class="sidebar-node"> + <a href="#">Trunk</a> + <ul> + <li><a href="/docs/2.0.0-SNAPSHOT/javadocs/">API Docs</a></li> + </ul> + </li> + <li class="sidebar-node"> + <a href="#">1.8.8</a> + <ul> + <li><a href="/1.8/architecture.html">Architecture</a></li> + <li><a href="/1.8/dependencies.html">Dependencies</a></li> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#"> + Cookbook <b class="caret"></b> + </a> + <ul class="dropdown-menu"> + <li><a href="/1.8/cookbook/documentcreation.html">Document Creation</a></li> + <li><a href="/1.8/cookbook/textextraction.html">Text Extraction</a></li> + <li><a href="/1.8/cookbook/pdfavalidation.html">PDF/A Validation</a></li> + <li><a href="/1.8/cookbook/workingwithfonts.html">Working with Fonts</a></li> + <li><a href="/1.8/cookbook/workingwithmetadata.html">Working with Metadata</a></li> + <li><a href="/1.8/cookbook/workingwithattachments.html">Working with Attachments</a></li> + <li><a href="/1.8/cookbook/pdfacreation.html">Creating a PDF/A document</a></li> + </ul> + </li> + <li><a href="/1.8/commandline.html">Command Line Tools</a></li> + <li><a 
href="/docs/1.8.8/javadocs/">API Docs</a></li> + <li><a href="/1.8/userguide/faq.html">FAQ</a></li> + </ul> + </li> + + <li class="sidebar-header">Development</li> + <li><a href="/codingconventions.html">Coding Conventions</a></li> + <li><a href="/building.html">Building</a></li> + <li><a href="/ideas.html">Ideas</a></li> + <li><a href="/references.html">References</a></li> + + <li class="sidebar-header">Apache Software Foundation</li> + <li><a href="http://www.apache.org/">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">ASF Sponsors</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + </ul> + </div> + <div class="col-xs-9"> + <h1 id="frequently-asked-questions">Frequently asked questions</h1> +<h2 id="general-questions">General Questions</h2> +<ul> +<li><a href="#releaseplan">When will the next version of PDFBox be released?</a></li> +<li><a href="#log4j">I am getting the below Log4J warning message, how do I remove it?</a></li> +<li><a href="#threadsafe">Is PDFBox thread safe?</a></li> +<li><a href="#notclosed">Why do I get a "Warning: You did not close the PDF Document"?</a></li> +</ul> +<h2 id="text-extraction">Text Extraction</h2> +<ul> +<li><a href="#notext">How come I am not getting any text from the PDF document?</a></li> +<li><a href="#gibberish">How come I am getting gibberish(G38G43G36G51G5) when extracting text?</a></li> +<li><a href="#fontwidth">What does "java.io.IOException: Can't handle font width" mean?</a></li> +<li><a href="#permission">Why do I get "You do not have permission to extract text" on some documents?</a></li> +<li><a href="#partially">Can't we just extract the text without parsing the whole document or extract text as it is parsed?</a></li> +</ul> +<h2 id="answers">Answers</h2> +<h3 id="general-questions_1">General Questions</h3> +<h4 id="releaseplan">When will the next version of PDFBox be released?</h4> +<p>As fixes are made and integrated into the repository 
these changes are documented in the +<a href="http://pdfbox.apache.org/downloads.html">release notes</a>. An estimate will be given of when the next version will be released. +Of course, this is only an estimate and could change.</p> +<h4 id="log4j">I am getting the below Log4J warning message, how do I remove it?</h4> +<div class="codehilite"><pre><span class="nl">log4j:</span><span class="n">WARN</span> <span class="n">No</span> <span class="n">appenders</span> <span class="n">could</span> <span class="n">be</span> <span class="n">found</span> <span class="k">for</span> <span class="n">logger</span> <span class="o">(</span><span class="n">org</span><span class="o">.</span><span class="na">apache</span><span class="o">.</span><span class="na">pdfbox</span><span class="o">.</span><span class="na">util</span><span class="o">.</span><span class="na">ResourceLoader</span><span class="o">).</span> +<span class="nl">log4j:</span><span class="n">WARN</span> <span class="n">Please</span> <span class="n">initialize</span> <span class="n">the</span> <span class="n">log4j</span> <span class="n">system</span> <span class="n">properly</span><span class="o">.</span> +</pre></div> + + +<p>This message means that you need to configure the log4j logging system. +See the <a href="http://logging.apache.org/log4j/docs/documentation.html">log4j documentation</a> for more information.</p> +<p>PDFBox comes with a sample log4j configuration file. 
To use it, set a system property like this:</p> +<div class="codehilite"><pre> <span class="n">java</span> <span class="o">-</span><span class="n">Dlog4j</span><span class="o">.</span><span class="na">configuration</span><span class="o">=</span><span class="n">log4j</span><span class="o">.</span><span class="na">xml</span> <span class="n">org</span><span class="o">.</span><span class="na">apache</span><span class="o">.</span><span class="na">pdfbox</span><span class="o">.</span><span class="na">ExtractText</span> <span class="o"><</span><span class="n">PDF</span><span class="o">-</span><span class="n">file</span><span class="o">></span> <span class="o"><</span><span class="n">output</span><span class="o">-</span><span class="n">text</span><span class="o">-</span><span class="n">file</span><span class="o">></span> +</pre></div> + + +<p>If this is not working for you then you may have to specify the log4j config file using a URL path, like this:</p> +<div class="codehilite"><pre> <span class="n">log4j</span><span class="p">.</span><span class="n">configuration</span><span class="p">=</span><span class="n">file</span><span class="p">:</span><span class="o">///<</span><span class="n">path</span> <span class="n">to</span> <span class="n">config</span> <span class="n">file</span><span class="o">></span> +</pre></div> + + +<p>Please see <a href="https://sourceforge.net/forum/forum.php?thread_id=1254229&amp;forum_id=267205">this</a> forum thread +for more information.</p> +<h4 id="threadsafe">Is PDFBox thread safe?</h4> +<p>No! Only one thread may access a single document at a time. You can have multiple threads +each accessing their own PDDocument object.</p> +<h4 id="notclosed">Why do I get a "Warning: You did not close the PDF Document"?</h4> +<p>You need to call close() on the PDDocument inside the finally block. If you +don't then the document will not be closed properly. 
Also, you must close all +PDDocument objects that get created. The following code creates <strong>two</strong> +PDDocument objects; one from the "new PDDocument()" and the second by the load method. Only the second one is closed in the finally block, so the first one is never closed.</p> +<div class="codehilite"><pre><span class="n">PDDocument</span> <span class="n">doc</span> <span class="p">=</span> <span class="n">new</span> <span class="n">PDDocument</span><span class="p">();</span> +<span class="k">try</span> +<span class="p">{</span> + <span class="n">doc</span> <span class="p">=</span> <span class="n">PDDocument</span><span class="p">.</span><span class="n">load</span><span class="p">(</span> "<span class="n">my</span><span class="p">.</span><span class="n">pdf</span>" <span class="p">);</span> +<span class="p">}</span> +<span class="n">finally</span> +<span class="p">{</span> + <span class="k">if</span><span class="p">(</span> <span class="n">doc</span> !<span class="p">=</span> <span class="n">null</span> <span class="p">)</span> + <span class="p">{</span> + <span class="n">doc</span><span class="p">.</span><span class="n">close</span><span class="p">();</span> + <span class="p">}</span> +<span class="p">}</span> +</pre></div> + + +<h3 id="text-extraction_1">Text Extraction</h3> +<h4 id="notext">How come I am not getting any text from the PDF document?</h4> +<p>Text extraction from a PDF document is a complicated task and there are many factors +involved that affect the possibility and accuracy of text extraction. It would be helpful +to the PDFBox team if you could try a couple of things.</p> +<ul> +<li>Open the PDF in Acrobat and try to extract text from there. If Acrobat can extract text then PDFBox +should be able to as well and it is a bug if it cannot. If Acrobat cannot extract text then PDFBox 'probably' cannot either.</li> +<li>It might really be an image instead of text. Some PDF documents are just images that have been scanned in. 
+You can tell by using the selection tool in Acrobat; if you can't select any text then it is probably an image.</li> +</ul> +<h4 id="gibberish">How come I am getting gibberish(G38G43G36G51G5) when extracting text?</h4> +<p>This is because the characters in a PDF document can use a custom encoding +instead of Unicode or ASCII. When you see gibberish text then it +probably means that a meaningless internal encoding is being used. The +only way to access the text is to use OCR. This may be a future +enhancement.</p> +<h4 id="fontwidth">What does "java.io.IOException: Can't handle font width" mean?</h4> +<p>This probably means that the "Resources" directory is not in your classpath. The +Resources directory is included in the PDFBox jar so this is only a problem if you +are building PDFBox yourself and not using the binary.</p> +<h4 id="permission">Why do I get "You do not have permission to extract text" on some documents?</h4> +<p>PDF documents have certain security permissions that can be applied to them and two +passwords associated with them, a user password and a master password. If the "cannot extract text" +permission bit is set then you need to decrypt the document with the master password in order +to extract the text.</p> +<h4 id="partially">Can't we just extract the text without parsing the whole document or extract text as it is parsed?</h4> +<p>Not really, for a couple of reasons.</p> +<ul> +<li>If the document is encrypted then you need to parse at least until the encryption dictionary before +you can decrypt.</li> +<li>Sometimes the PDFont contains vital information needed for text extraction.</li> +<li>Text on a page does not have to be drawn in reading order. 
For example: if the page said "Hello World", +the PDF could have been written such that "World" gets drawn and then the cursor moves to the left and +the word "Hello" is drawn.</li> +</ul> + </div> + </div> + </div> + + <footer class="footer"> + <div class="container"> + <div class="row"> + <div class="span3"> + <!-- nothing in here on purpose --> + </div> + <div class="span9"> + <p>Copyright © 2009–2015 <a href="http://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/>Apache PDFBox, PDFBox, Apache, the Apache feather logo and the Apache PDFBox project logos are trademarks of The Apache Software Foundation.</p> + </div> + </div> + </div> + </footer> + +</body> + +</html> Modified: websites/staging/pdfbox/trunk/content/building.html ============================================================================== --- websites/staging/pdfbox/trunk/content/building.html (original) +++ websites/staging/pdfbox/trunk/content/building.html Mon Jan 5 20:30:08 2015 @@ -61,155 +61,56 @@ <ul class="sidebar"> <li class="sidebar-header">Apache PDFBox</li> - <li> - <a href="/download.cgi"> - <i class="icon-chevron-right"></i> - Downloads - </a> - </li> - <li> - <a href="/dependencies.html"> - <i class="icon-chevron-right"></i> - Dependencies - </a> - </li> - <li> - <a href="/references.html"> - <i class="icon-chevron-right"></i> - References - </a> - </li> + <li><a href="/index.cgi">Overview</a></li> + <li><a href="/download.cgi">Downloads</a></li> + <li class="sidebar-header">Community</li> - <li> - <a href="/support.html"> - <i class="icon-chevron-right"></i> - Support - </a> - </li> - <li> - <a href="/mailinglists.html"> - <i class="icon-chevron-right"></i> - Mailing Lists - </a> - </li> - <li> - <a href="/team.html"> - <i class="icon-chevron-right"></i> - Project Team</a> - </li> + <li><a href="/support.html">Support</a></li> + <li><a 
href="/mailinglists.html">Mailing Lists</a></li> + <li><a href="/team.html">Project Team</a></li> + <li class="sidebar-header">Documentation</li> - <li> - <a href="/architecture.html"> - <i class="icon-chevron-right"></i> - Architecture - </a> - </li> - <li> - <a href="/commandline/"> - <i class="icon-chevron-right"></i> - Command Line Tools</a> - </li> - <li class="dropdown"> - <a class="dropdown-toggle" data-toggle="dropdown" href="#"> - <i class="icon-chevron-right"></i> - PDFBox Cookbook <b class="caret"></b> - </a> - <ul class="dropdown-menu"> - <li> - <a href="/cookbook/documentcreation.html"> - <i class="icon-chevron-right"></i> - Document Creation</a> - </li> - <li> - <a href="/cookbook/textextraction.html"> - <i class="icon-chevron-right"></i> - Text Extraction</a> - </li> - <li> - <a href="/cookbook/pdfavalidation.html"> - <i class="icon-chevron-right"></i> - PDF/A Validation</a> - </li> - <li> - <a href="/cookbook/workingwithfonts.html"> - <i class="icon-chevron-right"></i> - Working with Fonts</a> - </li> - <li> - <a href="/cookbook/workingwithmetadata.html"> - <i class="icon-chevron-right"></i> - Working with Metadata</a> - </li> - <li> - <a href="/cookbook/workingwithattachments.html"> - <i class="icon-chevron-right"></i> - Working with Attachments</a> - </li> - <li> - <a href="/cookbook/pdfacreation.html"> - <i class="icon-chevron-right"></i> - Creating a PDF/A document</a> - </li> + <li class="sidebar-node"> + <a href="#">Trunk</a> + <ul> + <li><a href="/docs/2.0.0-SNAPSHOT/javadocs/">API Docs</a></li> </ul> </li> - <li class="dropdown"> - <a class="dropdown-toggle" data-toggle="dropdown" href="#"> - <i class="icon-chevron-right"></i> - API Docs <b class="caret"></b> - </a> - <ul class="dropdown-menu"> - <li> - <a href="/docs/2.0.0-SNAPSHOT/javadocs/"> - <i class="icon-chevron-right"></i> - Trunk</a> - </li> - <li> - <a href="/docs/1.8.8/javadocs/"> - <i class="icon-chevron-right"></i> - 1.8.8</a> - </li> + <li class="sidebar-node"> + <a 
href="#">1.8.8</a> + <ul> + <li><a href="/1.8/architecture.html">Architecture</a></li> + <li><a href="/1.8/dependencies.html">Dependencies</a></li> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#"> + Cookbook <b class="caret"></b> + </a> + <ul class="dropdown-menu"> + <li><a href="/1.8/cookbook/documentcreation.html">Document Creation</a></li> + <li><a href="/1.8/cookbook/textextraction.html">Text Extraction</a></li> + <li><a href="/1.8/cookbook/pdfavalidation.html">PDF/A Validation</a></li> + <li><a href="/1.8/cookbook/workingwithfonts.html">Working with Fonts</a></li> + <li><a href="/1.8/cookbook/workingwithmetadata.html">Working with Metadata</a></li> + <li><a href="/1.8/cookbook/workingwithattachments.html">Working with Attachments</a></li> + <li><a href="/1.8/cookbook/pdfacreation.html">Creating a PDF/A document</a></li> + </ul> + </li> + <li><a href="/1.8/commandline.html">Command Line Tools</a></li> + <li><a href="/docs/1.8.8/javadocs/">API Docs</a></li> + <li><a href="/1.8/userguide/faq.html">FAQ</a></li> </ul> </li> - <li class="sidebar-header">For Developers</li> - <li> - <a href="/userguide/faq.html"> - <i class="icon-chevron-right"></i> - FAQ - </a> - </li> - <li> - <a href="/building.html"> - <i class="icon-chevron-right"></i> - Building PDFBox</a> - </li> - <li> - <a href="/ideas.html"> - <i class="icon-chevron-right"></i> - Ideas - </a> - </li> - <li> - <a href="/codingconventions.html"> - <i class="icon-chevron-right"></i> - Coding Conventions</a> - </li> + + <li class="sidebar-header">Development</li> + <li><a href="/codingconventions.html">Coding Conventions</a></li> + <li><a href="/building.html">Building</a></li> + <li><a href="/ideas.html">Ideas</a></li> + <li><a href="/references.html">References</a></li> + <li class="sidebar-header">Apache Software Foundation</li> - <li> - <a href="http://www.apache.org/"> - <i class="icon-chevron-right"></i> - Apache Software Foundation</a> - </li> - <li> - <a 
href="http://www.apache.org/foundation/thanks.html"> - <i class="icon-chevron-right"></i> - ASF Sponsors</a> - </li> - <li> - <a href="http://www.apache.org/security/"> - <i class="icon-chevron-right"></i> - Security - </a> - </li> + <li><a href="http://www.apache.org/">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">ASF Sponsors</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> </ul> </div> <div class="col-xs-9"> @@ -287,7 +188,6 @@ the ExtractText command line application <div class="span3"> <!-- nothing in here on purpose --> </div> - <div class="span9"> <p>Copyright © 2009–2015 <a href="http://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. <br/>Apache PDFBox, PDFBox, Apache, the Apache feather logo and the Apache PDFBox project logos are trademarks of The Apache Software Foundation.</p>