pdfbox-docs git commit: Site checkin for project Apache PDFBox Website

msahyoun Sun, 29 Nov 2015 05:41:00 -0800

Repository: pdfbox-docs
Updated Branches:
  refs/heads/asf-site 1203f0b90 -> 9e6dbf4ec



Site checkin for project Apache PDFBox Website


Project: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/commit/9e6dbf4e
Tree: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/tree/9e6dbf4e
Diff: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/diff/9e6dbf4e

Branch: refs/heads/asf-site
Commit: 9e6dbf4ecdfb25b25eb023454402e5daab3560be
Parents: 1203f0b
Author: Maruan Sahyoun <sahy...@fileaffairs.de>
Authored: Sun Nov 29 14:40:32 2015 +0100
Committer: Maruan Sahyoun <sahy...@fileaffairs.de>
Committed: Sun Nov 29 14:40:32 2015 +0100

----------------------------------------------------------------------
 content/2.0/migration.html | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/9e6dbf4e/content/2.0/migration.html
----------------------------------------------------------------------
diff --git a/content/2.0/migration.html b/content/2.0/migration.html
index 8e5a665..184a617 100644
--- a/content/2.0/migration.html
+++ b/content/2.0/migration.html
@@ -216,6 +216,23 @@ and so on. The <code>add</code> method now supports all the different type of re
 <li><code>LosslessFactory.createFromImage</code> (this is best if you start with a BufferedImage).</li>
 </ul>
 
+<h3 id="parsing-the-page-content">Parsing the Page Content</h3>
+
+<p>Getting the content for a page has been simplified.</p>
+
+<p>Prior to PDFBox 2.0, parsing the page content was done using</p>
+<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">PDStream</span> <span class="n">contents</span> <span class="o">=</span> <span class="n">page</span><span class="o">.</span><span class="na">getContents</span><span class="o">();</span>
+<span class="n">PDFStreamParser</span> <span class="n">parser</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PDFStreamParser</span><span class="o">(</span><span class="n">contents</span><span class="o">.</span><span class="na">getStream</span><span class="o">());</span>
+<span class="n">parser</span><span class="o">.</span><span class="na">parse</span><span class="o">();</span>
+<span class="n">List</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">&gt;</span> <span class="n">tokens</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="na">getTokens</span><span class="o">();</span>
+</code></pre></div>
+<p>With PDFBox 2.0, the code is reduced to</p>
+<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">PDFStreamParser</span> <span class="n">parser</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PDFStreamParser</span><span class="o">(</span><span class="n">page</span><span class="o">);</span>
+<span class="n">parser</span><span class="o">.</span><span class="na">parse</span><span class="o">();</span>
+<span class="n">List</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">&gt;</span> <span class="n">tokens</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="na">getTokens</span><span class="o">();</span>
+</code></pre></div>
+<p>This also works when the page content is defined as an <strong>array of content streams</strong>.</p>
+
 <h3 id="iterating-pages">Iterating Pages</h3>
 
 <p>With PDFBox 2.0.0, the preferred way to iterate through the pages of a document is</p>
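As a rough illustration of the PDFBox 2.0 parsing call described above, the following sketch loads a PDF, iterates its pages and uses new PDFStreamParser(page) to collect the names of each page's content stream operators. It assumes PDFBox 2.0.x on the classpath; the class name ListPageOperators, the helper operatorNames and the command-line argument are hypothetical and exist only for this example.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Hypothetical example class (not part of PDFBox); assumes PDFBox 2.0.x.
public class ListPageOperators
{
    /**
     * Parses the content of a single page using the 2.0 constructor that
     * takes the page itself and returns the operator names in order.
     * The page content may be a single stream or an array of streams.
     */
    static List<String> operatorNames(PDPage page) throws IOException
    {
        PDFStreamParser parser = new PDFStreamParser(page);
        parser.parse();
        List<String> names = new ArrayList<String>();
        for (Object token : parser.getTokens())
        {
            // Operands (names, numbers, strings, ...) are skipped;
            // only the operators themselves are collected.
            if (token instanceof Operator)
            {
                names.add(((Operator) token).getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException
    {
        // args[0]: path of the PDF to inspect (hypothetical usage)
        try (PDDocument document = PDDocument.load(new File(args[0])))
        {
            int pageNumber = 1;
            // getPages() returns a PDPageTree, which is Iterable<PDPage>
            for (PDPage page : document.getPages())
            {
                List<String> ops = operatorNames(page);
                System.out.println("page " + pageNumber + ": "
                        + ops.size() + " operators " + ops);
                pageNumber++;
            }
        }
    }
}
```

Run it as java ListPageOperators some.pdf. Filtering on Operator keeps the output short; the operands that precede each operator remain available in the full token list if they are needed.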
