Repository: pdfbox-docs
Updated Branches:
  refs/heads/asf-site 1203f0b90 -> 9e6dbf4ec

Site checkin for project Apache PDFBox Website

Project: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/commit/9e6dbf4e
Tree: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/tree/9e6dbf4e
Diff: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/diff/9e6dbf4e

Branch: refs/heads/asf-site
Commit: 9e6dbf4ecdfb25b25eb023454402e5daab3560be
Parents: 1203f0b
Author: Maruan Sahyoun <sahy...@fileaffairs.de>
Authored: Sun Nov 29 14:40:32 2015 +0100
Committer: Maruan Sahyoun <sahy...@fileaffairs.de>
Committed: Sun Nov 29 14:40:32 2015 +0100

----------------------------------------------------------------------
 content/2.0/migration.html | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/9e6dbf4e/content/2.0/migration.html
----------------------------------------------------------------------
diff --git a/content/2.0/migration.html b/content/2.0/migration.html
index 8e5a665..184a617 100644
--- a/content/2.0/migration.html
+++ b/content/2.0/migration.html
@@ -216,6 +216,23 @@ and so on. The <code>add</code> method now supports all the different type of re
 <li><code>LosslessFactory.createFromImage</code> (this is best if you start with a BufferedImage).</li>
 </ul>
 
+<h3 id="parsing-the-page-content">Parsing the Page Content</h3>
+
+<p>Getting the content for a page has been simplified.</p>
+
+<p>Prior to PDFBox 2.0 parsing the page content was done using</p>
+<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">PDStream</span> <span class="n">contents</span> <span class="o">=</span> <span class="n">page</span><span class="o">.</span><span class="na">getContents</span><span class="o">();</span>
+<span class="n">PDFStreamParser</span> <span class="n">parser</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PDFStreamParser</span><span class="o">(</span><span class="n">contents</span><span class="o">.</span><span class="na">getStream</span><span class="o">());</span>
+<span class="n">parser</span><span class="o">.</span><span class="na">parse</span><span class="o">();</span>
+<span class="n">List</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">&gt;</span> <span class="n">tokens</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="na">getTokens</span><span class="o">();</span>
+</code></pre></div>
+<p>With PDFBox 2.0 the code is reduced to </p>
+<div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">PDFStreamParser</span> <span class="n">parser</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PDFStreamParser</span><span class="o">(</span><span class="n">page</span><span class="o">);</span>
+<span class="n">parser</span><span class="o">.</span><span class="na">parse</span><span class="o">();</span>
+<span class="n">List</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">&gt;</span> <span class="n">tokens</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="na">getTokens</span><span class="o">();</span>
+</code></pre></div>
+<p>In addition this also works if the page content is defined as an <strong>array of content streams</strong>.</p>
+
 <h3 id="iterating-pages">Iterating Pages</h3>
 
 <p>With PDFBox 2.0.0 the prefered way to iterate through the pages of a document is</p>