pdfbox-docs git commit: PDFBOX-3030: add info to parse page content

msahyoun Sun, 29 Nov 2015 05:40:07 -0800

Repository: pdfbox-docs
Updated Branches:
  refs/heads/master 929abe2ff -> f18cd3e3f



PDFBOX-3030: add info to parse page content


Project: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/commit/f18cd3e3
Tree: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/tree/f18cd3e3
Diff: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/diff/f18cd3e3

Branch: refs/heads/master
Commit: f18cd3e3fac1a8045a3e1d916eccef144b6f8313
Parents: 929abe2
Author: Maruan Sahyoun <sahy...@fileaffairs.de>
Authored: Sun Nov 29 14:39:19 2015 +0100
Committer: Maruan Sahyoun <sahy...@fileaffairs.de>
Committed: Sun Nov 29 14:39:19 2015 +0100

----------------------------------------------------------------------
 content/2.0/migration.md | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/f18cd3e3/content/2.0/migration.md
----------------------------------------------------------------------
diff --git a/content/2.0/migration.md b/content/2.0/migration.md
index 3550b46..3ab9669 100644
--- a/content/2.0/migration.md
+++ b/content/2.0/migration.md
@@ -89,6 +89,28 @@ In addition there are some specialized classes:
 - `CCITTFactory.createFromFile` (for bitonal TIFF images with G4 compression).
 - `LosslessFactory.createFromImage` (this is best if you start with a BufferedImage).
 
+### Parsing the Page Content
+Getting the content for a page has been simplified.
+
+Prior to PDFBox 2.0, the page content was parsed using:
+
+~~~java
+PDStream contents = page.getContents();
+PDFStreamParser parser = new PDFStreamParser(contents.getStream());
+parser.parse();
+List<Object> tokens = parser.getTokens();
+~~~
+
+With PDFBox 2.0, the code is reduced to:
+
+~~~java
+PDFStreamParser parser = new PDFStreamParser(page);
+parser.parse();
+List<Object> tokens = parser.getTokens();
+~~~
+
+This also works if the page content is defined as an **array of content streams**.
+
 ### Iterating Pages
 With PDFBox 2.0.0 the preferred way to iterate through the pages of a document is
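
A note on the token list that both snippets in the patch above end with: `parser.getTokens()` returns a flat `List<Object>` in which, following the postfix layout of PDF content streams, operands come before the operator that consumes them. The following is only a minimal sketch of how such a list might be grouped into instructions. `TokenGrouper`, `Instruction`, and the `isOperator` predicate are hypothetical names that are not part of PDFBox, and the sketch deliberately depends on the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of PDFBox: groups a flat, postfix token list
// into (operands, operator) pairs.
public class TokenGrouper {

    /** The operands of one instruction, followed by the operator that consumes them. */
    public static final class Instruction {
        public final List<Object> operands;
        public final Object operator;

        Instruction(List<Object> operands, Object operator) {
            this.operands = operands;
            this.operator = operator;
        }
    }

    /**
     * Walks the tokens in order and closes an instruction each time
     * isOperator accepts a token. Trailing operands that are never
     * followed by an operator are dropped.
     */
    public static List<Instruction> group(List<Object> tokens, Predicate<Object> isOperator) {
        List<Instruction> instructions = new ArrayList<>();
        List<Object> pending = new ArrayList<>();
        for (Object token : tokens) {
            if (isOperator.test(token)) {
                instructions.add(new Instruction(pending, token));
                pending = new ArrayList<>(); // start collecting operands for the next operator
            } else {
                pending.add(token);
            }
        }
        return instructions;
    }
}
```

With PDFBox 2.0 on the classpath, a call might look like `TokenGrouper.group(parser.getTokens(), t -> t instanceof Operator)`, assuming `Operator` is the class from `org.apache.pdfbox.contentstream.operator`. The predicate keeps the helper independent of any particular PDFBox version.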
