Repository: pdfbox-docs
Updated Branches:
  refs/heads/master 929abe2ff -> f18cd3e3f

PDFBOX-3030: add info to parse page content

Project: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/commit/f18cd3e3
Tree: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/tree/f18cd3e3
Diff: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/diff/f18cd3e3

Branch: refs/heads/master
Commit: f18cd3e3fac1a8045a3e1d916eccef144b6f8313
Parents: 929abe2
Author: Maruan Sahyoun <sahy...@fileaffairs.de>
Authored: Sun Nov 29 14:39:19 2015 +0100
Committer: Maruan Sahyoun <sahy...@fileaffairs.de>
Committed: Sun Nov 29 14:39:19 2015 +0100

----------------------------------------------------------------------
 content/2.0/migration.md | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/f18cd3e3/content/2.0/migration.md
----------------------------------------------------------------------
diff --git a/content/2.0/migration.md b/content/2.0/migration.md
index 3550b46..3ab9669 100644
--- a/content/2.0/migration.md
+++ b/content/2.0/migration.md
@@ -89,6 +89,28 @@ In addition there are some specialized classes:
 - `CCITTFactory.createFromFile` (for bitonal TIFF images with G4 compression).
 - `LosslessFactory.createFromImage` (this is best if you start with a BufferedImage).
 
+### Parsing the Page Content
+Getting the content for a page has been simplified.
+
+Prior to PDFBox 2.0 parsing the page content was done using
+
+~~~java
+PDStream contents = page.getContents();
+PDFStreamParser parser = new PDFStreamParser(contents.getStream());
+parser.parse();
+List<Object> tokens = parser.getTokens();
+~~~
+
+With PDFBox 2.0 the code is reduced to
+
+~~~java
+PDFStreamParser parser = new PDFStreamParser(page);
+parser.parse();
+List<Object> tokens = parser.getTokens();
+~~~
+
+In addition this also works if the page content is defined as an **array of content streams**.
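
The closing note of the added section, that parsing also works when the page content is an array of content streams, rests on the PDF rule that such an array behaves as if its streams were concatenated in order into one stream. The following is a minimal standard-library sketch of that idea only. It is not PDFBox's implementation; the class `ContentStreamConcat`, its methods, and the newline separator are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of PDFBox: shows how several already-decoded
// content streams can be presented to a parser as one logical stream.
public class ContentStreamConcat {

    // Chains the streams in order. A newline is appended after each part
    // (an assumption of this sketch) so that the last token of one stream
    // cannot run into the first token of the next.
    public static InputStream concatenate(List<byte[]> streams) {
        List<InputStream> parts = new ArrayList<>();
        for (byte[] stream : streams) {
            parts.add(new ByteArrayInputStream(stream));
            parts.add(new ByteArrayInputStream(new byte[] { '\n' }));
        }
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    // Reads the whole stream into a String, one character per byte.
    public static String readAll(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The operand "(Hi)" sits in the first stream and its operator "Tj"
        // in the second; the pair only makes sense once both are joined.
        List<byte[]> pageContents = Arrays.asList(
                "BT /F1 12 Tf (Hi)".getBytes(StandardCharsets.US_ASCII),
                "Tj ET".getBytes(StandardCharsets.US_ASCII));
        System.out.print(readAll(concatenate(pageContents)));
    }
}
```

A parser that is handed the page rather than a single stream can do this joining itself, which is why the 2.0 snippet needs no special case for content arrays.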
+
 ### Iterating Pages
 With PDFBox 2.0.0 the prefered way to iterate through the pages of a document is