Repository: pdfbox-docs
Updated Branches:
  refs/heads/master dfe9ec54f -> 1dd238654
PDFBOX-3030: add information about removed ReplaceText example

Project: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/commit/1dd23865
Tree: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/tree/1dd23865
Diff: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/diff/1dd23865

Branch: refs/heads/master
Commit: 1dd2386542fd686b1dacd66687e60a2a371163f1
Parents: dfe9ec5
Author: Maruan Sahyoun <[email protected]>
Authored: Thu Mar 3 22:27:51 2016 +0100
Committer: Maruan Sahyoun <[email protected]>
Committed: Thu Mar 3 22:27:51 2016 +0100

----------------------------------------------------------------------
 content/2.0/migration.md | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/1dd23865/content/2.0/migration.md
----------------------------------------------------------------------
diff --git a/content/2.0/migration.md b/content/2.0/migration.md
index e8e70e4..77290c0 100644
--- a/content/2.0/migration.md
+++ b/content/2.0/migration.md
@@ -215,4 +215,20 @@ for (PDField field : form.getFieldTree())
 }
 ~~~
 
-Most `PDField` subclasses now accept Java generic types such as `String` as parameters instead of the former `COSBase` subclasses.
\ No newline at end of file
+Most `PDField` subclasses now accept Java generic types such as `String` as parameters instead of the former `COSBase` subclasses.
+
+### Why was the ReplaceText example removed? ###
+The ReplaceText example has been removed as it gave the incorrect illusion that text can be replaced easily.
+Words are often split, as seen by this excerpt of a content stream:
+
+~~~
+[ (Do) -29 (c) -1 (umen) 30 (tation) ] TJ
+~~~
+
+Other problems will appear with font subsets: for example, if only the glyphs for a, b and c are used,
+these would be encoded as hex 0, 1 and 2, so you won't find "abc". Additionally, you can't replace "c" with "d" because it isn't part of the subset.
+
+You could also have problems with ligatures, e.g. "ff", "fl", "fi", "ffi", "ffl", which can be represented by a single code in many fonts.
+To understand this yourself, view any file with PDFDebugger and have a look at the "Contents" entry of a page.
+
+See also https://stackoverflow.com/questions/35420609/pdfbox-2-0-rc3-find-and-replace-text
\ No newline at end of file
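
The two pitfalls named in the added section (split words and font subsets) can be illustrated without PDFBox at all. The following is a minimal sketch in plain Java, not the removed ReplaceText example and not a PDFBox API: the class `ReplaceTextPitfalls` and its methods `tjFragments`, `subsetEncoding` and `encode` are hypothetical names made up for this illustration, and the "subset" is a toy model that simply assigns codes in order of first use. The string scanner is deliberately naive (no escaped or nested parentheses, no hex strings).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; nothing here is part of PDFBox.
public class ReplaceTextPitfalls {

    // Collects the literal strings "(...)" of a TJ operand and skips the
    // kerning numbers between them. Naive on purpose: escaped or nested
    // parentheses and hex strings are not handled.
    static List<String> tjFragments(String tjOperand) {
        List<String> fragments = new ArrayList<>();
        int pos = 0;
        while (true) {
            int open = tjOperand.indexOf('(', pos);
            if (open < 0) {
                break;
            }
            int close = tjOperand.indexOf(')', open);
            if (close < 0) {
                break;
            }
            fragments.add(tjOperand.substring(open + 1, close));
            pos = close + 1;
        }
        return fragments;
    }

    // Toy model of a subset font (an assumption for this sketch):
    // each glyph gets the next free code in order of first use.
    static Map<Character, Integer> subsetEncoding(String usedGlyphs) {
        Map<Character, Integer> codes = new LinkedHashMap<>();
        for (char ch : usedGlyphs.toCharArray()) {
            codes.putIfAbsent(ch, codes.size());
        }
        return codes;
    }

    // Encodes text with the toy subset; fails for glyphs outside the subset.
    static byte[] encode(String text, Map<Character, Integer> codes) {
        byte[] out = new byte[text.length()];
        for (int i = 0; i < text.length(); i++) {
            Integer code = codes.get(text.charAt(i));
            if (code == null) {
                throw new IllegalArgumentException("glyph not in subset: " + text.charAt(i));
            }
            out[i] = code.byteValue();
        }
        return out;
    }

    public static void main(String[] args) {
        String stream = "[ (Do) -29 (c) -1 (umen) 30 (tation) ] TJ";

        // A plain search on the content stream misses the word ...
        System.out.println(stream.contains("Documentation"));     // false
        // ... although the fragments do spell it once joined.
        System.out.println(String.join("", tjFragments(stream))); // Documentation

        // With the toy subset, "abc" is stored as codes 0, 1, 2, not as letters.
        Map<Character, Integer> codes = subsetEncoding("abc");
        System.out.println(Arrays.toString(encode("abc", codes))); // [0, 1, 2]
        // "d" has no code, so it cannot be written with this subset.
        System.out.println(codes.containsKey('d'));                // false
    }
}
```

Joining the fragments recovers the word for searching, but a replacement would still have to rewrite several strings and their kerning values, and it would only work if every new glyph is present in the embedded font.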
