pdfbox-docs git commit: PDFBOX-3030: add information about removed ReplaceText example

msahyoun Thu, 03 Mar 2016 13:29:59 -0800

Repository: pdfbox-docs
Updated Branches:
  refs/heads/master dfe9ec54f -> 1dd238654



PDFBOX-3030: add information about removed ReplaceText example


Project: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/commit/1dd23865
Tree: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/tree/1dd23865
Diff: http://git-wip-us.apache.org/repos/asf/pdfbox-docs/diff/1dd23865

Branch: refs/heads/master
Commit: 1dd2386542fd686b1dacd66687e60a2a371163f1
Parents: dfe9ec5
Author: Maruan Sahyoun <[email protected]>
Authored: Thu Mar 3 22:27:51 2016 +0100
Committer: Maruan Sahyoun <[email protected]>
Committed: Thu Mar 3 22:27:51 2016 +0100

----------------------------------------------------------------------
 content/2.0/migration.md | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/1dd23865/content/2.0/migration.md
----------------------------------------------------------------------
diff --git a/content/2.0/migration.md b/content/2.0/migration.md
index e8e70e4..77290c0 100644
--- a/content/2.0/migration.md
+++ b/content/2.0/migration.md
@@ -215,4 +215,20 @@ for (PDField field : form.getFieldTree())
 }
 ~~~
 
-Most `PDField` subclasses now accept Java generic types such as `String` as parameters instead of the former `COSBase` subclasses.
\ No newline at end of file
+Most `PDField` subclasses now accept Java generic types such as `String` as parameters instead of the former `COSBase` subclasses.
+
+### Why was the ReplaceText example removed?  ###
+The ReplaceText example has been removed as it gave the false impression that text can be replaced easily.
+Words are often split, as seen by this excerpt of a content stream:
+
+~~~
+[ (Do) -29 (c) -1 (umen) 30 (tation) ] TJ
+~~~
+
+Other problems will appear with font subsets: for example, if only the glyphs for a, b and c are used,
+these would be encoded as hex 0, 1 and 2, so you won't find "abc". Additionally, you can't replace "c" with "d" because "d" isn't part of the subset.
+
+You could also have problems with ligatures, e.g. "ff", "fl", "fi", "ffi", "ffl", which can be represented by a single code in many fonts.
+To understand this yourself, view any file with PDFDebugger and have a look at the "Contents" entry of a page.
+
+See also https://stackoverflow.com/questions/35420609/pdfbox-2-0-rc3-find-and-replace-text
\ No newline at end of file
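
As a rough illustration of the split-word problem described in the diff above, the following Java sketch works on the quoted content-stream excerpt as a plain string. It does not use the PDFBox API; the class and method names (`TjArraySketch`, `fragments`, `anyFragmentContains`, `joined`) are hypothetical, and the pattern assumes simple literal strings without escapes or nested parentheses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TjArraySketch {
    // Matches one literal string operand such as "(Do)".
    // Assumption: no escaped or nested parentheses inside the string.
    private static final Pattern LITERAL = Pattern.compile("\\(([^()]*)\\)");

    // Returns the string operands of a TJ array in order, dropping the
    // numeric position adjustments between them.
    static List<String> fragments(String tjArray) {
        List<String> out = new ArrayList<>();
        Matcher m = LITERAL.matcher(tjArray);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    // A naive search that looks at each string operand on its own.
    static boolean anyFragmentContains(String tjArray, String word) {
        for (String fragment : fragments(tjArray)) {
            if (fragment.contains(word)) {
                return true;
            }
        }
        return false;
    }

    // Concatenates all string operands, ignoring the adjustments.
    static String joined(String tjArray) {
        return String.join("", fragments(tjArray));
    }

    public static void main(String[] args) {
        String tj = "[ (Do) -29 (c) -1 (umen) 30 (tation) ] TJ";
        System.out.println(fragments(tj));                            // [Do, c, umen, tation]
        System.out.println(anyFragmentContains(tj, "Documentation")); // false
        System.out.println(joined(tj));                               // Documentation
    }
}
```

The word only becomes visible once the fragments are joined. Even then, writing a replacement back would presumably mean rebuilding the whole array, since the numbers between the strings are position adjustments tied to the original glyphs. The font subset and ligature issues are not modelled here at all.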
