Dear all, I am processing a PDF that contains Asian Unicode characters. I need to get the bounding box of every character in the PDF.
When I extract the bounding boxes using the following code, the box for a non-BMP character (the first Asian character, 𠕇) has zero width, even though the character is displayed properly in the PDF. Please help!

Regards,
wwkloo

iTextExtract_W.pdf <http://itext-general.2136553.n4.nabble.com/file/n4657896/iTextExtract_W.pdf>

===BEGIN CODE SEGMENT===
using System.Text;
using iTextSharp.text.pdf.parser;

public class TestExtractionStategy : iTextSharp.text.pdf.parser.ITextExtractionStrategy
{
    private StringBuilder txt = new StringBuilder();

    public void RenderText(iTextSharp.text.pdf.parser.TextRenderInfo renderInfo)
    {
        // Start and end points of the descent line, ascent line and baseline
        Vector pds = renderInfo.GetDescentLine().GetStartPoint();
        Vector pde = renderInfo.GetDescentLine().GetEndPoint();
        Vector pas = renderInfo.GetAscentLine().GetStartPoint();
        Vector pae = renderInfo.GetAscentLine().GetEndPoint();
        Vector pbs = renderInfo.GetBaseline().GetStartPoint();
        Vector pbe = renderInfo.GetBaseline().GetEndPoint();

        // Vector.I1 is the x component, so each difference below is the
        // horizontal extent (width) of the corresponding line
        txt.Append("[" + renderInfo.GetText() + "]\r\n");
        txt.Append("Descent: " + (pde[Vector.I1] - pds[Vector.I1]).ToString() + "\r\n");
        txt.Append("Ascent: " + (pae[Vector.I1] - pas[Vector.I1]).ToString() + "\r\n");
        txt.Append("Base: " + (pbe[Vector.I1] - pbs[Vector.I1]).ToString() + "\r\n");
    }

    public void RenderImage(ImageRenderInfo renderInfo) { }

    public string GetResultantText() { return txt.ToString(); }

    public void BeginTextBlock() { }

    public void EndTextBlock() { }
}
. . .
ITextExtractionStrategy strategy = new TestExtractionStategy();
String txt = PdfTextExtractor.GetTextFromPage(reader, (int)nudPage.Value, strategy);
txtOutput.Text = txt;
===END CODE SEGMENT===

===BEGIN OUTPUT===
[1]
Descent: 9
Ascent: 9
Base: 9
[ ]
Descent: 9
Ascent: 9
Base: 9
[𠕇]
Descent: 0
Ascent: 0
Base: 0
[ ]
Descent: 4.68
Ascent: 4.68
Base: 4.68
[2]
Descent: 9
Ascent: 9
Base: 9
[ ]
Descent: 9
Ascent: 9
Base: 9
[鋛]
Descent: 18
Ascent: 18
Base: 18
[ ]
Descent: 9
Ascent: 9
Base: 9
===END OUTPUT===
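
One thing that may be worth checking (this is an assumption, not something the output above confirms) is how the affected character is represented in the string returned by renderInfo.GetText(). .NET strings are UTF-16, so a non-BMP character such as 𠕇 occupies two char values (a surrogate pair), while a BMP character such as 鋛 occupies one. The following is a minimal sketch of that check. SurrogateCheck, ContainsNonBmp and CountCodePoints are hypothetical names used only for illustration; they are not part of iTextSharp.

```csharp
using System;

// Hypothetical helper, not part of iTextSharp: inspects how a .NET (UTF-16)
// string represents characters outside the Basic Multilingual Plane.
public static class SurrogateCheck
{
    // True when the string contains at least one surrogate pair,
    // i.e. at least one character outside the BMP.
    public static bool ContainsNonBmp(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsSurrogatePair(s, i)) return true;
        }
        return false;
    }

    // Counts Unicode code points, treating each surrogate pair as one.
    public static int CountCodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsSurrogatePair(s, i)) i++; // skip the low surrogate
            count++;
        }
        return count;
    }

    public static void Main()
    {
        string nonBmp = "𠕇"; // the character reported with zero width
        string bmp = "鋛";    // the character reported with width 18

        // Length counts UTF-16 code units, not characters.
        Console.WriteLine(nonBmp.Length + " code unit(s), " + CountCodePoints(nonBmp) + " code point(s)");
        Console.WriteLine(bmp.Length + " code unit(s), " + CountCodePoints(bmp) + " code point(s)");
        Console.WriteLine(ContainsNonBmp(nonBmp));
        Console.WriteLine(ContainsNonBmp(bmp));
    }
}
```

Calling SurrogateCheck.ContainsNonBmp(renderInfo.GetText()) inside RenderText would flag the chunks that contain such characters, so their reported widths can be compared with those of the BMP-only chunks.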