Dear all,

I am processing a PDF that contains Asian Unicode characters, and I need to
get the bounding box of every character in it.

When I extract the bounding boxes using the code below, the box for a
non-BMP character (the first Asian character) has zero width, even though
the character is displayed properly in the PDF.

Please help!

Regards,
wwkloo

iTextExtract_W.pdf
<http://itext-general.2136553.n4.nabble.com/file/n4657896/iTextExtract_W.pdf>  

===BEGIN CODE SEGMENT===
using System;
using System.Text;
using iTextSharp.text.pdf.parser;

public class TestExtractionStrategy : ITextExtractionStrategy
{
        private readonly StringBuilder txt = new StringBuilder();

        // Called for each text chunk. Logs the horizontal extent (difference
        // of the x coordinates, Vector.I1) of the chunk's descent line,
        // ascent line and baseline.
        public void RenderText(TextRenderInfo renderInfo)
        {
                Vector pds = renderInfo.GetDescentLine().GetStartPoint();
                Vector pde = renderInfo.GetDescentLine().GetEndPoint();
                Vector pas = renderInfo.GetAscentLine().GetStartPoint();
                Vector pae = renderInfo.GetAscentLine().GetEndPoint();
                Vector pbs = renderInfo.GetBaseline().GetStartPoint();
                Vector pbe = renderInfo.GetBaseline().GetEndPoint();

                txt.Append("[" + renderInfo.GetText() + "]\r\n");
                txt.Append("Descent: " + (pde[Vector.I1] - pds[Vector.I1]).ToString() + "\r\n");
                txt.Append("Ascent: " + (pae[Vector.I1] - pas[Vector.I1]).ToString() + "\r\n");
                txt.Append("Base: " + (pbe[Vector.I1] - pbs[Vector.I1]).ToString() + "\r\n");
        }

        // Images are not relevant to this test.
        public void RenderImage(ImageRenderInfo renderInfo) { }
        public string GetResultantText() { return txt.ToString(); }
        public void BeginTextBlock() { }
        public void EndTextBlock() { }
}

.
.
.

// reader is the open PdfReader; nudPage selects the page number.
ITextExtractionStrategy strategy = new TestExtractionStrategy();
String txt = PdfTextExtractor.GetTextFromPage(reader, (int)nudPage.Value, strategy);
txtOutput.Text = txt;
===END CODE SEGMENT===

===BEGIN OUTPUT===
[1]
Descent: 9
Ascent: 9
Base: 9
[ ]
Descent: 9
Ascent: 9
Base: 9
[𠕇]
Descent: 0
Ascent: 0
Base: 0
[ ]
Descent: 4.68
Ascent: 4.68
Base: 4.68
[2]
Descent: 9
Ascent: 9
Base: 9
[ ]
Descent: 9
Ascent: 9
Base: 9
[鋛]
Descent: 18
Ascent: 18
Base: 18
[ ]
Descent: 9
Ascent: 9
Base: 9
===END OUTPUT===
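
For reference, here is a minimal standalone sketch (plain .NET, no iText
calls) of how the two Asian characters above are stored in a .NET string. The
class and method names (SurrogateCheck, CodeUnits, TextElements,
StartsWithNonBmp) are hypothetical and exist only for this illustration.
Whether the surrogate-pair representation is related to the zero-width result
is an assumption on my part, not something I have confirmed.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, for illustration only; not part of iTextSharp.
public static class SurrogateCheck
{
    // Number of UTF-16 code units (System.Char values) in the string.
    public static int CodeUnits(string s)
    {
        return s.Length;
    }

    // Number of text elements (user-perceived characters) in the string.
    public static int TextElements(string s)
    {
        return new StringInfo(s).LengthInTextElements;
    }

    // True if the string starts with a high/low surrogate pair,
    // i.e. a code point outside the Basic Multilingual Plane.
    public static bool StartsWithNonBmp(string s)
    {
        return s.Length >= 2 && char.IsSurrogatePair(s[0], s[1]);
    }

    public static void Main()
    {
        // The two Asian characters from the output above.
        foreach (string s in new[] { "𠕇", "鋛" })
        {
            Console.WriteLine(
                "U+{0:X}: code units={1}, text elements={2}, non-BMP={3}",
                char.ConvertToUtf32(s, 0),
                CodeUnits(s),
                TextElements(s),
                StartsWithNonBmp(s));
        }
    }
}
```

The non-BMP character occupies two code units in the string while the BMP
character occupies one, so any code that measures text one System.Char at a
time sees the first character as two separate halves.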



--
View this message in context: 
http://itext-general.2136553.n4.nabble.com/Text-box-of-non-BMP-character-with-zero-width-tp4657896.html
Sent from the iText - General mailing list archive at Nabble.com.
