It depends on the PDF: is it tagged? If so, you might be able to find the title and headings from its structure. If it's not tagged, good luck guessing the title and headings from the text found in the document.
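For the untagged case, one common guess is "the largest text on the first page is the title". Below is a minimal sketch of that heuristic only, not an iText or iTextSharp API. It assumes you have already collected the text runs of the first page together with an approximate font size; the `TitleGuesser` and `TextRun` names are hypothetical, and how you obtain the runs from the library is not shown here.

```java
import java.util.List;

/** Heuristic sketch: treat the largest text on the first page as the title. */
final class TitleGuesser {

    /** Hypothetical container for one run of text taken from the first page. */
    static final class TextRun {
        final String text;
        final float fontSize; // approximate size in points (assumed to be known)

        TextRun(String text, float fontSize) {
            this.text = text;
            this.fontSize = fontSize;
        }
    }

    /**
     * Joins, in input order, the non-blank runs whose font size is within
     * half a point of the largest size seen. Returns "" if there is no text.
     */
    static String guessTitle(List<TextRun> firstPageRuns) {
        // First pass: find the largest font size among non-blank runs.
        float largest = 0f;
        for (TextRun run : firstPageRuns) {
            if (!run.text.trim().isEmpty() && run.fontSize > largest) {
                largest = run.fontSize;
            }
        }

        // Second pass: collect every run set in (roughly) that size.
        StringBuilder title = new StringBuilder();
        for (TextRun run : firstPageRuns) {
            String text = run.text.trim();
            if (text.isEmpty()) {
                continue;
            }
            if (Math.abs(run.fontSize - largest) < 0.5f) {
                if (title.length() > 0) {
                    title.append(' ');
                }
                title.append(text);
            }
        }
        return title.toString();
    }
}
```

This will guess wrong on documents where a logo, a cover banner or a pull quote is set larger than the real title, so treat the result as a candidate rather than an answer.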
On 24/06/2011 14:10, modie wrote:
Hi, Sorry, I am new to iTextSharp and cannot find documentation for it anywhere other than this forum. I am looking to extract content from a PDF document, but I need to be able to understand the structure / markup in the document. I want to extract the heading / title of the document, which would generally be found on the first page. Any ideas how I would do this? In HTML I would look for the h1 or h2 tag. PS - no, I don't want the title property of the document. -- View this message in context: http://itext-general.2136553.n4.nabble.com/How-to-extract-title-heading-from-document-contents-tp3622357p3622357.html
-- @redlabbe <http://twitter.com/redlabbe> redlab-log <http://www.redlab.be/blog>
_______________________________________________ iText-questions mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/itext-questions iText(R) is a registered trademark of 1T3XT BVBA. Many questions posted to this list can (and will) be answered with a reference to the iText book: http://www.itextpdf.com/book/ Please check the keywords list before you ask for examples: http://itextpdf.com/themes/keywords.php
