What you actually /want/ is virtually impossible in the general case. It may be possible for your specific PDFs, but we can't know that unless we actually see one.
There is a "positional text extraction" RenderListener: LocationTextExtractionStrategy. It groups text by orientation, and then by reading order... IIRC.

Pretty much, yeah:

 * This renderer keeps track of the orientation and distance (both perpendicular
 * and parallel) to the unit vector of the orientation. Text is ordered by
 * orientation, then perpendicular, then parallel distance. Text with the same
 * perpendicular distance, but different parallel distance is treated as being on
 * the same line.
 * <br>
 * This renderer also uses a simple strategy based on the font metrics to determine if
 * a blank space should be inserted into the output.

It won't separate the header and footer from the body, but it's probably your best bet.

--Mark Storer
  Senior Software Engineer
  Cardiff.com

import legalese.Disclaimer;
Disclaimer<Cardiff> DisCard = null;

> -----Original Message-----
> From: DivyaKambhatla [mailto:[email protected]]
> Sent: Wednesday, February 16, 2011 5:40 AM
> To: [email protected]
> Subject: [iText-questions] Split a PDF Page into header, footer and body.
>
> Hi,
>
> Could anyone please let me know if it is possible via iText5.0.5 to split
> a PDF Page into its header, footer, body and watermark sections and access
> each content separately. I am dealing with both watermarked and
> non-watermarked PDFs.
>
> When I extract the content from a PDF using iText5.0.5, the order in which
> the extraction happens is as follows:
>
> 1. Watermark gets extracted first (if it exists)
> 2. Page Text Content gets extracted next
> 3. The titles of any figures that are present in the PDF Page.
> 4. Footer Content gets extracted.
> 5. Header content gets extracted last.
>
> Is there any way of extraction such that the complete PDF Body content can
> be extracted first and the remaining content such as watermarks, headers,
> footers be extracted next so that the order of the extracted text is not lost.
>
> Thanks,
> Divya.
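For reference, the ordering rule in the LocationTextExtractionStrategy Javadoc quoted above can be sketched without iText at all. This is only an illustration of the sort keys (orientation, then perpendicular distance, then parallel distance), not iText's implementation: the class ReadingOrderSketch, its Chunk type, and the integer units are all made up for the example, and it always puts a space between chunks on the same line, whereas the real renderer decides that from font metrics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-alone sketch; none of these names are iText API. */
class ReadingOrderSketch {

    /** One piece of text plus the three sort keys the Javadoc mentions. */
    static final class Chunk {
        final String text;
        final int orientation;   // text direction, whole degrees (assumed unit)
        final int perpendicular; // distance across the text direction; equal values = same line
        final float parallel;    // distance along the text direction

        Chunk(String text, int orientation, int perpendicular, float parallel) {
            this.text = text;
            this.orientation = orientation;
            this.perpendicular = perpendicular;
            this.parallel = parallel;
        }
    }

    /** Orientation first, then perpendicular distance, then parallel distance. */
    static final Comparator<Chunk> READING_ORDER =
            Comparator.comparingInt((Chunk c) -> c.orientation)
                      .thenComparingInt(c -> c.perpendicular)
                      .thenComparingDouble(c -> c.parallel);

    /** Sorts a copy of the chunks and joins them, starting a new line when the line changes. */
    static String join(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(READING_ORDER);
        StringBuilder out = new StringBuilder();
        Chunk previous = null;
        for (Chunk c : sorted) {
            if (previous != null) {
                boolean sameLine = previous.orientation == c.orientation
                        && previous.perpendicular == c.perpendicular;
                // Simplification: always a single space within a line.
                out.append(sameLine ? ' ' : '\n');
            }
            out.append(c.text);
            previous = c;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Chunks in an arbitrary "content stream" order, similar to the question:
        // watermark first, body next, footer, header last.
        List<Chunk> page = Arrays.asList(
                new Chunk("DRAFT", 45, 0, 0f),   // rotated watermark
                new Chunk("text", 0, 10, 50f),
                new Chunk("Body", 0, 10, 0f),
                new Chunk("Footer", 0, 90, 0f),
                new Chunk("Header", 0, 0, 0f));
        System.out.println(join(page));
    }
}
```

Note that the header and footer simply sort into place along with the body; nothing in this ordering marks them as separate regions, which is the limitation mentioned in the reply. In iText itself the strategy is, as far as I recall, passed in via PdfTextExtractor.getTextFromPage(reader, pageNumber, new LocationTextExtractionStrategy()); please check that against the 5.0.5 API before relying on it.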
> --
> View this message in context:
> http://itext-general.2136553.n4.nabble.com/Split-a-PDF-Page-into-header-footer-and-body-tp3308836p3308836.html
> Sent from the iText - General mailing list archive at Nabble.com.
>
> _______________________________________________
> iText-questions mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/itext-questions
>
> Many questions posted to this list can (and will) be answered with a
> reference to the iText book: http://www.itextpdf.com/book/
> Please check the keywords list before you ask for examples:
> http://itextpdf.com/themes/keywords.php
