[jira] [Updated] (PDFBOX-540) New functionality for class inherited from PDFTextStripperByArea class

John Hewson (JIRA) Sat, 08 Feb 2014 10:08:38 -0800

     [ https://issues.apache.org/jira/browse/PDFBOX-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


John Hewson updated PDFBOX-540:
-------------------------------

    Component/s:     (was: Swing GUI)
                 Utilities

> New functionality for class inherited from PDFTextStripperByArea class
> ----------------------------------------------------------------------
>
>                 Key: PDFBOX-540
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-540
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 0.7.3
>         Environment: Windows Vista
>            Reporter: Alexander Shvartz
>
> New functionality for class inherited from PDFTextStripperByArea class
> We have been working with the org.apache.pdfbox.util.PDFTextStripperByArea 
> class. Using a Rectangle and PDFTextStripperByArea methods such as 
> getTextForRegion(), we obtained the text found in a given region (in our 
> case, a specific PDF page).
> Our goal was to connect PDFTextStripperByArea with the TextPosition class, 
> which has methods for inspecting individual characters, such as getX(), 
> getY() and many others. For this we intended to use the 
> getCharactersByArticle() method. It is a protected method of the 
> PDFTextStripper class, but we need it to be accessible on 
> PDFTextStripperByArea.
> For this reason we suggest creating a new class named, for example, 
> PDFTextStripperByAreaChar, which inherits from PDFTextStripperByArea and 
> adds a public getCharactersByArticle() method:
> // Class inherited from PDFTextStripperByArea that adds a public
> // getCharactersByArticle() method, modeled on the one in PDFTextStripper.
> package org.apache.pdfbox.util;
>
> import java.io.IOException;
> import java.util.List;
>
> public class PDFTextStripperByAreaChar extends PDFTextStripperByArea
> {
>     public PDFTextStripperByAreaChar() throws IOException
>     {
>         super();
>     }
>
>     // Exposes the protected per-article character lists to callers.
>     public List getCharactersByArticle()
>     {
>         return charactersByArticle;
>     }
> }
> Example usage:
> // Requires java.util.Iterator, java.util.List and
> // org.apache.pdfbox.util.TextPosition.
> PDFTextStripperByAreaChar stripperText = new PDFTextStripperByAreaChar();
> // Following the approach of the getTitle() method in
> // org.apache.pdfbox.util.PDFText2HTML, we can read the X and Y
> // coordinates of each character:
> Iterator textIter = stripperText.getCharactersByArticle().iterator();
> String charPDF;
> while (textIter.hasNext())
> {
>     // Each element is the list of TextPosition objects for one article.
>     Iterator textByArticle = ((List) textIter.next()).iterator();
>     int j = 1;
>     while (textByArticle.hasNext())
>     {
>         TextPosition text = (TextPosition) textByArticle.next();
>         charPDF = text.getCharacter();
>         System.out.println("Char " + j + ": |" + charPDF +
>                 "| X = " + text.getX() + ", Y = " + text.getY());
>         j++;
>     }
> }
> Thank you.
> DeepDyve developers:
> Alexander Shvartz,
> Raza Mobin,
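
The pattern proposed in the description (a subclass that widens access to a protected per-article character list, followed by a nested iteration over it) can be tried without PDFBox on the classpath. The following is only a minimal sketch under stated assumptions: BaseStripper, CharStripper, Glyph, addArticle() and describe() are hypothetical stand-ins invented for illustration, not PDFBox classes or methods, and the coordinates are made-up sample values.

```java
import java.util.ArrayList;
import java.util.List;

public class CharactersByArticleSketch {

    /** Hypothetical stand-in for TextPosition: one character and its coordinates. */
    public static final class Glyph {
        public final String character;
        public final float x;
        public final float y;

        public Glyph(String character, float x, float y) {
            this.character = character;
            this.x = x;
            this.y = y;
        }
    }

    /** Hypothetical stand-in for the base stripper; the list is protected here. */
    public static class BaseStripper {
        protected final List<List<Glyph>> charactersByArticle = new ArrayList<>();

        /** Test helper (an assumption of this sketch) to populate one article. */
        public void addArticle(List<Glyph> article) {
            charactersByArticle.add(article);
        }
    }

    /** Subclass that exposes the protected list through a public getter. */
    public static class CharStripper extends BaseStripper {
        public List<List<Glyph>> getCharactersByArticle() {
            return charactersByArticle;
        }
    }

    /** Formats every character with a per-article 1-based index and its X/Y. */
    public static List<String> describe(CharStripper stripper) {
        List<String> lines = new ArrayList<>();
        for (List<Glyph> article : stripper.getCharactersByArticle()) {
            int j = 1; // restarts for each article, as in the issue's example
            for (Glyph g : article) {
                lines.add("Char " + j + ": |" + g.character
                        + "| X = " + g.x + ", Y = " + g.y);
                j++;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        CharStripper stripper = new CharStripper();
        List<Glyph> article = new ArrayList<>();
        article.add(new Glyph("H", 72.0f, 700.0f));
        article.add(new Glyph("i", 80.5f, 700.0f));
        stripper.addArticle(article);
        for (String line : describe(stripper)) {
            System.out.println(line);
        }
    }
}
```

The getter returns the live list rather than a copy, which mirrors the class in the description; a caller that must not modify the stripper's state would probably want an unmodifiable view instead.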



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
