PanQuanyi created PDFBOX-1429:
---------------------------------

             Summary: TextStripper should output Text with its member having more information
                 Key: PDFBOX-1429
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1429
             Project: PDFBox
          Issue Type: Improvement
          Components: Text extraction
    Affects Versions: 1.7.1
            Reporter: PanQuanyi
            Priority: Critical


The class org.apache.pdfbox.util.TextPosition offers only the position of text
in a page and limited font information. (Many Chinese characters have no
FontDescriptor, so fontName and other style attributes cannot be retrieved.)
I think many people use PDFBox to build a client utility that extracts text
and images, and then reorganize the text and images into a new article or book
to be read on an iPad or mobile phone, with manual work to fix the layout.
But many books have complex layout and color and a great many pages, so this
work needs much human effort. If more of the work could be done automatically,
it would be more efficient.

So it would help to have a class named Text from which the precise position,
font size, font style, color, and other attributes such as background color
can easily be retrieved. The text extraction process, which also includes
excluding unnecessary text and making text more colorful, would then be easier.
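
A minimal sketch of what such a Text class might look like, written as plain
Java with no PDFBox dependency. Everything here is an assumption made for
illustration: the names TextSketch, Text, and excludeSmallText, the field set,
and the packed 0xRRGGBB color encoding are hypothetical and not part of the
PDFBox API. The filter shows one possible way of "excluding unnecessary text",
here by dropping items below a font-size threshold.

```java
import java.util.ArrayList;
import java.util.List;

final class TextSketch {

    /** Hypothetical value class from the proposal; it is not part of PDFBox. */
    static final class Text {
        final String content;
        final float x;            // horizontal position on the page
        final float y;            // vertical position on the page
        final float fontSize;
        final boolean bold;
        final boolean italic;
        final int colorRgb;       // text color packed as 0xRRGGBB (assumed encoding)
        final int backgroundRgb;  // background color packed as 0xRRGGBB (assumed encoding)

        Text(String content, float x, float y, float fontSize,
             boolean bold, boolean italic, int colorRgb, int backgroundRgb) {
            this.content = content;
            this.x = x;
            this.y = y;
            this.fontSize = fontSize;
            this.bold = bold;
            this.italic = italic;
            this.colorRgb = colorRgb;
            this.backgroundRgb = backgroundRgb;
        }
    }

    /** Keeps only the items whose font size is at least minFontSize. */
    static List<Text> excludeSmallText(List<Text> items, float minFontSize) {
        List<Text> kept = new ArrayList<Text>();
        for (Text t : items) {
            if (t.fontSize >= minFontSize) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Text> items = new ArrayList<Text>();
        items.add(new Text("Chapter 1", 72f, 700f, 18f, true, false, 0x000000, 0xFFFFFF));
        items.add(new Text("12", 300f, 30f, 8f, false, false, 0x666666, 0xFFFFFF));

        // Drop anything smaller than 10 units, such as the page number above.
        for (Text t : excludeSmallText(items, 10f)) {
            System.out.println(t.content + " at (" + t.x + ", " + t.y + ")");
        }
    }
}
```

With style and color carried on each item, a client could filter or restyle
text without looking the font up again, which is the point of the request.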
