PanQuanyi created PDFBOX-1429:
---------------------------------
Summary: TextStripper should output Text with its member having
more information
Key: PDFBOX-1429
URL: https://issues.apache.org/jira/browse/PDFBOX-1429
Project: PDFBox
Issue Type: Improvement
Components: Text extraction
Affects Versions: 1.7.1
Reporter: PanQuanyi
Priority: Critical
The class org.apache.pdfbox.util.TextPosition only offers the position of text
on a page and limited font information (many Chinese characters have no
FontDescriptor, so the font name and other style attributes cannot be retrieved).
I think many people use PDFBox to build client utilities that extract text and
images, then reorganize them into a new article or book to be read on an iPad
or mobile phone, fixing the layout by hand.
However, many books have complex layouts and colors, and so many pages that this
work takes a great deal of human effort. The more of it that can be done
automatically, the more efficient it becomes.
So it would help if a class named Text exposed the precise position, font size,
font style, color and other attributes such as background color, all easily
retrievable. Text extraction tasks such as excluding unnecessary text or making
the text more colorful would then be much easier.
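The proposed Text element could look roughly like the following. This is a minimal, hypothetical sketch: the class name StyledTextSketch, the Text fields and the keep helper are assumptions made for illustration and are not part of the PDFBox API. Populating the fields from a parsed page is not shown; the sketch only demonstrates how richer style data would make filtering (for example, dropping page numbers) a one-line policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class StyledTextSketch {

    /** Hypothetical value class for one run of text; not a PDFBox type. */
    static final class Text {
        final String content;
        final float x;             // left edge, in page units
        final float y;             // baseline, in page units
        final float fontSize;      // size in points
        final String fontName;     // null when the font has no FontDescriptor
        final boolean bold;
        final boolean italic;
        final int color;           // text fill color as 0xRRGGBB
        final Integer background;  // background color as 0xRRGGBB, or null if none

        Text(String content, float x, float y, float fontSize, String fontName,
             boolean bold, boolean italic, int color, Integer background) {
            this.content = content;
            this.x = x;
            this.y = y;
            this.fontSize = fontSize;
            this.fontName = fontName;
            this.bold = bold;
            this.italic = italic;
            this.color = color;
            this.background = background;
        }
    }

    /** Returns the runs accepted by the predicate, preserving their order. */
    static List<Text> keep(List<Text> runs, Predicate<Text> accept) {
        List<Text> kept = new ArrayList<>();
        for (Text run : runs) {
            if (accept.test(run)) {
                kept.add(run);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Text> runs = Arrays.asList(
            new Text("Chapter 1", 72f, 80f, 18f, null, true, false, 0xC00000, null),
            new Text("Body text", 72f, 120f, 10.5f, null, false, false, 0x000000, null),
            new Text("- 12 -", 290f, 800f, 8f, null, false, false, 0x808080, null));

        // Example policy (an assumption): drop small gray text such as page numbers.
        List<Text> body = keep(runs, t -> !(t.fontSize < 9f && t.color == 0x808080));
        for (Text t : body) {
            System.out.println(t.content);
        }
    }
}
```

Running main prints "Chapter 1" and "Body text"; the gray 8pt page number is excluded. Keeping the filter as a Predicate means each conversion project can plug in its own rules (by color, size, position or background) without changing the extraction code.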
--
This message is automatically generated by JIRA.