[jira] [Commented] (PDFBOX-3166) Unwanted spaces before number in chinese text extraction

Tilman Hausherr (JIRA) Sat, 19 Dec 2015 22:35:55 -0800

    [ 
https://issues.apache.org/jira/browse/PDFBOX-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065643#comment-15065643
 ]


Tilman Hausherr commented on PDFBOX-3166:
-----------------------------------------

You can use PDFDebugger
http://mirror.softaculous.com/apache/pdfbox/2.0.0-RC2/debugger-app-2.0.0-RC2.jar
or use this code:
{code}
PDDocument doc = PDDocument.load(file);
// Raw content stream of the first page (page indices are zero-based)
InputStream is = doc.getPage(0).getContents();
{code}
and then read that input stream. This will show you the content stream that I 
mentioned earlier.
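
Reading that input stream could look roughly like the sketch below. The
class name ContentStreamDump and the method readAll are hypothetical, not part
of PDFBox, and only the JDK is used. It assumes you pass in the stream returned
by getContents() and that decoding the bytes as ISO-8859-1 is good enough for
viewing the operators.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
final class ContentStreamDump {

    // Drains the stream and returns its bytes as text.
    // ISO-8859-1 maps every byte to exactly one char, so no byte is
    // dropped or rejected while decoding.
    static String readAll(InputStream is) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = is.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    // Assumed usage with the snippet above (needs PDFBox on the classpath):
    //   try (PDDocument doc = PDDocument.load(file);
    //        InputStream is = doc.getPage(0).getContents()) {
    //       System.out.println(ContentStreamDump.readAll(is));
    //   }
}
```

In the printed output, look at the text-showing operators (Tj, TJ) around the
dates to see how the numbers are positioned.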


> Unwanted spaces before number in chinese text extraction
> --------------------------------------------------------
>
>                 Key: PDFBOX-3166
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3166
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.0
>         Environment: Windows
>            Reporter: Gang Luo
>              Labels: test
>         Attachments: 1201830823-marked-1.png
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Unwanted spaces appear before numbers in Chinese date text,
> such as in this PDF file:
> http://www.cninfo.com.cn/finalpage/2015-12-12/1201830823.PDF



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

