[
https://issues.apache.org/jira/browse/PDFBOX-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916208#comment-13916208
]
Tilman Hausherr commented on PDFBOX-1207:
-----------------------------------------
[~danny11] Re time: PDFBOX is a slowly growing project and PDF is a very
complex [designed by
committee|https://en.wikipedia.org/wiki/Design_by_committee] format. PDFBOX is
over 10 years old and it's still under construction (although pretty good). So we
(unpaid volunteers in our free time) focus on getting things done (instead of
having nothing), fixing bugs, etc. Doing things faster is rather the next step
(although there has been an effort re: colors). If you have some free time,
you're of course welcome to fire up your Java profiler and identify places that
could perform better :-)
> PDFPageProcessor.processStream() take 10 minutes to return
> ----------------------------------------------------------
>
> Key: PDFBOX-1207
> URL: https://issues.apache.org/jira/browse/PDFBOX-1207
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.6.0
> Environment: Seen on multiple platforms
> Reporter: Dan Krause
> Labels: RepairMode
>
> Attempting to extract images and text from each page. Long processing time is
> specific to this file:
> http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/pdf/Installation_Guide/Red_Hat_Enterprise_Linux-6-Installation_Guide-en-US.pdf
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)