Hello, my use case is to extract text from the same PDF in two ways: once sorted and once unsorted. This takes 2 seconds per PDF, which is too long (I have 1M PDFs to extract).
I wonder whether it would be feasible to modify the code (https://github.com/apache/pdfbox/blob/trunk/tools/src/main/java/org/apache/pdfbox/tools/ExtractText.java) to combine the two actions into one. The output would be something like: extractSorted, separator, extractNonSorted. The command line would be "pdfbox..extractText -combine -nonSort -sort". Maybe this is not a good idea. If so, do you have any advice on improving extraction performance? Thanks in advance,
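To make the idea concrete, here is a minimal sketch of the combined output I have in mind, written without any PDFBox dependency. The class name, the method extractBoth and the separator string are all hypothetical, not part of ExtractText. The assumption is that the document is parsed only once and then passed to two extractors; with PDFBox these would presumably be two PDFTextStripper instances, one with setSortByPosition(true) and one with setSortByPosition(false), each calling getText on the same loaded PDDocument. I have not measured whether sharing the parsed document actually saves much time.

```java
import java.util.function.Function;

public class CombinedExtract {
    // Hypothetical separator between the two outputs; any marker that cannot
    // occur in the extracted text would do.
    static final String SEPARATOR = "\n=====SEPARATOR=====\n";

    // Runs both extractors on the same already-parsed document, so the
    // parsing cost is paid once instead of twice.
    static <D> String extractBoth(D document,
                                  Function<D, String> sortedExtractor,
                                  Function<D, String> nonSortedExtractor) {
        String sortedText = sortedExtractor.apply(document);
        String nonSortedText = nonSortedExtractor.apply(document);
        return sortedText + SEPARATOR + nonSortedText;
    }

    public static void main(String[] args) {
        // A plain String stands in for the parsed PDF in this sketch.
        String document = "b a";
        String combined = extractBoth(document, d -> "a b", d -> d);
        System.out.println(combined);
    }
}
```

A consumer could then split the result on the separator to recover the sorted and unsorted texts.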
