[ https://issues.apache.org/jira/browse/PDFBOX-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029630#comment-15029630 ]
Lars Torunski commented on PDFBOX-2996:
---------------------------------------

[~tilman] Can you run some tests with my patch against the PDFBOX-3044 file set? Can you also measure the performance impact of using bubble sort instead of quicksort?

In my opinion there won't be a significant performance impact, because the stack overhead is caused by the current quicksort and its choice of the right index for the pivot. Also, most texts are already sorted, which results in O(n) cost for bubble sort with O(1) memory usage. But your PDFBOX-3044 PDF set may show us different results.

https://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_of_algorithms

Later on I can provide a clean patch set for you.

> StackOverflow in Quicksort
> --------------------------
>
>                 Key: PDFBOX-2996
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2996
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.10, 2.0.0
>         Environment: Java 7
>            Reporter: Manuel Aristaran
>         Attachments: 001991.pdf, Lars-v0-PDFBOX-2996.patch, QuickSort.java,
> artikel1_20_arab.pdf-sorted-diff.txt, artikel1_20_arab.pdf-sorted-iter.txt,
> artikel1_20_arab.pdf-sorted-rekur.txt, failing_sort.pdf, quicksort.patch
>
>
> Running PDFTextStripper through ExtractText triggers a StackOverflow
> exception in the QuickSort implementation for [this particular
> document|https://www.dropbox.com/s/6crie7y5gqadwa5/1.pdf?dl=0].
> To reproduce: {{java -jar pdfbox-app-1.8.11-SNAPSHOT.jar ExtractText -sort
> failing_sort.pdf}}
> (Related to PDFBOX-1512)
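
To make the O(n) / O(1) claim in the comment above concrete, here is a minimal sketch of a bubble sort with an early exit. It is not the attached patch and not PDFBox code: the class name BubbleSortSketch and the method bubbleSort are hypothetical, and the method returns the number of passes only so that the behaviour on sorted input is visible.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not taken from Lars-v0-PDFBOX-2996.patch.
class BubbleSortSketch {

    /**
     * Sorts the list in place and returns the number of passes made.
     * No recursion is used, so the call stack does not grow with the input.
     */
    static <T> int bubbleSort(List<T> list, Comparator<? super T> cmp) {
        int passes = 0;
        int end = list.size();
        boolean swapped;
        do {
            swapped = false;
            passes++;
            for (int i = 1; i < end; i++) {
                // Swap neighbours that are out of order.
                if (cmp.compare(list.get(i - 1), list.get(i)) > 0) {
                    T tmp = list.get(i - 1);
                    list.set(i - 1, list.get(i));
                    list.set(i, tmp);
                    swapped = true;
                }
            }
            // The largest remaining element has reached position end - 1.
            end--;
        } while (swapped); // A pass without swaps means the list is sorted.
        return passes;
    }
}
```

On input that is already sorted, the first pass makes no swaps and the loop stops after n - 1 comparisons. Input that is far from sorted still costs O(n^2) comparisons, which is why measuring against the PDFBOX-3044 file set matters.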