[ 
https://issues.apache.org/jira/browse/PDFBOX-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029630#comment-15029630
 ] 

Lars Torunski commented on PDFBOX-2996:
---------------------------------------

[~tilman] Could you run some tests with my patch against the PDFBOX-3044 file 
set? Could you also measure the performance impact of using bubble sort 
instead of quicksort?

In my opinion there won't be a significant performance impact, because it is 
the current quicksort, with its choice of the right index for the pivot, that 
causes the stack overhead. Also, most texts are already sorted, which results 
in O(n) cost for the bubble sort with O(1) memory usage. Your PDFBOX-3044 PDF 
set may still show us different results, though.
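
To make the O(n) argument concrete, here is a minimal sketch of a bubble sort 
with an early exit. It is not the code from the attached patch; the class name 
BubbleSortSketch and the method bubbleSort are hypothetical and only 
illustrate the idea. The method returns the number of passes so the behaviour 
on already sorted input can be checked.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only, not the attached patch.
final class BubbleSortSketch {

    // Sorts the list in place without recursion and returns the number of passes.
    // A pass that performs no swap ends the loop, so already sorted input
    // costs a single pass of n - 1 comparisons.
    static <T> int bubbleSort(List<T> list, Comparator<? super T> cmp) {
        int passes = 0;
        int end = list.size();
        boolean swapped = true;
        while (swapped && end > 1) {
            swapped = false;
            passes++;
            for (int i = 1; i < end; i++) {
                if (cmp.compare(list.get(i - 1), list.get(i)) > 0) {
                    // Swap the out-of-order neighbours; only one temporary is needed.
                    T tmp = list.get(i - 1);
                    list.set(i - 1, list.get(i));
                    list.set(i, tmp);
                    swapped = true;
                }
            }
            // The largest remaining element is now in its final position.
            end--;
        }
        return passes;
    }
}
```

Calling bubbleSort on an already sorted list finishes after one pass, and no 
call stack grows with the input size. A reversed list still needs O(n^2) 
comparisons, which is why measurements on a real PDF set matter.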

https://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_of_algorithms

Later on I can provide a clean patch set for you.





> StackOverflow in Quicksort
> --------------------------
>
>                 Key: PDFBOX-2996
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2996
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.10, 2.0.0
>         Environment: Java 7
>            Reporter: Manuel Aristaran
>         Attachments: 001991.pdf, Lars-v0-PDFBOX-2996.patch, QuickSort.java, 
> artikel1_20_arab.pdf-sorted-diff.txt, artikel1_20_arab.pdf-sorted-iter.txt, 
> artikel1_20_arab.pdf-sorted-rekur.txt, failing_sort.pdf, quicksort.patch
>
>
> Running PDFTextStripper through ExtractText triggers a StackOverflow 
> exception in the QuickSort implementation for [this particular 
> document|https://www.dropbox.com/s/6crie7y5gqadwa5/1.pdf?dl=0].
> To reproduce: {{java -jar pdfbox-app-1.8.11-SNAPSHOT.jar ExtractText -sort 
> failing_sort.pdf}}
> (Related to PDFBOX-1512)



