[ 
https://issues.apache.org/jira/browse/LUCENE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2719:
----------------------------------

    Attachment: LUCENE-2719-CollSupport.patch

Robert tested yesterday and found out that SorterTemplate.quickSort is not as 
efficient as it could be. The general problem is:
- Quicksort needs the value of the pivot/partition element and the main sorting 
step compares this single value quite often
- For our in-place algorithm that only used swap(i,j) and compare(i,j), the 
main loop's swap statements needed an extra check that not the pivot index is 
swapped and so the pivot changes suddenly. Because of this when the index of 
the pivot is swapped, the pivot index value needed to be updated.

I changed SorterTemplate to look more like FieldComparator known from search. 
It now has not only swap(index1,index2) and compare(index1,index2), it also 
gets setPivot(index) [stores index' value as pivot] and comparePivot(index) 
[compares given index' value with previously stored pivot value]. Now the 
quicksort algorithm is identical to the one seen everywhere in Lucene before. 
We can now also implement the optimized one from harmony also seen in Solr's 
PrimUtil. I will look into this, if it makes sense (it makes not always sense 
as comparing and swapping is more intensive for non-native values!).

This has also some improvements to BytesRefHash, as there are less 
de-references of BytesRefs, because the main quickSort loop only compares an 
index with the in setPivot dereferenced BytesRefs. Before it did this on every 
compare step!

Robert: Can you supply your benchmark? Or test again :-)

> Re-add SorterTemplate and use it to provide fast ArraySorting and replace 
> BytesRefHash sorting
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2719-CollSupport.patch, 
> LUCENE-2719-CollSupport.patch, LUCENE-2719.patch, LUCENE-2719.patch, 
> LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch
>
>
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene 
> (removed after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using 
> Arrays.sort() is slow, because it clones internally to ensure stable search. 
> This component is much faster. This patch adds Arrays.sort() replacements in 
> ArrayUtil that work with natural order or using a Comparator<?>. You can 
> choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for 
> very short ord arrays. This class uses SorterTemplate to provide the same 
> with insertionSort fallback in a very elegant way. Ideally this class can be 
> used everywhere, where the sort algorithm needs to be separated from the 
> underlying data and you can implement a swap() and compare() function (that 
> get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for 
> short arrays, it automatically chooses insertionSort (like JDK's Arrays). The 
> quickSort algorith was copied modified from old BytesRefHash. This new class 
> only shares MergeSort with the original CGLIB SorterTemplate, which is no 
> longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to