[ 
https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463727
 ] 

Hoss Man commented on LUCENE-769:
---------------------------------

Artem: I've only skimmed your patch breifly, but i have a few comments:

1) since Lucene sorting has historicly been based on *indexed* fields, and this 
new patch results in fields being sorted on the *stored* values, you should 
definitely point this out in the javadocs of your DocCachingSortFactory and 
DocCachingFieldComparatorSource classes .. in big bold letters .. i would go 
even so far as to name the classes StoredFieldComparatorSrouce and 
StoredFieldSortFactory instead of the names you currently use.

2) your use of WeakHashMap with keys that are *never* refrenced elsewhere to 
ensure the cache is purged on every GC is not something i've ever seen before...

+public class WeakDocumentsCache {
+    private Map cache = Collections.synchronizedMap(new WeakHashMap());
+
+    public Document getById(int docId) {
+        return (Document) cache.get(new Integer(docId));
+    }
+
+    public void put(int docId, Document document) {
+        cache.put(new Integer(docId), document);
+    }

...have you tested out the performacne of this approach with various GC 
implimentations?

skimming the javadocs for WeakReference/WeakHashMap it seems like perhaps 
SoftRefrence would be better suited for your purposes.

3) making your new DocCachingIndexReader and WeakDocumentsCache clases part of 
the Lucene public API seems to be a little outside of the scope of this change 
... perhapsthey should be left as a private static inner class es inside of the 
individual classes they are used by (DocCachingFieldComparatorSource and 
DocCachingIndexReader respectively) .... even if these classes are left public, 
DocCachingIndexReader should probably subclass FilterIndexReader to reduce the 
amount of duplication.

4) your check for FieldSelector usage in DocCachingIndexReader doesn't check if 
the FieldSelector used is the same every time -- which means you can't trust 
your cache ... fixing this could be complicated, and serves as another reason 
why it would be easier if DocCachingIndexReader was made a private inner class 
of DocCachingFieldComparatorSource where you know exactly how it's going to be 
used.

5) speaking of FieldSelector, your use case seems like a perfect example of 
when a FieldSelector would make sense to only read the field(s) that are needed 
for sorting.


> [PATCH] Performance improvement for some cases of sorted search
> ---------------------------------------------------------------
>
>                 Key: LUCENE-769
>                 URL: https://issues.apache.org/jira/browse/LUCENE-769
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.0.0
>            Reporter: Artem Vasiliev
>         Attachments: DocCachingSorting.patch, DocCachingSorting.patch
>
>
> It's a small addition to Lucene that significantly lowers memory consumption 
> and improves performance for sorted searches with frequent index updates and 
> relatively big indexes (>1mln docs) scenario. This solution supports only 
> single-field sorting currently (which seem to be quite popular use case). 
> Multiple fields support can be added without much trouble.
> The solution is this: documents from the sorting set (instead of given 
> field's values from the whole index - current FieldCache approach) are cached 
> in a WeakHashMap so the cached items are candidates for GC.  Their fields 
> values are then fetched from the cache and compared while sorting.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to