[ https://issues.apache.org/jira/browse/LUCENE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199201#comment-13199201 ]
Pablo Castellanos commented on LUCENE-2482:
-------------------------------------------

Hi,

I want to implement some early-termination strategies over my Lucene index, and since that requires reordering the index, I started experimenting with the 4.0 patch. Many methods have changed over the past year, so I had to make some modifications, mainly:

{code}
/*@Override
public TermFreqVector[] getTermFreqVectors(int docNumber) throws IOException {
  return super.getTermFreqVectors(newToOld[docNumber]);
}*/

// Replacement: map the new docID back to the old one before delegating.
@Override
public Fields getTermVectors(int docID) throws IOException {
  return super.getTermVectors(newToOld[docID]);
}

/*@Override
public Document document(int n, FieldSelector fieldSelector) throws CorruptIndexException, IOException {
  return super.document(newToOld[n], fieldSelector);
}*/

// Replacement: same remapping, using the visitor-based stored-fields API.
@Override
public void document(int docID, StoredFieldVisitor visitor) throws CorruptIndexException, IOException {
  super.document(newToOld[docID], visitor);
}
{code}

The patch also overrides getDeletedDocs, and I haven't found a good replacement for it:

{code}
/*@Override
public Bits getDeletedDocs() {
  final Bits deletedDocs = super.getDeletedDocs();
  if (deletedDocs == null)
    return null;
  return new Bits() {
    @Override
    public boolean get(int index) {
      return deletedDocs.get(newToOld[index]);
    }
    @Override
    public int length() {
      return deletedDocs.length();
    }
  };
}*/
{code}

After applying these changes and running the code against my Lucene index, I get some strange results. The documents appear to have been reordered, but the posting lists used to look them up still seem to point to the old data.
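To make the suspected mismatch concrete, here is a minimal, self-contained sketch of the two mappings involved. It is not code from the patch and does not use any Lucene classes: the class name DocIdRemapSketch and both method names are hypothetical, and docIDs are assumed to be 0-based. Stored fields and term vectors are read through newToOld (new docID to old docID), whereas posting lists hold old docIDs and would need the inverse mapping, oldToNew, plus a re-sort so that each list stays in increasing docID order.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not part of Lucene or the LUCENE-2482 patch. */
final class DocIdRemapSketch {

    /** Inverts newToOld (index = new docID, value = old docID) into oldToNew. */
    static int[] invert(int[] newToOld) {
        int[] oldToNew = new int[newToOld.length];
        for (int newId = 0; newId < newToOld.length; newId++) {
            oldToNew[newToOld[newId]] = newId;
        }
        return oldToNew;
    }

    /** Rewrites one term's posting list from old docIDs to new docIDs. */
    static int[] remapPostings(int[] oldPostings, int[] oldToNew) {
        int[] remapped = new int[oldPostings.length];
        for (int i = 0; i < oldPostings.length; i++) {
            remapped[i] = oldToNew[oldPostings[i]];
        }
        // The permutation can break the ordering, so sort the new docIDs again.
        Arrays.sort(remapped);
        return remapped;
    }
}
```

With newToOld = {1, 0}, a term whose old posting list is {1} should be served as {0}. If it is still served as {1}, the stored-field lookup newToOld[1] = 0 returns the other document, which would be consistent with the symptom described above.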
Imagine that I have two documents in my index and I want to sort them by price, so that the most expensive item gets the lowest docId.

Document 1
{panel}docId:1, name: iPod, price: 100${panel}

Document 2
{panel}docId:2, name: iPhone, price: 300${panel}

I run my modified version of IndexSorter over the index and then query the new index. If I query for _name:iPhone_, I get:

{panel}docId:2, name: iPod, price: 100${panel}

That leads me to believe that the documents have been sorted but the new index is still using the old posting lists.

So I have two questions: are you planning to update this code for newer versions of Lucene 4.0, or am I on my own to get it working? And if I am on my own, where should I look for a solution to this problem?

Thanks in advance for your help.

> Index sorter
> ------------
>
>                 Key: LUCENE-2482
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2482
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/other
>    Affects Versions: 3.1, 4.0
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-2482-4.0.patch, indexSorter.patch
>
>
> A tool to sort index according to a float document weight. Documents with
> high weight are given low document numbers, which means that they will be
> first evaluated. When using a strategy of "early termination" of queries (see
> TimeLimitedCollector) such sorting significantly improves the quality of
> partial results.
> (Originally this tool was created by Doug Cutting in Nutch, and used norms as
> document weights - thus the ordering was limited by the limited resolution of
> norms. This is a pure Lucene version of the tool, and it uses arbitrary
> floats from a specified stored field).

--
This message is automatically generated by JIRA.