[ 
https://issues.apache.org/jira/browse/LUCENE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199201#comment-13199201
 ] 

Pablo Castellanos commented on LUCENE-2482:
-------------------------------------------

Hi, I want to implement some early termination strategies over my Lucene 
index. That requires reordering the index, so I started playing with the 4.0 
patch.

A lot of the API has changed in the past year, so I had to make some 
modifications, mainly:

{code}
/*@Override
public TermFreqVector[] getTermFreqVectors(int docNumber)
        throws IOException {
  return super.getTermFreqVectors(newToOld[docNumber]);
}*/

@Override
public Fields getTermVectors(int docID) throws IOException {
  return super.getTermVectors(newToOld[docID]);
}

/*@Override
public Document document(int n, FieldSelector fieldSelector)
        throws CorruptIndexException, IOException {
  return super.document(newToOld[n], fieldSelector);
}*/

@Override
public void document(int docID, StoredFieldVisitor visitor)
        throws CorruptIndexException, IOException {
  super.document(newToOld[docID], visitor);
}
{code}

There is also a getDeletedDocs function, and I haven't found a good 
replacement for it:

{code}
    /*@Override
    public Bits getDeletedDocs() {
      final Bits deletedDocs = super.getDeletedDocs();

      if (deletedDocs == null)
        return null;

      return new Bits() {
        @Override
        public boolean get(int index) {
          return deletedDocs.get(newToOld[index]);
        }

        @Override
        public int length() {
          return deletedDocs.length();
        }
      };
    }*/
{code}
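
One candidate on trunk appears to be getLiveDocs(), whose bits mark live 
documents rather than deleted ones. The sketch below shows the same docId 
remapping under that assumption. The Bits interface here is a local stand-in 
for org.apache.lucene.util.Bits so that the snippet compiles on its own, and 
RemappedLiveDocs.remap is a hypothetical helper name, not part of the patch.

```java
// Local stand-in for org.apache.lucene.util.Bits (assumption: same two methods).
interface Bits {
  boolean get(int index);
  int length();
}

final class RemappedLiveDocs {
  private RemappedLiveDocs() {}

  /**
   * Wraps the live-docs bits of the unsorted reader so that they can be
   * addressed by new (sorted) docIds. Returns null when the input is null.
   */
  static Bits remap(final Bits liveDocs, final int[] newToOld) {
    if (liveDocs == null) {
      return null; // assumed convention: null means every document is live
    }
    return new Bits() {
      @Override
      public boolean get(int index) {
        // Translate the new docId back to the old one before the lookup.
        return liveDocs.get(newToOld[index]);
      }

      @Override
      public int length() {
        return liveDocs.length();
      }
    };
  }
}
```

In a sorting reader this would sit in the getLiveDocs() override, with 
super.getLiveDocs() passed in as the first argument.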

After applying these changes and running the code against my Lucene index, I 
get some weird results. It seems that the new sorting has worked, but the 
posting lists that point to the documents still refer to the old data.

Imagine that I have 2 documents in my index and I want to sort them by 
price (so the most expensive item should get the lower docId):

Document 1
{panel}docId:1, name: iPod, price: 100${panel}

Document 2
{panel}docId:2, name: iPhone, price: 300${panel}
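
The intended ordering can be written down as a permutation. This is a minimal 
sketch, assuming the prices are available as a float array indexed by old 
docId (0-based here, unlike the 1-based ids above). PriceOrder and its method 
are hypothetical names used for illustration, not code from the patch.

```java
import java.util.Arrays;
import java.util.Comparator;

final class PriceOrder {
  private PriceOrder() {}

  /** Returns newToOld: position i holds the old docId that becomes new docId i. */
  static int[] newToOld(final float[] priceByOldDoc) {
    Integer[] ids = new Integer[priceByOldDoc.length];
    for (int i = 0; i < ids.length; i++) {
      ids[i] = i;
    }
    // Highest price first. Arrays.sort on objects is stable, so documents
    // with equal prices keep their old relative order.
    Arrays.sort(ids, new Comparator<Integer>() {
      @Override
      public int compare(Integer a, Integer b) {
        return Float.compare(priceByOldDoc[b], priceByOldDoc[a]);
      }
    });
    int[] result = new int[ids.length];
    for (int i = 0; i < ids.length; i++) {
      result[i] = ids[i];
    }
    return result;
  }
}
```

For the two documents above, prices {100, 300} give newToOld = {1, 0}: the 
iPhone moves to the front.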

I run my modified version of IndexSorter over it and then query the new 
index. If I query for _name:iPhone_ I get:
{panel}docId:2, name: iPod, price: 100${panel}

That leads me to believe that the documents have been sorted but the new index 
is using the old posting list. 
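
The symptom can be reproduced without Lucene. Stored fields are read through 
newToOld, while the docIds inside each posting list would need the inverse 
mapping (old to new). The sketch below is only an illustration under that 
assumption: DocIdRemap, invert and remapPostings are hypothetical names, 
docIds are 0-based, and plain arrays stand in for the index structures.

```java
import java.util.Arrays;

final class DocIdRemap {
  private DocIdRemap() {}

  /** Inverts newToOld so that oldToNew[oldDocId] gives the new docId. */
  static int[] invert(int[] newToOld) {
    int[] oldToNew = new int[newToOld.length];
    for (int newId = 0; newId < newToOld.length; newId++) {
      oldToNew[newToOld[newId]] = newId;
    }
    return oldToNew;
  }

  /** Rewrites one posting list into new docIds and restores ascending order. */
  static int[] remapPostings(int[] oldPostings, int[] oldToNew) {
    int[] remapped = new int[oldPostings.length];
    for (int i = 0; i < oldPostings.length; i++) {
      remapped[i] = oldToNew[oldPostings[i]];
    }
    Arrays.sort(remapped);
    return remapped;
  }

  public static void main(String[] args) {
    String[] storedNameByOldDoc = {"iPod", "iPhone"};
    int[] newToOld = {1, 0};     // iPhone (price 300) moves to the front
    int[] iphonePostings = {1};  // old posting list for name:iPhone

    // Postings left untouched: the hit is docId 1, whose stored fields
    // are now read from old doc 0.
    System.out.println(storedNameByOldDoc[newToOld[iphonePostings[0]]]);

    // Postings remapped: the hit is docId 0, whose stored fields are
    // read from old doc 1.
    int[] fixed = remapPostings(iphonePostings, invert(newToOld));
    System.out.println(storedNameByOldDoc[newToOld[fixed[0]]]);
  }
}
```

With these values the first lookup yields iPod, matching the result above, 
and the second yields iPhone.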

So I have two questions: are you planning to update this code for newer 
versions of Lucene 4.0, or am I on my own to get it working? And if I am, 
where should I look for a solution to my problem?

Thanks in advance for your help.
                
> Index sorter
> ------------
>
>                 Key: LUCENE-2482
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2482
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/other
>    Affects Versions: 3.1, 4.0
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-2482-4.0.patch, indexSorter.patch
>
>
> A tool to sort index according to a float document weight. Documents with 
> high weight are given low document numbers, which means that they will be 
> first evaluated. When using a strategy of "early termination" of queries (see 
> TimeLimitedCollector) such sorting significantly improves the quality of 
> partial results.
> (Originally this tool was created by Doug Cutting in Nutch, and used norms as 
> document weights - thus the ordering was limited by the limited resolution of 
> norms. This is a pure Lucene version of the tool, and it uses arbitrary 
> floats from a specified stored field).
