[jira] Commented: (LUCENENET-183) SegmentTermVector IndexOf method always fails

Digy (JIRA) Tue, 04 May 2010 10:09:24 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENENET-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863890#action_12863890
 ]


Digy commented on LUCENENET-183:
--------------------------------

Below is the mail from *Bernie Solomon*
{code}
Having hit the same problem, I am puzzled why the fix for LUCENENET-183 seems
to have been reverted. Lucene.NET and Lucene are currently inconsistent: the
following short test programs behave differently.
Java correctly prints 1 for index and C# incorrectly prints -1.
The proposed fix does address this. Am I missing something?

Thanks

Bernie

--- Java ---
import java.lang.*;
import java.io.*;
import org.apache.lucene.analysis.*;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.store.*;

class Test
{
    public static void main(String [] args)
    {
        try
        {
            RAMDirectory directory = new RAMDirectory();
            Analyzer analyzer = new WhitespaceAnalyzer();
            IndexWriter writer = new IndexWriter(directory, analyzer,
                    IndexWriter.MaxFieldLength.LIMITED);
            Document document = new Document();
            document.add(new Field("contents", new StringReader("a_ a0"),
                    Field.TermVector.WITH_OFFSETS));
            writer.addDocument(document);
            IndexReader reader = writer.getReader();
            TermPositionVector tpv =
                    (TermPositionVector) reader.getTermFreqVector(0, "contents");
            System.out.println("tpv: " + tpv);
            int index = tpv.indexOf("a_");
            System.out.println("index: " + index);
        }
        catch (Exception ex)
        {
            // Report failures instead of silently swallowing them.
            ex.printStackTrace();
        }
    }
}

--- C# ---
using System;
using System.IO;
using System.Text;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

public class Test
{
    public static void Main(string [] args)
    {
        RAMDirectory directory = new RAMDirectory();
        Analyzer analyzer = new WhitespaceAnalyzer();
        IndexWriter writer = new IndexWriter(directory, analyzer,
            IndexWriter.MaxFieldLength.LIMITED);
        Document document = new Document();
        document.Add(new Field("contents",
            new StreamReader(new MemoryStream(Encoding.ASCII.GetBytes("a_ a0"))),
            Field.TermVector.WITH_OFFSETS));
        writer.AddDocument(document);
        IndexReader reader = writer.GetReader();
        TermPositionVector tpv =
            reader.GetTermFreqVector(0, "contents") as TermPositionVector;
        Console.WriteLine("tpv: " + tpv);
        int index = tpv.IndexOf("a_");
        Console.WriteLine("index: " + index);
    }
}
{code}

Thanks Bernie,

This patch was lost while porting to 2.9.0.
I have recommitted the patch (for 2.9.1 & 2.9.2 in trunk)
and also added a test case (your C# code) to avoid such losses.

DIGY




> SegmentTermVector IndexOf method always fails
> ---------------------------------------------
>
>                 Key: LUCENENET-183
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-183
>             Project: Lucene.Net
>          Issue Type: Bug
>            Reporter: Franklin Simmons
>         Attachments: SegmentTermVector-2.patch, SegmentTermVector.patch
>
>
> At index time, term vectors are sorted using String.CompareOrdinal. However, 
> the IndexOf method of class SegmentTermVector invokes System.Array.BinarySearch, 
> which uses String.Compare.
> {noformat}
> public virtual int IndexOf(System.String termText)
> {
>     if (terms == null)
>         return -1;
>     int res = System.Array.BinarySearch(terms, termText);
>     return res >= 0 ? res : -1;
> }
> {noformat}
> The effect is that the IndexOf method always returns a negative number (no 
> match) because the sort order is incompatible with the default comparer.
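
The mismatch described in the issue can be reproduced outside Lucene. The Java sketch below is only an analogy, not the Lucene.Net code or either attached patch. The array is sorted in natural (UTF-16 code unit) order, which plays the role of String.CompareOrdinal. String.CASE_INSENSITIVE_ORDER is used purely as a stand-in for a comparer that disagrees with that order; it is not .NET's culture-sensitive String.Compare. The class name, the indexOf helper and the term "aZ" are made up for the illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortOrderMismatchSketch {

    // Mirrors the shape of IndexOf: binary search, then map "not found" to -1.
    // The comparator is passed in so the same array can be searched two ways.
    static int indexOf(String[] sortedTerms, String termText, Comparator<String> order) {
        if (sortedTerms == null) {
            return -1;
        }
        int res = Arrays.binarySearch(sortedTerms, termText, order);
        return res >= 0 ? res : -1;
    }

    public static void main(String[] args) {
        String[] terms = {"a_", "aZ"};
        // Natural String order compares UTF-16 code units: 'Z' (0x5A) < '_' (0x5F),
        // so the array becomes ["aZ", "a_"].
        Arrays.sort(terms);

        // Same order for sorting and searching: the term is found at index 1.
        System.out.println(indexOf(terms, "a_", Comparator.naturalOrder()));

        // Different order for searching: the case-insensitive comparison treats
        // 'Z' as 'z' (0x7A), which sorts after '_', so the binary search looks
        // on the wrong side of the array and the helper reports -1.
        System.out.println(indexOf(terms, "a_", String.CASE_INSENSITIVE_ORDER));
    }
}
```

The point of the sketch is that a binary search is only valid when it uses the same ordering as the sort that produced the array. In .NET, one way to achieve that for an ordinally sorted array is to pass an ordinal comparer such as StringComparer.Ordinal to Array.BinarySearch; whether the attached patches do exactly this is not shown in this mail.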

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
