[
https://issues.apache.org/jira/browse/LUCENENET-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863890#action_12863890
]
Digy commented on LUCENENET-183:
--------------------------------
Below is the mail from *Bernie Solomon*
{code}
Having hit the same problem, I am puzzled why the fix for LUCENENET-183 seems to
have been reverted. Lucene.NET and Lucene are currently inconsistent: the following
short test programs behave differently.
Java correctly prints 1 for the index and C# incorrectly prints -1.
The proposed fix does address this. Am I missing something?
Thanks
Bernie
--- Java ---
import java.lang.*;
import java.io.*;
import org.apache.lucene.analysis.*;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.store.*;
class Test
{
public static void main(String [] args)
{
try
{
RAMDirectory directory = new RAMDirectory();
Analyzer analyzer = new WhitespaceAnalyzer();
IndexWriter writer = new IndexWriter(directory, analyzer,
IndexWriter.MaxFieldLength.LIMITED);
Document document = new Document();
document.add(new Field("contents", new StringReader("a_ a0"),
Field.TermVector.WITH_OFFSETS));
writer.addDocument(document);
IndexReader reader = writer.getReader();
TermPositionVector tpv =
(TermPositionVector)reader.getTermFreqVector(0, "contents");
System.out.println("tpv: " + tpv);
int index = tpv.indexOf("a_");
System.out.println("index: " + index);
}
catch (Exception ex)
{
ex.printStackTrace(); // report failures instead of silently swallowing them
}
}
}
--- C# ---
using System;
using System.IO;
using System.Text;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
public class Test
{
public static void Main(string [] args)
{
RAMDirectory directory = new RAMDirectory();
Analyzer analyzer = new WhitespaceAnalyzer();
IndexWriter writer = new IndexWriter(directory, analyzer,
IndexWriter.MaxFieldLength.LIMITED);
Document document = new Document();
document.Add(new Field("contents", new StreamReader(new
MemoryStream(Encoding.ASCII.GetBytes("a_ a0"))),
Field.TermVector.WITH_OFFSETS));
writer.AddDocument(document);
IndexReader reader = writer.GetReader();
TermPositionVector tpv = reader.GetTermFreqVector(0, "contents") as
TermPositionVector;
Console.WriteLine("tpv: " + tpv);
int index = tpv.IndexOf("a_");
Console.WriteLine("index: " + index);
}
}
{code}
Thanks Bernie,
This patch was lost while porting 2.9.0.
I recommitted the patch (for 2.9.1 and 2.9.2 in trunk)
and also added a test case (your C# code) to avoid such losses.
DIGY
> SegmentTermVector IndexOf method always fails
> ---------------------------------------------
>
> Key: LUCENENET-183
> URL: https://issues.apache.org/jira/browse/LUCENENET-183
> Project: Lucene.Net
> Issue Type: Bug
> Reporter: Franklin Simmons
> Attachments: SegmentTermVector-2.patch, SegmentTermVector.patch
>
>
> At index time, term vectors are sorted using String.CompareOrdinal. However,
> the IndexOf method of class SegmentTermVector invokes System.Array.BinarySearch,
> which uses the default, culture-sensitive string comparison (String.Compare).
> {noformat}public virtual int IndexOf(System.String termText)
> {
> if (terms == null)
> return - 1;
> int res = System.Array.BinarySearch(terms, termText);
> return res >= 0 ? res : - 1;
> }
> {noformat}
> The effect is that the IndexOf method can return -1 (no match) for terms that
> are present, because the sort order is incompatible with the default comparer.
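
The ordering mismatch described in the issue can be reproduced outside Lucene.Net. The following is a minimal sketch in Java, not the committed patch: the class name OrdinalSearchSketch and its indexOf helper are hypothetical, and java.text.Collator is used only as a stand-in for .NET's culture-sensitive String.Compare, on the assumption that it also sorts the underscore before digits. The terms are sorted ordinally (as at index time) and then searched once with each comparator.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical demo class; not part of Lucene or Lucene.Net.
class OrdinalSearchSketch {

    // Mirrors the shape of SegmentTermVector.IndexOf, but takes the comparator explicitly.
    static int indexOf(String[] terms, String termText, Comparator<? super String> comparator) {
        if (terms == null) {
            return -1;
        }
        int res = Arrays.binarySearch(terms, termText, comparator);
        return res >= 0 ? res : -1;
    }

    public static void main(String[] args) {
        String[] terms = {"a_", "a0"};
        // String.compareTo compares UTF-16 code units, so this is an ordinal sort.
        Arrays.sort(terms);

        // Ordinal comparator: consistent with the sort order used above.
        Comparator<String> ordinal = Comparator.naturalOrder();
        // Assumption: stand-in for a culture-sensitive comparison such as String.Compare.
        Collator cultureSensitive = Collator.getInstance(Locale.US);

        System.out.println("sorted terms:             " + Arrays.toString(terms));
        System.out.println("ordinal search:           " + indexOf(terms, "a_", ordinal));
        System.out.println("culture-sensitive search: " + indexOf(terms, "a_", cultureSensitive));
    }
}
```

With the ordinal comparator the search agrees with the sort order and finds "a_" at index 1, the value the Java program in the mail prints. When the second comparator orders "a_" before "a0", the binary search looks on the wrong side of "a0" and reports no match. On the .NET side, the analogous approach would be to pass an ordinal comparer such as StringComparer.Ordinal to Array.BinarySearch; whether that is exactly what the attached patches do is not shown here.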