Re: Problem with MultiPhrase Query in Lucene 4.3

Ian Lea Thu, 03 Oct 2013 08:51:02 -0700

Below is a little self-contained test program.  You may recognise some
of the code.


Here's the output from a couple of runs using Lucene 4.4.0.

$ java ian.G1 "Dremel is a scalable, interactive ad-hoc query system" "interactive ad-hoc"
term=interactive
term=ad-hoc
+content:"interactive" +content:"ad-hoc": totalHits=1


$ java ian.G1 "Dremel is a scalable, interactive ad-hoc query system" "interactive adhoc"
term=interactive
+content:"interactive": totalHits=1

All looks OK to me.  Maybe you can make it fail, or use it to help fix
your problem.
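
For what it's worth, the second run finds nothing for "adhoc" because the
terms are held in sorted order and seekCeil positions on the first term at
or after the target. Here is a rough sketch of that lookup in plain Java,
assuming a TreeSet as a stand-in for the term dictionary. SeekCeilSketch
and termsWithPrefix are made-up names, not Lucene API, and String ordering
only agrees with Lucene's byte ordering for ASCII text like this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class SeekCeilSketch {

    // Collects every term that starts with prefix, walking the sorted set
    // from the first term >= prefix, the way the seekCeil loop does.
    static List<String> termsWithPrefix(TreeSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<String>();
        // tailSet(prefix, true) begins at the ceiling of prefix and is empty
        // when no such term exists (the case where seekCeil reports END).
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        String contents = "Dremel is a scalable, interactive ad-hoc query system";
        // Splitting on whitespace keeps punctuation attached: "scalable," is one term.
        TreeSet<String> terms = new TreeSet<String>(Arrays.asList(contents.split("\\s+")));

        System.out.println(terms.ceiling("adhoc"));             // interactive
        System.out.println(termsWithPrefix(terms, "adhoc"));    // []
        System.out.println(termsWithPrefix(terms, "ad-hoc"));   // [ad-hoc]
        System.out.println(termsWithPrefix(terms, "scalable")); // [scalable,]
    }
}
```

"ad-hoc" sorts before "adhoc" because '-' comes before 'h', so the lookup
for "adhoc" lands on "interactive" and the prefix check stops straight
away. The comma after "scalable" stays attached to that one token and does
not affect "interactive" at all.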

--
Ian.

package ian;

import java.util.*;
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.core.*;
import org.apache.lucene.analysis.en.*;
import org.apache.lucene.analysis.standard.*;
import org.apache.lucene.document.*;
import org.apache.lucene.queries.*;
import org.apache.lucene.search.*;
import org.apache.lucene.store.*;
import org.apache.lucene.index.*;
import org.apache.lucene.util.*;

public class G1 {

    void test(String contents, String words) throws Exception {
        // Index a single document into an in-memory directory.
        RAMDirectory dir = new RAMDirectory();
        Analyzer anl = new WhitespaceAnalyzer(Version.LUCENE_44);
        IndexWriterConfig iwcfg = new IndexWriterConfig(Version.LUCENE_44, anl);
        IndexWriter iw = new IndexWriter(dir, iwcfg);

        FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
        Field field = new Field("content", "", offsetsType);
        Document doc = new Document();
        doc.add(field);
        field.setStringValue(contents);
        iw.addDocument(doc);
        iw.close();

        IndexReader rdr = DirectoryReader.open(dir);
        Fields fields = MultiFields.getFields(rdr);
        Terms terms = fields.terms("content");

        BooleanQuery bq = new BooleanQuery();
        String[] worda = words.split(" ");
        for (String w : worda) {
            // Collect every indexed term that starts with w.
            LinkedList<Term> termsWithPrefix = new LinkedList<Term>();
            TermsEnum trm = terms.iterator(null);
            // seekCeil returns END when no term >= w exists; the enum is
            // then unpositioned and term() must not be called.
            if (trm.seekCeil(new BytesRef(w)) != TermsEnum.SeekStatus.END) {
                do {
                    String s = trm.term().utf8ToString();
                    if (s.startsWith(w)) {
                        termsWithPrefix.add(new Term("content", s));
                        System.out.printf("term=%s\n", s);
                    } else {
                        break;
                    }
                } while (trm.next() != null);
            }

            // One single-position MultiPhraseQuery per word, all required.
            if (!termsWithPrefix.isEmpty()) {
                MultiPhraseQuery mpquery = new MultiPhraseQuery();
                mpquery.add(termsWithPrefix.toArray(new Term[0]));
                bq.add(mpquery, BooleanClause.Occur.MUST);
            }
        }

        IndexSearcher searcher = new IndexSearcher(rdr);
        TopDocs results = searcher.search(bq, 10);
        System.out.printf("%s: totalHits=%s\n", bq, results.totalHits);
    }

    public static void main(String[] args) throws Exception {
        G1 t = new G1();
        t.test(args[0], args[1]);
    }
}
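
One thing the program above does not check is word order. It adds a
separate single-position MultiPhraseQuery per word and combines them with
MUST, so a hit only means that every word has a matching term somewhere in
the field. A phrase match also needs the matches at consecutive positions.
The sketch below shows that extra condition in plain Java with no Lucene
calls; PhrasePositionSketch, alt and matchesPhrase are made-up names for
illustration, and it ignores slop and position gaps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PhrasePositionSketch {

    // Hypothetical helper: the alternative terms allowed at one phrase position.
    static Set<String> alt(String... terms) {
        return new HashSet<String>(Arrays.asList(terms));
    }

    // True if the tokens contain consecutive positions p, p+1, ... where each
    // token is one of the alternatives allowed at that phrase position.
    static boolean matchesPhrase(String[] tokens, List<Set<String>> positions) {
        for (int p = 0; p + positions.size() <= tokens.length; p++) {
            boolean allMatch = true;
            for (int i = 0; i < positions.size() && allMatch; i++) {
                allMatch = positions.get(i).contains(tokens[p + i]);
            }
            if (allMatch) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] tokens =
            "Dremel is a scalable, interactive ad-hoc query system".split("\\s+");

        List<Set<String>> inOrder = new ArrayList<Set<String>>();
        inOrder.add(alt("interactive"));
        inOrder.add(alt("ad-hoc"));

        List<Set<String>> reversed = new ArrayList<Set<String>>();
        reversed.add(alt("ad-hoc"));
        reversed.add(alt("interactive"));

        System.out.println(matchesPhrase(tokens, inOrder));  // true
        System.out.println(matchesPhrase(tokens, reversed)); // false
    }
}
```

Both word orders would pass the all-terms-required check in G1, since both
words are present. In Lucene itself the positional version would presumably
be a single MultiPhraseQuery with one add(Term[]) call per word, as each
add places its terms at the next position in the phrase.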


On Thu, Oct 3, 2013 at 4:10 PM, VIGNESH S <vigneshkln...@gmail.com> wrote:
> Hi,
>
> Sorry, that's my typo.
>
> It's not failing because of that.
>
>
> On Thu, Oct 3, 2013 at 8:17 PM, Ian Lea <ian....@gmail.com> wrote:
>
>> Are you sure it's not failing because "adhoc" != "ad-hoc"?
>>
>>
>> --
>> Ian.
>>
>>
>> On Thu, Oct 3, 2013 at 3:07 PM, VIGNESH S <vigneshkln...@gmail.com> wrote:
>> > Hi,
>> >
>> > I am trying to run a MultiPhraseQuery in Lucene 4.3. It works perfectly
>> > for all scenarios except the one below.
>> > When I search for a phrase that is preceded by punctuation, it
>> > does not work.
>> >
>> > TextContent:  Dremel is a scalable, interactive ad-hoc query system for
>> > analysis
>> > of read-only nested data. By combining multi-level execution
>> > trees and columnar data layout, it is capable of running aggregation
>> >
>> > Search phrase :  interactive adhoc
>> >
>> > The above search is failing because "interactive adhoc" is preceded by ","
>> > in the original text.
>> >
>> >
>> > I am indexing like this. Here is sample code for indexing. I have used
>> > WhitespaceAnalyzer.
>> >
>> > Document doc = new Document();
>> >
>> > contents ="Dremel is a scalable, interactive ad-hoc query system for
>> > analysis
>> > of read-only nested data. By combining multi-level execution
>> > trees and columnar data layout, it is capable of running aggregation";
>> >
>> > FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
>> >
>> > Field field =new Field("content","", offsetsType);
>> >
>> > doc.add(field);
>> > field.setStringValue(contents);
>> >
>> > mWriter.addDocument(doc);
>> >
>> > In the search I build a MultiPhraseQuery object and add the tokens of
>> > the search phrase.
>> >
>> > Before adding the tokens, I validate them like this:
>> >
>> > LinkedList<Term> termsWithPrefix = new LinkedList<Term>();
>> > trm.seekCeil(new BytesRef(word));
>> > do {
>> >     String s = trm.term().utf8ToString();
>> >     if (s.startsWith(word)) {
>> >         termsWithPrefix.add(new Term("content", s));
>> >     } else {
>> >         break;
>> >     }
>> > } while (trm.next() != null);
>> > mpquery.add(termsWithPrefix.toArray(new Term[0]));
>> > }
>> >
>> > It is working for all scenarios except the scenarios where the search
>> > phrase is preceded by punctuation.
>> >
>> > When the text is preceded by punctuation, trm.seekCeil(new BytesRef(word))
>> > points to a different word, which is what causes the problem.
>> >
>> > Please kindly help..
>> >
>> >
>> > --
>> > Thanks and Regards
>> > Vignesh Srinivasan
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>
>
> --
> Thanks and Regards
> Vignesh Srinivasan
> 9739135640

