RE: How do you properly use NumericField

2009-10-12 Thread Uwe Schindler
The source code attachment somehow got lost, so here it is inline:

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.search.*;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.util.NumericUtils;

public class TestNRQWithQueryParser extends LuceneTestCase {

  public void test() throws Exception {
    RAMDirectory directory = new RAMDirectory();
    IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(),
        true, IndexWriter.MaxFieldLength.UNLIMITED);

    // Index the integers -5000..5000, one document per value.
    for (int l = -5000; l <= 5000; l++) {
      Document doc = new Document();
      doc.add(new Field("text", "the big brown", Field.Store.NO,
          Field.Index.ANALYZED));
      doc.add(new NumericField("trie", Field.Store.NO, true).setIntValue(l));
      writer.addDocument(doc);
    }
    writer.close();

    Searcher searcher = new IndexSearcher(directory, true);

    QueryParser parser = new QueryParser("text", new WhitespaceAnalyzer()) {

      // Range queries on the numeric field become a NumericRangeQuery.
      @Override
      protected Query newRangeQuery(String field, String part1, String part2,
          boolean inclusive) {
        if ("trie".equals(field)) {
          return NumericRangeQuery.newIntRange(field,
              Integer.parseInt(part1), Integer.parseInt(part2),
              inclusive, inclusive);
        } else {
          return super.newRangeQuery(field, part1, part2, inclusive);
        }
      }

      // Single terms on the numeric field are prefix-coded at full precision.
      @Override
      protected Query newTermQuery(Term term) {
        if ("trie".equals(term.field())) {
          return new TermQuery(new Term(term.field(),
              NumericUtils.intToPrefixCoded(Integer.parseInt(term.text()))));
        } else {
          return super.newTermQuery(term);
        }
      }
    };

    TopDocs td;
    td = searcher.search(parser.parse("+trie:[20 TO 30]"), 5000);
    assertEquals(11, td.totalHits);
    td = searcher.search(parser.parse("+trie:[-4999 TO -4000]"), 5000);
    assertEquals(1000, td.totalHits);
    td = searcher.search(
        parser.parse("the big brown +trie:[-4999 TO -4000]"), 5000);
    assertEquals(1000, td.totalHits);
    td = searcher.search(parser.parse("+trie:77"), 5000);
    assertEquals(1, td.totalHits);
    td = searcher.search(parser.parse("+trie:5001"), 5000);
    assertEquals(0, td.totalHits);
    td = searcher.search(parser.parse("the big brown +trie:\"-2\""), 5000);
    assertEquals(1, td.totalHits);
    td = searcher.search(parser.parse("+trie:\"-5001\""), 5000);
    assertEquals(0, td.totalHits);

    searcher.close();
    directory.close();
  }
}

 

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 

 


RE: How do you properly use NumericField

2009-10-12 Thread Uwe Schindler
Hello Paul,

I implemented what you wanted in the attached test case. It works without
problems. Your error was that, when creating the TermQuery, you passed a
precisionStep as the shift parameter, which is incorrect.
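
To illustrate why the shift matters, here is a minimal sketch in plain Java.
It is not Lucene's actual encoding. It only assumes the idea the trie scheme
builds on: the value is made sortable by flipping the sign bit, and a
non-zero shift drops that many low-order bits. The names ShiftSketch and
bucket are hypothetical.

```java
// Simplified illustration only; this is NOT NumericUtils.intToPrefixCoded.
final class ShiftSketch {

    // Flip the sign bit so that unsigned order matches signed numeric order,
    // then drop the lowest `shift` bits (hypothetical helper).
    static int bucket(int value, int shift) {
        int sortableBits = value ^ 0x80000000;
        return sortableBits >>> shift;
    }

    public static void main(String[] args) {
        // shift = 0 keeps full precision: 123 and 124 stay distinct.
        System.out.println(bucket(123, 0) == bucket(124, 0)); // false
        // shift = 4 merges 112..127 into a single bucket.
        System.out.println(bucket(123, 4) == bucket(124, 4)); // true
    }
}
```

Under these assumptions, a term meant to match exactly one value has to be
built with shift 0; a term built with a larger shift stands for a whole
bucket of values.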

By the way: Lucene 2.9.1 and Lucene 3.0 will be optimized for ranges like
[1 TO 1], so such a range is then as fast as a TermQuery, and you can use a
NumericRangeQuery for it (and do not need to encode the terms). Just replace
the TermQuery built with NumericUtils in the newTermQuery method with a
NumericRangeQuery whose upper and lower bounds are equal (and both inclusive).

Please note: negative numbers in the query parser may lead to problems,
which is why they need to be placed in double quotes ("-" is the operator for
excluding terms). The test may also fail with other Analyzers that alter
your numbers.
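
The quoting rule for single terms can be sketched in plain Java as follows.
The class QueryStrings and the helper numericClause are hypothetical; the
helper only builds the query string and always quotes the number, so that a
leading "-" is not read as the exclusion operator.

```java
// Hypothetical helper for building query strings; not part of Lucene.
final class QueryStrings {

    // Builds a required clause such as +trie:"-2" for the given field.
    static String numericClause(String field, int value) {
        return "+" + field + ":\"" + value + "\"";
    }

    public static void main(String[] args) {
        System.out.println(numericClause("trie", -2)); // +trie:"-2"
        System.out.println(numericClause("trie", 77)); // +trie:"77"
    }
}
```

Quoting positive numbers as well is a design choice made here for
simplicity; unquoted positive terms such as +trie:77 also work in the test.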

Uwe


Re: How do you properly use NumericField

2009-10-12 Thread Paul Taylor

Uwe Schindler wrote:

Can you also print the lower and upper terms you receive in newRangeQuery,
and the term you receive in newTermQuery, to System.out? Maybe they are
altered somehow by the Analyzer that is used for parsing the query.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

  
It's okay, I've given up on NumericField itself for now and just use
NumericUtils, which doesn't give me the performance advantages but
works okay for what I want to do.

So when indexing I do:

doc.add(new Field(field.getName(), NumericUtils.intToPrefixCoded(val),
    Field.Store.YES, Field.Index.ANALYZED));



and my Query Parser looks like this:

public class MusicbrainzQueryParser extends QueryParser {

    public MusicbrainzQueryParser(String field, Analyzer a) {
        super(field, a);
    }

    protected Query newTermQuery(Term term) {
        // Compare field names with equals(); == only compares references.
        if (term.field().equals(TrackIndexField.DURATION.getName()) ||
            term.field().equals(TrackIndexField.QUANTIZED_DURATION.getName()) ||
            term.field().equals(TrackIndexField.TRACKNUM.getName()) ||
            term.field().equals(TrackIndexField.NUM_TRACKS.getName())) {
            TermQuery tq = new TermQuery(new Term(term.field(),
                NumericUtils.intToPrefixCoded(Integer.parseInt(term.text()))));
            return tq;
        } else {
            return super.newTermQuery(term);
        }
    }

    public Query newRangeQuery(String field,
                               String part1,
                               String part2,
                               boolean inclusive) {
        // Encode both bounds for the numeric fields before delegating.
        if (field.equals(TrackIndexField.DURATION.getName()) ||
            field.equals(TrackIndexField.QUANTIZED_DURATION.getName()) ||
            field.equals(TrackIndexField.TRACKNUM.getName()) ||
            field.equals(TrackIndexField.NUM_TRACKS.getName())) {
            part1 = NumericUtils.intToPrefixCoded(Integer.parseInt(part1));
            part2 = NumericUtils.intToPrefixCoded(Integer.parseInt(part2));
        }
        TermRangeQuery query = (TermRangeQuery)
            super.newRangeQuery(field, part1, part2, inclusive);
        return query;
    }
}

and to display the value I use:

String duration = doc.get(TrackIndexField.DURATION.getName());
if (duration != null) {
    System.out.println(BigInteger.valueOf(NumericUtils.prefixCodedToInt(duration)));
}
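
A plain term range query can work on encoded strings like these because the
encoding preserves order. The sketch below is a simplified stand-in in plain
Java, not the NumericUtils format: it assumes a fixed-width hex rendering of
the sign-flipped int, and the names SortableStringSketch and
toSortableString are hypothetical.

```java
// Simplified stand-in for an order-preserving encoding; NOT NumericUtils.
final class SortableStringSketch {

    // Flip the sign bit, then print as 8 hex digits, so that comparing the
    // strings character by character gives the same order as comparing ints.
    static String toSortableString(int value) {
        return String.format("%08x", value ^ 0x80000000);
    }

    public static void main(String[] args) {
        System.out.println(toSortableString(-5)); // 7ffffffb
        System.out.println(toSortableString(3));  // 80000003
        System.out.println(toSortableString(20)); // 80000014
    }
}
```

With such an encoding, a range over strings selects exactly the values in
the matching numeric range, including negative numbers.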


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



RE: How do you properly use NumericField

2009-10-12 Thread Uwe Schindler
Can you also print the lower and upper terms you receive in newRangeQuery,
and the term you receive in newTermQuery, to System.out? Maybe they are
altered somehow by the Analyzer that is used for parsing the query.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de





Re: How do you properly use NumericField

2009-10-12 Thread Paul Taylor

Uwe Schindler wrote:

I forgot: the format of numeric fields is not plain text either, so a
simple TermQuery as generated by your query parser will not work.

If you want to hit a single numeric value without using a NumericRangeQuery
whose lower and upper bounds are equal, you have to use NumericUtils to
translate the term text, e.g. new TermQuery(new Term("field",
NumericUtils.intToPrefixCoded(value,precstep)))

If you want support for this in QueryParser, you have to override
QueryParser.newTermQuery, as explained before for newRangeQuery. By the way,
Solr does it exactly this way.

Uwe
  


OK, I'm trying my best here, but I still cannot get range or single-term
query searching to work.


package org.musicbrainz.search.servlet;

import junit.framework.TestCase;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.*;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.NumericUtils;
import org.musicbrainz.search.index.TrackAnalyzer;

public class NumericFieldTest extends TestCase {

    public void testNumericFields() throws Exception {
        Analyzer analyzer = new TrackAnalyzer();
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, analyzer, true,
                IndexWriter.MaxFieldLength.LIMITED);

        Document doc = new Document();
        NumericField nf = new NumericField("dur");
        nf.setIntValue(123);
        writer.addDocument(doc);
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir, true);
        {
            Query q = new MusicbrainzQueryParser("dur", analyzer).parse("[12 TO 124]");
            assertEquals(1, searcher.search(q, 10).totalHits);

            q = new MusicbrainzQueryParser("dur", analyzer).parse("123");
            assertEquals(1, searcher.search(q, 10).totalHits);
        }
    }

    static class MusicbrainzQueryParser extends QueryParser {

        public MusicbrainzQueryParser(String field, Analyzer a) {
            super(field, a);
            System.out.println("init parser");
        }

        public Query newRangeQuery(String field,
                                   String part1,
                                   String part2,
                                   boolean inclusive) {
            System.out.println("RangeQuery");
            TermRangeQuery query = (TermRangeQuery)
                    super.newRangeQuery(field, part1, part2, inclusive);

            if ("dur".equals(field)) {
                System.out.println("durRangeQuery");

                return NumericRangeQuery.newIntRange(
                        "dur",
                        Integer.parseInt(query.getLowerTerm()),
                        Integer.parseInt(query.getUpperTerm()),
                        query.includesLower(),
                        query.includesUpper());
            } else {
                return query;
            }
        }

        protected Query newTermQuery(Term term) {
            System.out.println("newTermQuery");
            if (term.field().equals("dur")) {
                System.out.println("dur,newTermQuery");
                TermQuery tq = new TermQuery(new Term("field",
                        NumericUtils.intToPrefixCoded(Integer.parseInt(term.text()),
                                NumericUtils.PRECISION_STEP_DEFAULT)));
                return tq;
            } else {
                return super.newTermQuery(term);
            }
        }
    }

}




Re: How do you properly use NumericField

2009-10-11 Thread Paul Taylor

Uwe Schindler wrote:

As we told you before: the default QueryParser has no support for
NumericField (as it doesn't know the schema). To get it running, subclass it
and override the newRangeQuery method to create a NumericRangeQuery for
field names that are indexed using NumericField.

Hi, yes, I did this, but getRangeQuery was never called, and from talking
to MM it seemed it would only be used for duration queries. Here's a full
test which still fails:


package org.musicbrainz.search.servlet;

import junit.framework.TestCase;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.document.Field;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermRangeQuery;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.queryParser.ParseException;

public class NumericFieldTest extends TestCase {

    public void testNumericFields() throws Exception {
        Analyzer analyzer = new StandardAnalyzer();
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, analyzer, true,
                IndexWriter.MaxFieldLength.LIMITED);

        Document doc = new Document();
        NumericField nf = new NumericField("dur");
        nf.setIntValue(123);
        doc.add(nf);
        doc.add(new Field("dur", "789", Field.Store.NO, Field.Index.ANALYZED));

        writer.addDocument(doc);
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir, true);
        {
            Query q = new MusicbrainzQueryParser("dur", analyzer).parse("789");
            assertEquals(1, searcher.search(q, 10).totalHits);

            q = new MusicbrainzQueryParser("dur", analyzer).parse("123");
            assertEquals(1, searcher.search(q, 10).totalHits);
        }
    }

    static class MusicbrainzQueryParser extends QueryParser {

        public MusicbrainzQueryParser(String field, Analyzer a) {
            super(field, a);
            System.out.println("init parser");
        }

        public Query getRangeQuery(String field,
                                   String part1,
                                   String part2,
                                   boolean inclusive)
                throws ParseException {
            System.out.println("RangeQuery");
            TermRangeQuery query = (TermRangeQuery)
                    super.getRangeQuery(field, part1, part2, inclusive);

            if ("dur".equals(field)) {
                System.out.println("dur");

                return NumericRangeQuery.newIntRange(
                        "dur",
                        Integer.parseInt(query.getLowerTerm()),
                        Integer.parseInt(query.getUpperTerm()),
                        query.includesLower(),
                        query.includesUpper());
            } else {
                return query;
            }
        }
    }
}




The recommended way is to instantiate the numeric queries directly and not
via a query parser. You can then combine a text query from the query parser
with various numeric ranges using a BooleanQuery on top of them. If you
really need a query-string range representation for numeric values, there is
no way around extending QueryParser.

The UI allows the user to enter a Lucene query using any valid syntax, so I
will always need QueryParser, which I have extended as you can see above.



thanks Paul




RE: How do you properly use NumericField

2009-10-11 Thread Uwe Schindler
I forgot: the format of numeric fields is not plain text either, so a
simple TermQuery as generated by your query parser will not work.

If you want to hit a single numeric value without using a NumericRangeQuery
whose lower and upper bounds are equal, you have to use NumericUtils to
translate the term text, e.g. new TermQuery(new Term("field",
NumericUtils.intToPrefixCoded(value,precstep)))

If you want support for this in QueryParser, you have to override
QueryParser.newTermQuery, as explained before for newRangeQuery. By the way,
Solr does it exactly this way.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de



RE: How do you properly use NumericField

2009-10-11 Thread Uwe Schindler
As we told you before: the default QueryParser has no support for
NumericField (as it doesn't know the schema). To get it running, subclass it
and override the newRangeQuery method to create a NumericRangeQuery for
field names that are indexed using NumericField.

The recommended way is to instantiate the numeric queries directly and not
via a query parser. You can then combine a text query from the query parser
with various numeric ranges using a BooleanQuery on top of them. If you
really need a query-string range representation for numeric values, there is
no way around extending QueryParser.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de




Re: How do you properly use NumericField

2009-10-11 Thread Paul Taylor

Michael McCandless wrote:

On the indexing side you do this:

doc.add(new NumericField("price").setDoubleValue(19.99));

The NumericField is not stored by default (there's also a ctor to
specify Store.YES or Store.NO).

If the numeric field is not being used in a range query, how is it
being used?  E.g. for sorting, it will just work.  If you did store the
field, when you retrieve it, it will come back as a normal field with
a String value (equal to the .toString of the original numeric value).

(You can play with precisionStep to trade off disk space against
performance; in particular, if you will not do range querying and e.g. only
sort, you should set precisionStep=Integer.MAX_VALUE; but these are
advanced optimizations.)

Mike
  
Hmm, I'm being dense here, but even a simple non-range search doesn't seem
to work when using numeric fields. In the test below it matches 789 okay
but not 123.


Paul

package org.musicbrainz.search.analysis;

import junit.framework.TestCase;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.document.Field;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.queryParser.QueryParser;

public class NumericFieldTest extends TestCase {

    public void testNumericFields() throws Exception {
        Analyzer analyzer = new StandardAnalyzer();
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, analyzer, true,
                IndexWriter.MaxFieldLength.LIMITED);

        Document doc = new Document();
        NumericField nf = new NumericField("dur");
        nf.setIntValue(123);
        doc.add(nf);
        doc.add(new Field("dur", "789", Field.Store.NO, Field.Index.ANALYZED));

        writer.addDocument(doc);
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir, true);

        Query q = new QueryParser("dur", analyzer).parse("789");
        assertEquals(1, searcher.search(q, 10).totalHits);

        q = new QueryParser("dur", analyzer).parse("123");
        assertEquals(1, searcher.search(q, 10).totalHits);
    }
}
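
A plausible explanation, going by Uwe's test elsewhere in this thread (which builds its TermQuery from NumericUtils.intToPrefixCoded): the NumericField is not indexed as the text "123" but as specially encoded terms, so a plain parsed term "123" has nothing to match. The toy encoder below is NOT Lucene's actual encoding. It only assumes the general idea of flipping the sign bit so that the encoded strings sort in numeric order, and all names in it are hypothetical:

```java
import java.util.Locale;

class SortableIntSketch {

    /**
     * Toy stand-in for a sortable term encoding: flip the sign bit, then
     * print as 8 fixed-width hex digits. Not Lucene's real byte layout.
     */
    public static String toSortableTerm(int value) {
        return String.format(Locale.ROOT, "%08x", value ^ 0x80000000);
    }

    public static void main(String[] args) {
        String indexed = toSortableTerm(123);
        System.out.println(indexed);               // the term that would be indexed
        System.out.println(indexed.equals("123")); // the parsed text term differs

        // String order follows numeric order, negatives included.
        System.out.println(toSortableTerm(-5).compareTo(toSortableTerm(123)) < 0);
    }
}
```

The point is only that the indexed term and the query text are different strings unless the query side applies the same encoding.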


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: How do you properly use NumericField

2009-10-09 Thread Paul Taylor

Michael McCandless wrote:

On Fri, Oct 9, 2009 at 3:26 PM, Paul Taylor  wrote
  
It still relies on super.getRangeQuery() for non-numeric fields.  If
you don't have non-numeric fields that accept range queries you can
simply call NumericRangeQuery.newXXXRange directly.


  
For some indexes I have to use MultiFieldQueryParser. I can't see how I
can make it use my modified QueryParser rather than the default
QueryParser. Is this possible?


thanks Paul





Re: How do you properly use NumericField

2009-10-09 Thread Paul Taylor

Michael McCandless wrote:

On Fri, Oct 9, 2009 at 3:26 PM, Paul Taylor  wrote:

  

I currently use NumberTools.longToString() to add integer fields to
an index and allow range searching. When searching, I preprocess the
query (using regular expressions) and convert integer fields with
NumberTools.longToString() before it is parsed by the QueryParser, and
when I return the results I use NumberTools.stringToLong(), so my
implementation is flaky. Now that I'm using Lucene 2.9 I thought I'd
use NumericField and hoped I could remove the preprocessing, but I'm
really not clear what to do on the indexing and searching side. I've
even just bought the MEAP version of Lucene in Action, 2nd Edition,
and it doesn't even get a mention (nor does NumberTools for that
matter; it just mentions padding numbers with zeroes).

So please, has anyone got a simple example of how to add a numeric
field to an index, and what has to be done on the search side,
assuming I receive a text string that gets parsed by the QueryParser?
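
For context on why the zero padding matters at all: a term range query compares terms as strings, so unpadded numbers sort wrongly. A minimal sketch in plain Java, with hypothetical helper names; it uses simple decimal padding for non-negative values and is not what NumberTools.longToString() actually produces:

```java
import java.util.Locale;

class PaddedNumberSketch {

    /** Zero-pads a non-negative value to a fixed width so string order matches numeric order. */
    public static String pad(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("this sketch handles non-negative values only");
        }
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    /** Inclusive range check done purely on strings, as a term-based range query would. */
    public static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded: "200" sorts between "20" and "30", so it wrongly matches [20 TO 30].
        System.out.println(inRange("200", "20", "30"));

        // Padded to the same width: 200 falls outside the range, 25 falls inside.
        System.out.println(inRange(pad(200, 10), pad(20, 10), pad(30, 10)));
        System.out.println(inRange(pad(25, 10), pad(20, 10), pad(30, 10)));
    }
}
```

The catch is that the query text has to be padded the same way as the indexed values, which is the preprocessing step described above.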



The next LIA2 MEAP update, which should be out very soon, covers
NumericField and also shows how to extend QueryParser (by subclassing
and overriding getRangeQuery) to properly create a NumericRangeQuery,
like this:

  static class NumericRangeQueryParser extends QueryParser {
    public NumericRangeQueryParser(String field, Analyzer a) {
      super(field, a);
    }

    public Query getRangeQuery(String field,
                               String part1,
                               String part2,
                               boolean inclusive)
        throws ParseException {
      TermRangeQuery query = (TermRangeQuery)
          super.getRangeQuery(field, part1, part2, inclusive);
      if ("price".equals(field)) {
        return NumericRangeQuery.newDoubleRange(
            "price",
            Double.parseDouble(query.getLowerTerm()),
            Double.parseDouble(query.getUpperTerm()),
            query.includesLower(),
            query.includesUpper());
      } else {
        return query;
      }
    }
  }

It still relies on super.getRangeQuery() for non-numeric fields.  If
you don't have non-numeric fields that accept range queries you can
simply call NumericRangeQuery.newXXXRange directly.

Mike

  
OK, thanks, but what do I do on the indexing side? I don't understand this
TriePart thing, and if the numeric field is not being used in a range
query, do I have to worry about that?





RE: How do you properly use NumericField

2009-10-09 Thread Uwe Schindler
Hi Paul,

for creating NumericFields, just refer to the Javadoc. As Mike said, on the
query side you can create a NumericRangeQuery directly (recommended); see the
Javadocs. If you want to use QueryParser, you have to customize it, as
QueryParser does not support NumericRangeQuery natively.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de








Re: How do you properly use NumericField

2009-10-09 Thread Michael McCandless
On Fri, Oct 9, 2009 at 3:26 PM, Paul Taylor  wrote:

> I currently use NumberTools.longToString() to add integer fields to
> an index and allow range searching. When searching, I preprocess the
> query (using regular expressions) and convert integer fields with
> NumberTools.longToString() before it is parsed by the QueryParser, and
> when I return the results I use NumberTools.stringToLong(), so my
> implementation is flaky. Now that I'm using Lucene 2.9 I thought I'd
> use NumericField and hoped I could remove the preprocessing, but I'm
> really not clear what to do on the indexing and searching side. I've
> even just bought the MEAP version of Lucene in Action, 2nd Edition,
> and it doesn't even get a mention (nor does NumberTools for that
> matter; it just mentions padding numbers with zeroes).
>
> So please, has anyone got a simple example of how to add a numeric
> field to an index, and what has to be done on the search side,
> assuming I receive a text string that gets parsed by the QueryParser?

The next LIA2 MEAP update, which should be out very soon, covers
NumericField and also shows how to extend QueryParser (by subclassing
and overriding getRangeQuery) to properly create a NumericRangeQuery,
like this:

  static class NumericRangeQueryParser extends QueryParser {
    public NumericRangeQueryParser(String field, Analyzer a) {
      super(field, a);
    }

    public Query getRangeQuery(String field,
                               String part1,
                               String part2,
                               boolean inclusive)
        throws ParseException {
      TermRangeQuery query = (TermRangeQuery)
          super.getRangeQuery(field, part1, part2, inclusive);
      if ("price".equals(field)) {
        return NumericRangeQuery.newDoubleRange(
            "price",
            Double.parseDouble(query.getLowerTerm()),
            Double.parseDouble(query.getUpperTerm()),
            query.includesLower(),
            query.includesUpper());
      } else {
        return query;
      }
    }
  }

It still relies on super.getRangeQuery() for non-numeric fields.  If
you don't have non-numeric fields that accept range queries you can
simply call NumericRangeQuery.newXXXRange directly.

Mike
