Re: scorers and filters

2007-08-10 Thread Paul Elschot
indicated below is an extract superclass refactoring. And the latest discussion on it is on java-dev today. Regards, Paul Elschot. Skip based on the filter and the query... See the comments in FilteredQuery, and see https://issues.apache.org/jira/browse/LUCENE-584 -Yonik

Re: What is the contrib/surround/src/java purpose

2007-08-09 Thread Paul Elschot
://svn.apache.org/viewvc/lucene/java/trunk/contrib/surround/surround.txt?view=log Groeten, Paul Elschot On Wednesday 08 August 2007 12:20, Ard Schrijvers wrote: Hello, without having to dive into the code, I was hoping somebody could tell me what this contrib block does? I can't seem to find any

Re: You are right but it doesn't make it faster.

2007-08-06 Thread Paul Elschot
the index in any other way while doing this, that is, do no query searches and no updates. A bit of bookkeeping per term will make it straightforward to compute the total document frequencies. Regards, Paul Elschot On Monday 06 August 2007 13:12, tierecke wrote: Thanks Daniel, you are completely

Re: Complex proximity clauses within Lucene QueryParser

2007-08-05 Thread Paul Elschot
to providing all of lucene in a single query language. Regards, Paul Elschot - Mark tierecke wrote: Hi, I got stuck with a complex proximity clause - and would be grateful to get your help. Does Lucene allow, and if yes: what is the syntax? * Proximity between two phrases

Re: Can I do boosting based on term postions?

2007-08-03 Thread Paul Elschot
a bit surprised that SpanFirstQuery does not work that way now. Regards, Paul Elschot Cedric, I am sending you the implementation of SpanTermQuery to your gmail account (lucene mailing list is bouncing email with attachment). I have named the class as VSpanTermQuery (I have followed

Re: WildcardQuery and SpanQuery

2007-07-18 Thread Paul Elschot
to limit the maximum number of expanded terms in another way than Lucene does. In surround the classes BasicQueryFactory and TooManyBasicQueries are used for that. Regards, Paul Elschot Cedric - To unsubscribe, e-mail

Re: WildcardQuery and SpanQuery

2007-07-18 Thread Paul Elschot
the o.a.l.queryParser.surround.query package. The code posted by Mark Miller may solve your problem, too. Regards, Paul Elschot On 7/18/07, Paul Elschot [EMAIL PROTECTED] wrote: On Wednesday 18 July 2007 05:58, Cedric Ho wrote: Hi everybody, We recently need to support wildcard search terms

Re: Payloads and PhraseQuery

2007-07-12 Thread Paul Elschot
for the payloads, there may be more than one for a single Span. Regards, Paul Elschot Cheers, Grant On Jul 12, 2007, at 8:20 AM, Peter Keegan wrote: I'm looking for Spans.getPositions(), as shown in BoostingTermQuery, but neither NearSpansOrdered nor NearSpansUnordered (which

Re: product based term combination for BooleanQuery?

2007-07-09 Thread Paul Elschot
I don't know whether this was mentioned here before, but an easy way to get product based term combination is by using a logarithm based term score. The addition of the logarithms will result in the logarithm of the product, which is sorted in the same order as the product itself. Regards, Paul
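A minimal sketch of the logarithm idea in the snippet above, in plain Java with no Lucene dependency. The class name `LogProductScore` and the sample scores are hypothetical; the sketch assumes every per-term score is strictly positive, since the logarithm is undefined otherwise.

```java
public class LogProductScore {
    // Sum of the logarithms of the per-term scores.
    // For strictly positive scores this equals log(product of the scores).
    public static double logScore(double[] termScores) {
        double sum = 0.0;
        for (double s : termScores) {
            sum += Math.log(s); // assumption: s > 0
        }
        return sum;
    }

    // Plain product of the per-term scores, for comparison only.
    public static double product(double[] termScores) {
        double p = 1.0;
        for (double s : termScores) {
            p *= s;
        }
        return p;
    }

    public static void main(String[] args) {
        double[][] docs = { {0.5, 0.8}, {0.9, 0.9}, {0.2, 0.99} };
        for (double[] doc : docs) {
            System.out.println("product=" + product(doc) + " logScore=" + logScore(doc));
        }
    }
}
```

Because the logarithm is strictly increasing, sorting documents by `logScore` gives the same order as sorting by `product`, which is why an additive combination of term scores can stand in for a multiplicative one.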

Re: Highlighter that works with phrase and span queries

2007-06-27 Thread Paul Elschot
not looked at any highlighting code yet. Is there already an extension of PhraseQuery that has getSpans() ? Regards, Paul Elschot - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: negative queries

2007-06-15 Thread Paul Elschot
this: +foo:bar -goobly:doo That should do what you want, as far as I can see. Regards, Paul Elschot

Re: Optional terms in BooleanQuery

2007-05-22 Thread Paul Elschot
It introduces Matcher as a superclass of Scorer. Regards, Paul Elschot

Re: One (large) field shared by many documents

2007-05-20 Thread Paul Elschot
. A collection text looks more like a sum text than like a relational attribute, so treating it as a separate lucene doc (lucene entity) feels just about right. Regards, Paul Elschot regards, Peter Erick Erickson wrote: You're right, your index will bloat considerably. In fact, I'm

Re: One (large) field shared by many documents

2007-05-20 Thread Paul Elschot
of medium sized documents. I guess the only way to find out how bad the performance will be, is to implement it. A FieldCache will retrieve the necessary field values only once, and therefore you can avoid retrieving many documents yourself. Regards, Paul Elschot. regards, Peter Paul

Re: Field.Store.Compress - does it improve performance of document reads?

2007-05-18 Thread Paul Elschot
Otis, See below. On Friday 18 May 2007 05:03, Otis Gospodnetic wrote: - Original Message From: Paul Elschot [EMAIL PROTECTED] On Thursday 17 May 2007 08:10, Andreas Guther wrote: I am currently exploring how to solve performance problems I encounter with Lucene document reads

Re: Field.Store.Compress - does it improve performance of document reads?

2007-05-17 Thread Paul Elschot
reducing the costs of disk head seeks even more. Regards, Paul Elschot

Re: Possible bug in SpanNearQuery

2007-05-07 Thread Paul Elschot
require renaming the class to start with Test... instead of ending in ...Test. Shall we move further discussion to the java-dev list? Regards, Paul Elschot On Monday 07 May 2007 09:44, Moti Nisenson wrote: Paul, The comment should be moved up into SpanNearQuery itself (as opposed

Re: Possible bug in SpanNearQuery

2007-05-06 Thread Paul Elschot
, Paul Elschot On Sunday 06 May 2007 16:11, Moti Nisenson wrote: Looking over the implementation of SpanNearQuery I came upon what looked like a bug. Below is a test which fails due to it. SpanNearQuery doesn't return all matching spans; once it's found a span it always increments the span

Re: How to index a lot of fields (without FileNotFoundException: Too many open files)

2007-05-01 Thread Paul Elschot
the names of the client fields into the term value, and use a single special field for these. When you do that, you'll also have to move the client field names in the queries from the field name to the term. This can easily be done by overriding one of the methods in QueryParser. Regards, Paul
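A rough sketch of the folding step described above: the client field name moves out of the Lucene field name and into the term text, so that one shared field holds all client fields. The class `FoldedTerm`, the shared field name, and the separator character are all hypothetical choices made for illustration; the original message does not name them, and the QueryParser override it mentions is not shown here.

```java
public class FoldedTerm {
    // Hypothetical name of the single special field that holds all client fields.
    public static final String SHARED_FIELD = "clientfields";

    // Hypothetical separator; it should be a character that cannot occur
    // in a client field name.
    public static final char SEPARATOR = '|';

    // Builds the term text stored in SHARED_FIELD for one client field value.
    public static String fold(String clientField, String value) {
        return clientField + SEPARATOR + value;
    }

    public static void main(String[] args) {
        // At index time: store fold("color", "red") as a term of SHARED_FIELD.
        // At query time: rewrite color:red into SHARED_FIELD:fold("color", "red").
        System.out.println(SHARED_FIELD + " -> " + fold("color", "red"));
    }
}
```

The same `fold` function has to be applied at index time and at query time, otherwise the rewritten query terms will not match the indexed terms.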

Re: I have a question about phrase query with stop words

2007-04-13 Thread Paul Elschot
Token.setPositionIncrement(). Iirc you need to make a variation on StopFilter for this. Regards, Paul Elschot Also, when I try to highlight after searching for a phrase, the highlighter highlights individual words wherever it finds them in the input text. The documentation suggests

Re: Design Problem: Searching large set of protected documents

2007-04-04 Thread Paul Elschot
this must (should) work. Regards, Paul Elschot

Re: Design Problem: Searching large set of protected documents

2007-04-03 Thread Paul Elschot
. Regards, Paul Elschot

Re: Negative Filtering (such as for profanity)

2007-03-07 Thread Paul Elschot
a Filter for the positive Query, and then invert the bits in the BitSet of the Filter to have negative filtering. Regards, Paul Elschot
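The bit inversion mentioned above can be sketched with `java.util.BitSet` alone. The class `NegativeFilterBits` and the parameter `maxDoc` (the number of document numbers covered by the set) are names chosen for this illustration; obtaining the positive `BitSet` from a Lucene `Filter` is left out.

```java
import java.util.BitSet;

public class NegativeFilterBits {
    // Returns a copy of positiveBits in which every bit in [0, maxDoc) is inverted.
    // The input set is left unchanged.
    public static BitSet invert(BitSet positiveBits, int maxDoc) {
        BitSet negative = (BitSet) positiveBits.clone();
        negative.flip(0, maxDoc); // toIndex is exclusive
        return negative;
    }

    public static void main(String[] args) {
        // Assume docs 1 and 4 matched the positive (for example, profanity) query.
        BitSet positive = new BitSet(6);
        positive.set(1);
        positive.set(4);
        System.out.println(invert(positive, 6));
    }
}
```

Cloning before flipping keeps the positive set reusable, which matters if the positive filter is cached and shared between searches.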

Re: alternative scoring algorithm for PhraseQuery

2007-03-07 Thread Paul Elschot
On Wednesday 07 March 2007 18:12, Philipp Nanz wrote: Thanks for your answers. Your input is really appreciated :-) @Paul Elschot: Thanks for the hint. I guess I could use coord() to penalize missing terms like this: Query: a b c d Doc A: a b c d = sloppyFreq(0) * coord(4, 4) = 1 Doc B

Re: alternative scoring algorithm for PhraseQuery

2007-03-06 Thread Paul Elschot
the sloppiness for any possible term order. Span queries do that by just taking the distance between the first and last matched term. Would your implementation also generalize to a SpanQuery? Regards, Paul Elschot

Re: optimizing single document searches

2007-02-28 Thread Paul Elschot
into rewriting the search method of the IndexSearcher? Currently I just check hits.size(). For a single document: get the Scorer from the Query via Weight. Then check the return value of Scorer.next(); it will indicate whether the only doc matches the query. Regards, Paul Elschot. Russ Sent

Re: An arguable bug in Lucene 1.9.1

2007-02-06 Thread Paul Elschot
Gentlemen, Have a look here: https://issues.apache.org:443/jira/browse/LUCENE-413 This was fixed in 2.0. Regards, Paul Elschot On Tuesday 06 February 2007 01:38, [EMAIL PROTECTED] wrote: I am seeing this issue as well with the exact same stack trace using spanQueries. Does anyone know

Re: Long Query Performance

2007-01-22 Thread Paul Elschot
. Regards, Paul Elschot P.S. Instead of setUseScorer14(true) you might try this patch, which should be just as quick: http://issues.apache.org/jira/browse/LUCENE-730 . On Monday 22 January 2007 15:36, mark harwood wrote: MoreLikeThis.java is in the contrib section of SVN and this will help

Re: Technology Preview of new Lucene QueryParser

2007-01-22 Thread Paul Elschot
of brackets in the prefix form is tempting. Thanks for spelling this out, Paul Elschot

Re: Counting hits in a document

2007-01-19 Thread Paul Elschot
Adding a few details: On Friday 19 January 2007 06:42, Chris Hostetter wrote: SpanQuery whatever = ... Spans s = whatever.getSpans(indexReader) if (!s.skipTo(yourDocId)) { ... // no match } else { while (s.doc() == yourDocId) { print(match between +

Re: toomanyclauses exception

2006-12-27 Thread Paul Elschot
this, possibly event* or sale*. Since they seem to be specific enough, increasing the maximum number of boolean clauses that can be added to a boolean query appears to be the good way to fix this, see BooleanQuery.setMaxClauseCount(). Regards, Paul Elschot

Re: toomanyclauses exception

2006-12-27 Thread Paul Elschot
, and without the limit Query.rewrite() will run out of memory occasionally. Regards, Paul Elschot

Re: Lucene scoring: coord_q_d factor

2006-12-13 Thread Paul Elschot
28, 11-21, 1972 http://www.soi.city.ac.uk/~ser/idfpapers/ksj_orig.pdf The paper is the first one on the idf page: http://www.soi.city.ac.uk/~ser/idf.html Regards, Paul Elschot

Re: Hits length with no sorting or scoring

2006-11-27 Thread Paul Elschot
(untested): // s is the IndexSearcher, query the Query org.apache.lucene.search.Scorer scorer = query.weight(s).scorer(s.getIndexReader()); int count = 0; while (scorer.next()) count++; Regards, Paul Elschot

Re: Querying performance decrease in 1.9.1 and 2.0.0

2006-11-27 Thread Paul Elschot
attempt at restoring the old query performance here: http://issues.apache.org/jira/browse/LUCENE-730 Regards, Paul Elschot

Re: Querying performance decrease in 1.9.1 and 2.0.0

2006-11-21 Thread Paul Elschot
the 1.4 BooleanScorer, but one never knows. Regards, Paul Elschot On Tuesday 21 November 2006 17:59, Yonik Seeley wrote: On 11/21/06, Stanislav Jordanov [EMAIL PROTECTED] wrote: Switch to the old scorer (via BooleanQuery.setUseScorer14(true) ) solved the performance issue - now Lucene 1.9.1

Re: State in IndexSearcher that is not in IndexReader?

2006-11-07 Thread Paul Elschot
be just as good as reusing IndexSearcher, performance-wise, wouldn't it? Currently IndexSearcher is indeed lightweight. But one can use IndexSearcher.getIndexReader() and reuse an IndexSearcher anyway. Regards, Paul Elschot

Re: simple (?) question about scoring

2006-11-03 Thread Paul Elschot
, you might consider moving this function into Lucene completely, because this will allow you to avoid using a filter altogether. Regards, Paul Elschot Thanks On 11/3/06, Chris Hostetter [EMAIL PROTECTED] wrote: : le list is not ordered (I do not know the details of the search

Re: number of term occurrences

2006-10-23 Thread Paul Elschot
to know what Lucene DocId you care about here.). Now TermDocs.next() will increment through and you can count. Something like. TermDocs td = IndexReader.termDocs(); td.seek(new Term(field, value)); td.skipTo(docId); and then td.freq() should give the answer without counting. Regards, Paul

Re: QueryParser Is Badly Broken

2006-10-15 Thread Paul Elschot
Mark, you wrote: On another note...http://famestalker.com ... http://famestalker.com/devwiki/ Could you explain how Paragraph/Sentence Proximity Searching is implemented in Qsol? Regards, Paul Elschot

Re: QueryParser Is Badly Broken

2006-10-13 Thread Paul Elschot
cheesy. For a query building UI it might be better to output queries in XML form to a Lucene server, see contrib/xml-query-parser . Regards, Paul Elschot

Re: wildcard and span queries

2006-10-11 Thread Paul Elschot
. A Filter for the single book would also work, but using skipTo() yourself on the spans is easier. Regards, Paul Elschot On Oct 11, 2006, at 2:17 PM, Erick Erickson wrote: Problem 3482: I'm probably close to being able to start work. Except... How to count hits with SrndQuery

Re: wildcard and span queries

2006-10-09 Thread Paul Elschot
try nesting like this: 20d( 4w(lucene, action), 5d(hatch*, gospod*)) ? Could you tell a bit more about the target grammar? Regards, Paul Elschot Thanks again... Erick On 10/6/06, Paul Elschot [EMAIL PROTECTED] wrote: Mark, On Friday 06 October 2006 22:46, Mark Miller

Re: wildcard and span queries

2006-10-06 Thread Paul Elschot
, given enough RAM. It shouldn't be too difficult to add NOT queries within WITHIN, there already is a SpanNotQuery in Lucene to map onto. Regards, Paul Elschot

Re: wildcard and span queries

2006-10-06 Thread Paul Elschot
, it's built into the BasicQueryFactory class. It is used at the bottom of the surround code to generate no more than a maximum of Lucene TermQuery's and SpanTermQuery's. The top of the surround code is its parser. Regards, Paul Elschot Thanks again Erick On 10/6/06, Paul Elschot [EMAIL

Re: wildcard and span queries

2006-10-06 Thread Paul Elschot
are anded so this should dramatically reduce the matches? The limitation in BasicQueryFactory works for a complete surround query, which can be nested. In Lucene only the max nr of clauses for a single level BooleanQuery can be controlled. ... Regards, Paul Elschot - Mark Erick Erickson wrote

Re: How to filter results below perticular score

2006-09-19 Thread Paul Elschot
On Tuesday 19 September 2006 11:49, karl wettin wrote: On 9/19/06, Bhavin Pandya [EMAIL PROTECTED] wrote: Hi all, How to put limit in lucene that dont return me any document which has score less than 0.25 You implement a HitCollector and break out when you reach such low score. A

Re: How to filter results below perticular score

2006-09-19 Thread Paul Elschot
Sorry, I sent the message before completing it. On Tuesday 19 September 2006 19:45, Paul Elschot wrote: On Tuesday 19 September 2006 11:49, karl wettin wrote: On 9/19/06, Bhavin Pandya [EMAIL PROTECTED] wrote: Hi all, How to put limit in lucene that dont return me any document which

Re: Lopsided scores for each term in BooleanQuery

2006-09-18 Thread Paul Elschot
me in the right direction as to how I would implement this? It's already there in DefaultSimilarity.tf() which is the square root: (sqrt(1) + sqrt(1)) (sqrt(0) + sqrt(2)) Regards, Paul Elschot
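The two expressions in the snippet above can be worked through in a few lines of plain Java. This is only a sketch of the square-root term-frequency factor taken in isolation; the class `SqrtTf` is hypothetical, and idf, norms, coord, and boosts, which also enter a real Lucene score, are deliberately ignored.

```java
public class SqrtTf {
    // Square-root term frequency factor, as described for DefaultSimilarity.tf().
    public static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Sum of the tf factors over the query terms of one document.
    public static double sumTf(int[] freqs) {
        double sum = 0.0;
        for (int f : freqs) {
            sum += tf(f);
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] bothTermsOnce = {1, 1}; // sqrt(1) + sqrt(1)
        int[] oneTermTwice = {0, 2};  // sqrt(0) + sqrt(2)
        System.out.println("both terms once: " + sumTf(bothTermsOnce));
        System.out.println("one term twice:  " + sumTf(oneTermTwice));
    }
}
```

The first sum is 2 and the second is the square root of 2 (about 1.41), so with this factor alone a document containing both query terms once outranks a document containing one term twice.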

Re: Storing no. of occurances of a token

2006-09-13 Thread Paul Elschot
containing the term, and to the number of times the term occurs in each document. The total number of term occurrences over all indexed documents is not present in a Lucene index. Regards, Paul Elschot

Re: removing a term from a lucene index

2006-09-13 Thread Paul Elschot
to delete all docs containing a term. Regards, Paul Elschot

Re: Filter inside SpanQuery

2006-09-05 Thread Paul Elschot
.) Regards, Paul Elschot

Re: Proximity Query Parser

2006-09-01 Thread Paul Elschot
into Lucene. Did you also implement parsed phrases with Lucene's PhraseQuery? Surround does not have that. Regards, Paul Elschot

Re: Proximity Query Parser

2006-09-01 Thread Paul Elschot
order of operations, but also obviously can create some pretty large queries. Regards, Paul Elschot

Re: Proximity Query Parser

2006-09-01 Thread Paul Elschot
this, but this requires an implementation in which phrase queries treat slop in the same way as span queries. Regards, Paul Elschot

Re: wildcards and spans

2006-08-02 Thread Paul Elschot
of per BooleanQuery. Regards, Paul Elschot

Re: Span Query NLE

2006-07-30 Thread Paul Elschot
On Tuesday 25 July 2006 03:26, Charlie wrote: ... can surround be nested 3w(4n(a?a AND bb?) AND cc+) Yes, but iirc the arguments need to be separated by commas: 3w( 4n( ... , ...) , ...) instead of by AND. Regards, Paul Elschot

Re: QueryTerms vs. IndexTerms

2006-07-03 Thread Paul Elschot
(IndexReader). Here a prefixed query term, for example prefer*, is replaced by terms in the index, for example by prefer, preferable, prefered, preference. Regards, Paul Elschot

Re: Any existing query types that support equivalent of -not interested ?

2006-07-01 Thread Paul Elschot
...) ) ) How about sth like this to get rid of the duplicates in there: SpanNotNearQuery(includeSpanQuery, excludeSpanQuery, distance, ordered) ? Writing the SpanScorer for that would be some work, though. Regards, Paul Elschot

Re: how Boolean query work internally in lucene

2006-07-01 Thread Paul Elschot
/lucene/java/trunk/src/java/org/apache/lucene/search/ Advancing to a document number at or after a given document number is done in the skipTo() methods. Regards, Paul Elschot

Re: Any existing query types that support equivalent of -not interested ?

2006-06-30 Thread Paul Elschot
(not,interested)) with a SpanTermQuery for each term? Regards, Paul Elschot

Re: Searching is taking a lot...

2006-06-29 Thread Paul Elschot
things this way avoids the disk head moving up and down between different parts of the index during the collection. Also, make sure to call searcher.doc(docNr) with sorted docNrs, i.e. there is normally no need to change the order of the collected docNrs. Regards, Paul Elschot

Re: Searching is taking a lot...

2006-06-27 Thread Paul Elschot
time only first time. If you fire the same query again it takes very very less. Can anybody tell me the story behind this. That is most likely your operating system's disk cache. Regards, Paul Elschot.

Re: Searching repeating fields

2006-06-23 Thread Paul Elschot
the repeating fields and search a single field by requiring that the separator does not occur in the match, roughly like: SpanNotQuery(SpanNearQuery(term1, term2), separatorterm) where each term is a SpanTermQuery in the Lucene field, not the same as the repeating field above. Regards, Paul

Re: Modifying the stored norm type

2006-06-21 Thread Paul Elschot
On Wednesday 21 June 2006 12:13, karl wettin wrote: On Tue, 2006-06-20 at 18:01 +0200, Paul Elschot wrote: On Tuesday 20 June 2006 12:02, Marcus Falck wrote: encodeNorm method of the Similarity class will encode my boost value into a single byte decimal number. And I will lose a lot

Re: Modifying the stored norm type

2006-06-21 Thread Paul Elschot
On Tuesday 20 June 2006 18:42, Dan Climan wrote: Paul Elschot [EMAIL PROTECTED] On Tuesday 20 June 2006 12:02, Marcus Falck wrote: After a lot of debugging and some API doc reading I have come to the conclusion that the static encodeNorm method of the Similarity class will encode my boost

Re: Document clustering using lucene

2006-06-15 Thread Paul Elschot
an element of a similarity matrix from two term vectors. Regards, Paul Elschot

Re: fastest way to get raw hit count

2006-05-29 Thread Paul Elschot
this can return the count faster than the hits.length() Untested: Scorer scorer = query.weight(indexSearcher).scorer(indexSearcher.getIndexReader()); int docCount = 0; while (scorer.next()) docCount++; Regards, Paul Elschot

Re: BufferedIndexInput.readByte performance; Spans not unique

2006-05-27 Thread Paul Elschot
will be ordered, so in case they are not unique, the ones with the same begin/end positions in the same doc will be consecutive. Regards, Paul Elschot

Re: BufferedIndexInput.readByte performance

2006-05-26 Thread Paul Elschot
://issues.apache.org/jira/browse/LUCENE-365 Regards, Paul Elschot

Re: BufferedIndexInput.readByte performance; skipping

2006-05-26 Thread Paul Elschot
TermDocs.skipTo(int). Larger values result in smaller indexes, greater acceleration, but fewer accelerable cases, while smaller values result in bigger indexes, less acceleration and more accelerable cases. More detailed experiments would be useful here. Regards, Paul Elschot

Re: Boolean query term match count

2006-05-25 Thread Paul Elschot
in the calculation of the score value for the document. Regards, Paul Elschot

Re: Matching at least N terms of subqueries

2006-05-21 Thread Paul Elschot
hard it would be. Another solution is to use a boolean query with a minimum number of matching clauses, where the Nth clause has a form like this: SpanNearQuery(commonTerm, termN). Regards, Paul Elschot

Re: Scoring without floating point calculations

2006-05-09 Thread Paul Elschot
a floating point bottleneck during a query search? Regards, Paul Elschot

Invoking luke when No sub-file with id ... found

2006-05-07 Thread Paul Elschot
allows luke to normally work on such an index: java -cp lucene-core-1.9-rc1-dev.jar:lukeall.jar org.getopt.luke.Luke Regards, Paul Elschot

Re: Tips on building a better BooleanQuery

2006-05-05 Thread Paul Elschot
to implement a Scorer for such a query. Regards, Paul Elschot

Re: using boolean operators with the PhraseQuery

2006-04-24 Thread Paul Elschot
and BooleanScorer2. Boolean queries do not have Spans. Regards, Paul Elschot

Re: Using Lucene for searching tokens, not storing them.

2006-04-16 Thread Paul Elschot
On Sunday 16 April 2006 19:18, karl wettin wrote: 15 apr 2006 kl. 21.32 skrev Paul Elschot: implements TermPositions { public int nextPosition() throws IOException { This enumerates all positions of the Term in the document as returned by the Tokenizer used by the Analyzer

Re: Lucene Seaches VS. Relational database Queries

2006-04-15 Thread Paul Elschot
). In both fields one could use an extra word from a relational db, for example a client id. Regards, Paul Elschot View this message in context: http://www.nabble.com/Lucene-Seaches-VS.-Relational-database-Queries-t1434583.html#a3925693 Sent from the Lucene - Java Users forum at Nabble.com

Re: Catching BooleanQuery.TooManyClauses

2006-04-15 Thread Paul Elschot
, so I don't know whether that would work for you. In other words, one can use the above BitSet in a Filter later on during an IndexSearcher.search() (or in a ConstantScoreQuery), and use Hits or TopDocs for document retrieval. Regards, Paul Elschot

Re: Why is BooleanQuery.maxClauseCount static?

2006-04-15 Thread Paul Elschot
of clauses should be associated with the top level query only. Regards, Paul Elschot.

Re: Using Lucene for searching tokens, not storing them.

2006-04-15 Thread Paul Elschot
of each document per field? So what does the byte represent then? What is stored is a byte representing the inverse of the number of indexed terms in a field of a document, as returned by a Tokenizer. Regards, Paul Elschot

Re: Lucene Seaches VS. Relational database Queries

2006-04-14 Thread Paul Elschot
the total seek time. When the filters start taking too much RAM, have a look here: http://issues.apache.org/jira/browse/LUCENE-328 Regards, Paul Elschot Ananth On 4/13/06, Erick Erickson [EMAIL PROTECTED] wrote: On 4/13/06, Ananth T. Sarathy [EMAIL PROTECTED] wrote: No we do have drop

Re: solution: RangeQuery with floating point numbers

2006-04-09 Thread Paul Elschot
at the time... Regards, Paul Elschot

Re: BooleanQuery containing SpanNearQuery throws ArrayOutOfBoundsException .

2006-03-29 Thread Paul Elschot
, Paul Elschot

Re: BooleanQuery containing SpanNearQuery throws ArrayOutOfBoundsException .

2006-03-28 Thread Paul Elschot
issue. However, even with that, the problem persisted on the previous occasion, so the source of the problem seems to be somewhere else. This is also why a test index would be most welcome. Regards, Paul Elschot

Re: Joins between index and database

2006-03-23 Thread Paul Elschot
://issues.apache.org/jira/browse/LUCENE-328 Regards, Paul Elschot

Re: 100,000 indexes and what to do

2006-03-11 Thread Paul Elschot
, you can put these in a separate index, and write your own MultiSearcher that filters only on the customer index. Regards, Paul Elschot

Re: Compressed BitSet

2006-03-09 Thread Paul Elschot
On Thursday 09 March 2006 14:25, eks dev wrote: ... PS1: If you are interested in compressed bit sets, try to search for utilities for compact sparse filters lucene Or look here: http://issues.apache.org/jira/browse/LUCENE-328 Regards, Paul Elschot

Re: Get only count

2006-03-08 Thread Paul Elschot
value, but the score value is not tested. Most Scorers give only positive score values for matching documents. This is implemented in the IndexSearcher.search(...) and Scorer.score(HitCollector) methods. Regards, Paul Elschot -Original Message- From: Yonik Seeley [mailto:[EMAIL PROTECTED

Re: Lucene version 1.9

2006-03-07 Thread Paul Elschot
. What's wrong? Iirc this was fixed in 1.9.1: http://lucene.apache.org/java/docs/index.html Regards, Paul Elschot

Re: Unreported IOException received for SpanTermQuery class

2006-03-07 Thread Paul Elschot
to resolve it? Which compiler do you use? My guess would be gcj. The indicated line is in an initialisation block for an anonymous inline subclass, and gcj's support for such constructs was not complete the last time I tried. Regards, Paul Elschot

Re: Unreported IOException received for SpanTermQuery class

2006-03-07 Thread Paul Elschot
) and I can't think of anything else that might cause the error msg. Btw, this message title mentions SpanTermQuery but below it says SpanOrQuery. Regards, Paul Elschot Paul Elschot wrote: On Tuesday 07 March 2006 15:35, Murat Yakici wrote: Hi, I was building the Lucene 1.9.1 source

Re: Phrase query vs span query

2006-02-22 Thread Paul Elschot
help to get an impression of how to match in the ordered and unordered cases. It might be possible to generalize the various span algorithms there and in the trunk to work with fewer terms. Regards, Paul Elschot

Re: Performance and FS block size

2006-02-12 Thread Paul Elschot
size. For an educated guess, I'd say that 4k/4k gives better performance than smaller file system block sizes and 8k/4k is not likely to have much of an effect either way. Does any of this sound right? I recall Paul Elschot talking about disk reads and disk arm movement, and Robert Engels

Re: Too many required clauses for a BooleanQuery

2006-02-09 Thread Paul Elschot
give it a try. Regards, Paul Elschot

Re: Too many required clauses for a BooleanQuery

2006-02-09 Thread Paul Elschot
. One more thing: in case these queries are generated, you might consider building the corresponding (nested) BooleanQuery yourself instead of using the QueryParser. Regards, Paul Elschot

Re: How can you simulate inOrder in boolean queries

2006-02-07 Thread Paul Elschot
/viewcvs.cgi/lucene/java/tags/lucene_1_4_3/src/test/org/apache/lucene/search/ And there is also another query language that has ordered and unordered queries: http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/surround/ Regards, Paul Elschot

Re: Getting the document number (with IndexReader)

2006-01-27 Thread Paul Elschot
I call IndexReader.close()? On the same IndexReader, yes. Regards, Paul Elschot

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Paul Elschot
, Paul Elschot
