Re: SpanQuery for Terms at same position

2009-11-25 Thread Paul Elschot
construction. I think requiring n terms at the same position would need a slop of 1-n, and I'd like to have some test cases added for that. Now if I only had some time... Regards, Paul Elschot thanks, CT On Tue, Nov 24, 2009 at 9:17 AM, Christopher Tignor ctig...@thinkmap.com wrote: yes

Re: SpanQuery for Terms at same position

2009-11-23 Thread Paul Elschot
arbitrary span searches where tokens may be at the same position and also in other positions where the ordering of subsequent terms may be restricted as per the normal span API. My pleasure, Paul Elschot thanks, CT On Sun, Nov 22, 2009 at 7:50 AM, Paul Elschot paul.elsc...@xs4all.nl wrote

Re: SpanQuery for Terms at same position

2009-11-23 Thread Paul Elschot
that the unordered case with a slop of -1 and without the edit works to match terms at the same position? In that case it may be worthwhile to add that to the javadocs, and also add a few test cases. Regards, Paul Elschot CT On Mon, Nov 23, 2009 at 12:26 PM, Christopher Tignor ctig

Re: SpanQuery for Terms at same position

2009-11-22 Thread Paul Elschot
position. SpanNearQuery may or may not work for a slop of -1, but one could try that for both the ordered and unordered cases. One way to do that is to start from the existing test cases. Regards, Paul Elschot Regards, Adriano Crestani On Thu, Nov 19, 2009 at 7:28 PM, Christopher Tignor

Re: Efficient filtering advise

2009-11-22 Thread Paul Elschot
Try a MultiTermQueryWrapperFilter instead of the QueryFilter. I'd expect a modest gain in performance. In case it is possible to form a few groups of terms that are reused, it could even be more efficient to also use a CachingWrapperFilter for each of these groups. Regards, Paul Elschot Op

Re: Efficient filtering advise

2009-11-22 Thread Paul Elschot
? There are various ways. OpenBitSet and OpenBitSetDISI can do this, and there's also BooleanFilter and ChainedFilter in contrib. Using FieldCacheTermsFilter sounds promising. Fortunately it is a single value field (our unique doc id). Regards, Paul Elschot I'll consider very seriously moving

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-16 Thread Paul Elschot
will be compat with v3.2, but not v3.4) I'd prefer B), with a minimum period of about two months to the next release in case it removes deprecations. Regards, Paul Elschot

Re: faceted search performance

2009-10-13 Thread Paul Elschot
by using the ones with the best query score. Limiting the number of terms would also be good, but that is less easy. Regards, Paul Elschot Chris 2009/10/12 Paul Elschot paul.elsc...@xs4all.nl Chris, You could also store term vectors for all docs at indexing time, and add the termvectors

Re: faceted search performance

2009-10-12 Thread Paul Elschot
. Regards, Paul Elschot

Re: faceted search performance

2009-10-12 Thread Paul Elschot
Chris, You could also store term vectors for all docs at indexing time, and add the termvectors for the matching docs into a (large) map of terms in RAM. Regards, Paul Elschot On Monday 12 October 2009 21:30:48 Christoph Boosz wrote: Hi Jake, Thanks for your helpful explanation. In fact
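
The suggestion above (summing the term vectors of the matching docs into one in-RAM map) can be sketched roughly as follows. This is a minimal, library-free illustration and not Lucene code: the per-document term vectors are modeled as plain term-to-frequency maps, and the names TermVectorFacets and sumTermVectors are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TermVectorFacets {
    // Assumption: termVectors.get(doc) stands in for the stored term vector
    // of document number doc (term -> frequency in that document).
    static Map<String, Integer> sumTermVectors(List<Map<String, Integer>> termVectors,
                                               int[] matchingDocs) {
        Map<String, Integer> totals = new HashMap<>();
        for (int doc : matchingDocs) {
            for (Map.Entry<String, Integer> e : termVectors.get(doc).entrySet()) {
                // Add this document's frequency to the running total for the term.
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> vectors = List.of(
                Map.of("lucene", 2, "span", 1),
                Map.of("lucene", 1, "filter", 3),
                Map.of("span", 4));
        // Only docs 0 and 2 matched the (hypothetical) query.
        System.out.println(sumTermVectors(vectors, new int[] {0, 2}));
    }
}
```

The map grows with the number of distinct terms in the matching docs, which is why the message calls it large.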

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Paul Elschot
, could you try a toString() on the top level scorer for one of the affected queries to see whether it shows BS2 on top level and BS for the inner scorers? Regards, Paul Elschot BooleanQuery only uses BooleanScorer when there are no required terms, and allowDocsOutOfOrder is true. So I can't

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Paul Elschot
As long as next(), skipTo(), doc() and score() on a Scorer work, the search will be done. I hope the results are correct in this case, but I'm not sure. Regards, Paul Elschot On Wednesday 15 July 2009 19:08:00 Michael McCandless wrote: I don't think a toplevel BS2 is able to use BS as sub

Re: Boolean retrieval

2009-07-04 Thread Paul Elschot
It is also possible to use the HitCollector api and simply ignore the score values. Regards, Paul Elschot On Saturday 04 July 2009 21:14:41 Mark Harwood wrote: Check out booleanfilter in contrib/queries. It can be wrapped in a constantScoreQuery On 4 Jul 2009, at 17:37, Lukas

Re: Need help : SpanNearQuery

2009-04-17 Thread Paul Elschot
different weights in SpanTermQuery. Regards, Paul Elschot On Friday 17 April 2009 12:18:46 Radhalakshmi Sreedharan wrote: To make the question simple, What I need is the following : If my document field is ( ab,bc,cd,ef) and Search tokens are (ab,bc,cd). Given the following : I should

Re: Need help : SpanNearQuery

2009-04-17 Thread Paul Elschot
. As a side note, Will the Shingle Filter help me getting all possible combination of the input tokens? I don't know. Regards, Paul Elschot

Re: Index in text format

2009-04-09 Thread Paul Elschot
On Thursday 09 April 2009 21:56:44 Andy wrote: Is there a way to have lucene to write index in a txt file? No. You could try a hexdump of the index file(s), but that isn't really human readable. Instead of that you may want to try Luke: http://www.getopt.org/luke/ Regards, Paul Elschot

Re: Internals question: BooleanQuery with many TermQuery children

2009-04-07 Thread Paul Elschot
at most once per document field, so for these it normally helps to use a Filter. Regards, Paul Elschot

Re: Using SpanNearQuery.getSpans() in a Search Result

2009-04-02 Thread Paul Elschot
is located in. It's the other way around: for span queries a search result is created (internally, by SpanScorer) from the spans resulting from the getSpans() method above. Does that help? Regards, Paul Elschot All of the examples I find (in LIA and from CNLP) demonstrate

Re: number of hits of pages containing two terms

2009-03-17 Thread Paul Elschot
. Regards, Paul Elschot On Tuesday 17 March 2009 12:35:19 Adrian Dimulescu wrote: Ian Lea wrote: Adrian - have you looked any further into why your original two term query was too slow? My experience is that simple queries are usually extremely fast. Let me first point out

Re: Speeding up RangeQueries?

2009-03-14 Thread Paul Elschot
/SearchNumericalFields Regards, Paul Elschot

Re: Faceted search with OpenBitSet/SortedVIntList

2009-02-17 Thread Paul Elschot
as in the removed methods there, your original problem might not have occurred at all. In the CachingWrapperFilter in trunk the choice is left to an overridable method. Regards, Paul Elschot Regards, Raf On Sun, Feb 15, 2009 at 2:39 PM, Paul Elschot paul.elsc...@xs4all.nl wrote: Meanwhile

Re: Faceted search with OpenBitSet/SortedVIntList

2009-02-15 Thread Paul Elschot
when it is smaller than OpenBitSet), please comment at LUCENE-1296. Regards, Paul Elschot On Sunday 08 February 2009 09:47:24 Raffaella Ventaglio wrote: Hi Paul, One way to implement that would be to use one of the boolean combination filters in contrib, BooleanFilter or ChainedFilter

Re: Faceted search with OpenBitSet/SortedVIntList

2009-02-08 Thread Paul Elschot
describe how this compact forwarded index works? Similar to FieldCache idea but more compact. Does this also use FieldCacheRangeFilter and/or FieldCacheTermsFilter? Regards, Paul Elschot

Re: Faceted search with OpenBitSet/SortedVIntList

2009-02-08 Thread Paul Elschot
On Sunday 08 February 2009 09:53:00 Uwe Schindler wrote: I would do so, it's really simple, you can even do it in an anonymous inner class. It is indeed simple, but it might also help to take a look at the source code of the Lucene classes involved. Regards, Paul Elschot - UWE

Re: Faceted search with OpenBitSet/SortedVIntList

2009-02-07 Thread Paul Elschot
want to ask further on the java-dev list. Regards, Paul Elschot

Re: TermScorer default buffer size

2009-01-08 Thread Paul Elschot
John, Continuing, see below. On Wednesday 07 January 2009 14:24:15 Paul Elschot wrote: On Wednesday 07 January 2009 07:25:17 John Wang wrote: Hi: The default buffer size (for docid,score etc) is 32 in TermScorer. We have a large index with some terms to have very dense doc

Re: TermScorer default buffer size

2009-01-08 Thread Paul Elschot
the performance improvements? Regards, Paul Elschot -John On Thu, Jan 8, 2009 at 1:27 AM, Paul Elschot paul.elsc...@xs4all.nl wrote: John, Continuing, see below. On Wednesday 07 January 2009 14:24:15 Paul Elschot wrote: On Wednesday 07 January 2009 07:25:17 John Wang wrote: Hi

Re: TermScorer default buffer size

2009-01-07 Thread Paul Elschot
for the underlying TermDocs for very sparse doc sets. Regards, Paul Elschot

Re: Lucene retrieval model

2008-12-30 Thread Paul Elschot
model. Fuzzy searching is implemented by constructing a Boolean query with optional (and actually present) terms that are similar enough to the fuzzy query term. Regards, Paul Elschot - To unsubscribe, e-mail: java-user-unsubscr

Re: BooleanQuery Performance Help

2008-12-20 Thread Paul Elschot
it should not degrade performance much. Also, it will affect score values somewhat. Particularly, I am interested to know more about what further caching could be done apart from the default caching which lucene does. More caching is probably not going to help. Regards, Paul Elschot

Re: RESOLVED: help: java.lang.ArrayIndexOutOfBoundsException ScorerDocQueue.downHeap

2008-12-18 Thread Paul Elschot
not sure why it made a difference in my case. That option chooses another algorithm to search these queries, it will only affect queries without required terms. (The change in search algorithm is from BooleanScorer2 to BooleanScorer.) Regards, Paul Elschot

Re: Issue upgrading from lucene 2.3.2 to 2.4 (moving from bitset to docidset)

2008-12-08 Thread Paul Elschot
OpenBitSet. Also have a look at earlier discussions on the subject: you might find a good use for OpenBitSetDISI and contrib/**/{BooleanFilter,ChainedFilter}. Regards, Paul Elschot On Tuesday 09 December 2008 07:44:20 Michael Stoppelman wrote: Hi all, I'm working on upgrading to Lucene 2.4.0

Re: 2.4 Performance

2008-11-19 Thread Paul Elschot
/jira/browse/LUCENE-1296 ? Also consider o.a.l.util.OpenBitSetDISI, and how that is used in contrib/queries/**/BooleanFilter Regards, Paul Elschot

Re: Term numbering and range filtering

2008-11-19 Thread Paul Elschot
), but a contribution like this multi range filter makes it all worthwhile. Regards, Paul Elschot

Re: Term numbering and range filtering

2008-11-18 Thread Paul Elschot
the range boolean query. Mike, Paul, I'm happy to contribute this (ugly but working) code if there is interest. Let me know and I'll open a JIRA issue for it. In case you think more performance improvements based on this are possible... Regards, Paul Elschot

Re: Term numbering and range filtering

2008-11-11 Thread Paul Elschot
change to Lucene. The cheap version is hierarchical prefixing here: http://wiki.apache.org/jakarta-lucene/DateRangeQueries Regards, Paul Elschot

Re: Term numbering and range filtering

2008-11-11 Thread Paul Elschot
such that loading at search time is very fast. Perhaps we'd better continue this at LUCENE-1231 or LUCENE-1410. I think what you're referring to is PDICT, which has frame exceptions for values that occur infrequently. Regards, Paul Elschot Mike Paul Elschot wrote: On Tuesday 11 November 2008 11

Re: Term numbering and range filtering

2008-11-10 Thread Paul Elschot
Tim, I didn't follow all the details, so this may be somewhat off, but did you consider using TermVectors? Regards, Paul Elschot On Monday 10 November 2008 19:18:38 Tim Sturge wrote: Yes, that is a significant issue. What I'm coming to realize is that either I will end up with something

Re: Term numbering and range filtering

2008-11-10 Thread Paul Elschot
LUCENE-1296 for caching another data structure than the one used to collect the filtered docs. Regards, Paul Elschot

Re: How to combine filter in Lucene 2.4?

2008-11-09 Thread Paul Elschot
for a performance improvement will follow. Regards, Paul Elschot Cheers Mark

Re: How to combine filter in Lucene 2.4?

2008-11-08 Thread Paul Elschot
/queries/**/BooleanFilter Regards, Paul Elschot On Saturday 08 November 2008 19:06:15 Timo Nentwig wrote: Hi! Since Filter.bits() is deprecated and replaced by getDocIdSet() now I wonder how I am supposed to combine (AND) filters (for facets). I worked around this issue by extending Filter
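
The idea behind AND-combining filters for facets can be shown without any Lucene classes. The sketch below is an assumption-laden stand-in: each filter's matching doc ids are held in a java.util.BitSet rather than a Lucene DocIdSet, and AndFilters / and() are hypothetical names, not the contrib BooleanFilter API.

```java
import java.util.BitSet;

class AndFilters {
    // Intersect the doc id sets of several filters without modifying the inputs.
    static BitSet and(BitSet... filters) {
        if (filters.length == 0) {
            return new BitSet(); // no filters: nothing selected in this sketch
        }
        BitSet result = (BitSet) filters[0].clone();
        for (int i = 1; i < filters.length; i++) {
            result.and(filters[i]); // keep only docs present in every filter
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet color = new BitSet();
        color.set(1); color.set(4); color.set(7);
        BitSet size = new BitSet();
        size.set(4); size.set(7); size.set(9);
        BitSet both = and(color, size);
        System.out.println(both + " count=" + both.cardinality());
    }
}
```

Cloning the first set keeps cached per-filter bit sets reusable across queries, which matters when the same facet filters are combined repeatedly.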

Re: Sorting posting lists before intersection

2008-10-13 Thread Paul Elschot
but for proximity queries it would be more of a guess. Regards, Paul Elschot

Re: PhraseQuery issues - differences with SpanNearQuery

2008-09-05 Thread Paul Elschot
On Friday 05 September 2008 16:57:34 Mark Miller wrote: Paul Elschot wrote: On Thursday 04 September 2008 20:39:13 Mark Miller wrote: Sounds like it's more in line with what you are looking for. If I remember correctly, the phrase query factors in the edit distance in scoring

Re: PhraseQuery issues - differences with SpanNearQuery

2008-09-04 Thread Paul Elschot
for scoring Spans. The reason why idf is not used could be that there is no basic score value associated with inner spans; only top level spans are scored by SpanScorer. For more details, please consult the SpanScorer code. Regards, Paul Elschot - Mark Yannis Pavlidis wrote: Hi, I am having

Re: Pre-filtering for expensive query

2008-09-03 Thread Paul Elschot
On Wednesday 03 September 2008 18:06:57 Matt Ronge wrote: On Aug 30, 2008, at 3:01 PM, Paul Elschot wrote: On Saturday 30 August 2008 18:19:09 Matt Ronge wrote: On Aug 30, 2008, at 4:43 AM, Karl Wettin wrote: Can you tell us a bit more about what your custom query does? Perhaps you can

Re: Pre-filtering for expensive query

2008-09-03 Thread Paul Elschot
On Saturday 30 August 2008 18:22:50 Matt Ronge wrote: On Aug 30, 2008, at 6:13 AM, Paul Elschot wrote: On Saturday 30 August 2008 03:34:01 Matt Ronge wrote: Hi all, I am working on implementing a new Query, Weight and Scorer that is expensive to run. I'd like to limit the number

Re: Pre-filtering for expensive query

2008-08-30 Thread Paul Elschot
in a custom scorer? In case you have a better way than skipTo(), or something to improve on this issue to allow a Filter as clause to BooleanQuery: https://issues.apache.org/jira/browse/LUCENE-1345 let us know. Regards, Paul Elschot

Re: Pre-filtering for expensive query

2008-08-30 Thread Paul Elschot
are looking for documents that contain partial phrases from an input query that has more than 2 words, have a look at Nutch. Regards, Paul Elschot -- Matt Hi all, I am working on implementing a new Query, Weight and Scorer that is expensive to run. I'd like to limit the number of documents I

Re: Pre-filtering for expensive query

2008-08-30 Thread Paul Elschot
On Saturday 30 August 2008 18:22:50 Matt Ronge wrote: On Aug 30, 2008, at 6:13 AM, Paul Elschot wrote: On Saturday 30 August 2008 03:34:01 Matt Ronge wrote: Hi all, I am working on implementing a new Query, Weight and Scorer that is expensive to run. I'd like to limit the number

Re: Fastest way to get just the bits of matching documents

2008-07-26 Thread Paul Elschot
On Thursday 24 July 2008 23:00:33 Robert Stewart wrote: Queries are very complex in our case, some have up to 100 or more clauses (over several fields), including disjunctions and prohibited clauses. Other than the earlier advice, did you try setAllowDocsOutOfOrder() ? Regards, Paul

Re: Scoring filters

2008-06-11 Thread Paul Elschot
this as the scorer for a new Query, via a Weight. Once this new Query is available, just add it as required to a BooleanQuery. Regards, Paul Elschot

Re: SpanNearQuery: how to get the intra-span matching positions?

2008-05-30 Thread Paul Elschot
this discussion, please do so on java-dev. Regards, Paul Elschot.

Re: SpanNearQuery scoring

2008-05-23 Thread Paul Elschot
does not contain a weight() or score() method, so there is no way to pass such information to SpanScorer. Regards, Paul Elschot

Re: multi word synonyms

2008-05-18 Thread Paul Elschot
On Sunday 18 May 2008 16:30:26 Karl Wettin wrote: On 18 May 2008, at 00:01, Paul Elschot wrote: On Saturday 17 May 2008 20:28:40 Karl Wettin wrote: As far as I know Lucene only handles single word synonyms at index time. My life would be much simpler if it was possible to add synonyms

Re: MultiTerm Or Query with per-term boost. Does it exist?

2008-05-18 Thread Paul Elschot
significant performance improvements from doing this. Does BooleanQuery.setAllowDocsOutOfOrder() make a difference? Regards, Paul Elschot What is the general problem with your approach? And what does all these boosted term queries represent? Would it be perhaps be possible for you

Re: theoretical maximum score

2008-05-17 Thread Paul Elschot
values are combined into another value that has the same theoretical maximum. Have a look here to start: https://issues.apache.org/jira/browse/LUCENE-293 Regards, Paul Elschot

Re: multi word synonyms

2008-05-17 Thread Paul Elschot
positions are not affected, so at least there is no influence on scoring because of changes in the original token positions. Regards, Paul Elschot

Re: Filtering a SpanQuery

2008-05-07 Thread Paul Elschot
a BitSet available before the query search. it could speed things up quite a bit. I would expect a substantial speedup from using skipTo() on the Spans when only 0.1% of the results passes the filter. Regards, Paul Elschot Eran. On Wed, May 7, 2008 at 10:34 AM, Paul Elschot [EMAIL PROTECTED

Re: Filtering a SpanQuery

2008-05-06 Thread Paul Elschot
= spans.next(); } if (! more) { break; } filterDoc = bits.nextSetBit(spans.doc()); } Please check the javadocs of java.util.BitSet, there may be a 1 off error in the arguments to nextSetBit(). Regards, Paul Elschot I tried looking through the archives and found some reference
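
The loop fragment above alternates between the spans and the filter bits. A self-contained sketch of that pattern follows, under loud assumptions: the Spans object is replaced by a sorted int array holding the doc id of each span (a doc may repeat), and filterSpanDocs is a hypothetical helper, not a Lucene method. On the off-by-one question: java.util.BitSet.nextSetBit(fromIndex) returns the first set bit at or after fromIndex (inclusive), or -1 when there is none.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class FilterSpans {
    // spanDocs: doc id of each span in increasing order (stand-in for Spans.next()/doc()).
    // bits: docs that pass the filter.
    static List<Integer> filterSpanDocs(int[] spanDocs, BitSet bits) {
        List<Integer> kept = new ArrayList<>();
        int i = 0;
        while (i < spanDocs.length) {
            // First allowed doc at or after the current span's doc (inclusive lookup).
            int filterDoc = bits.nextSetBit(spanDocs[i]);
            if (filterDoc < 0) {
                break; // no allowed docs remain
            }
            // Equivalent of skipTo(filterDoc): drop spans in docs the filter rejects.
            while (i < spanDocs.length && spanDocs[i] < filterDoc) {
                i++;
            }
            // Keep every span that lies in the allowed doc.
            while (i < spanDocs.length && spanDocs[i] == filterDoc) {
                kept.add(spanDocs[i]);
                i++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(3); bits.set(4); bits.set(8);
        System.out.println(filterSpanDocs(new int[] {1, 1, 3, 5, 8}, bits)); // prints [3, 8]
    }
}
```

Because the lookup is inclusive, passing the current span doc directly is correct here; passing doc + 1 would skip a span that already sits on an allowed doc.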

Re: Lucene Proximity Searches

2008-04-18 Thread Paul Elschot
in the org.apache.lucene.search.spans package to allow a match for less than all subqueries. This is not going to be straightforward, but it is possible. In case you choose this last option, please continue on the java-dev list. Regards, Paul Elschot On Fri, Apr 4, 2008 at 12:38 PM, Ana Rabade [EMAIL PROTECTED

Re: QueryWrapperFilter question...

2008-04-17 Thread Paul Elschot
convinced myself till the thought came to me at lunch :). For a single query, adding a filter of course has a cost. But when the location part can be reused in later queries, give CachingWrapperFilter a try. Regards, Paul Elschot -M On Wed, Apr 16, 2008 at 6:43 PM, Karl Wettin [EMAIL PROTECTED

Re: Using Lucene partly as DB and 'joining' search results.

2008-04-12 Thread Paul Elschot
On Saturday 12 April 2008 00:03:13 Antony Bowesman wrote: Paul Elschot wrote: On Friday 11 April 2008 13:49:59 Mathieu Lecarme wrote: Use Filter and BitSet. From the personal data, you build a Filter (http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/Filter.html

Re: Using Lucene partly as DB and 'joining' search results.

2008-04-11 Thread Paul Elschot
. Regards, Paul Elschot

Re: Why Lucene has to rewrite queries prior to actual searching?

2008-04-08 Thread Paul Elschot
regards, Paul Elschot Itamar. -Original Message- From: Paul Elschot [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 08, 2008 1:56 AM To: java-user@lucene.apache.org Subject: Re: Why Lucene has to rewrite queries prior to actual searching? Itamar, Have a look here: http

Re: Why Lucene has to rewrite queries prior to actual searching?

2008-04-07 Thread Paul Elschot
match a document, as long as at least one matches. For the required query parts (AND like), Scorer.skipTo() is used, and that could well be the filter mechanism you are referring to; have a look at the javadocs of Scorer, and, if necessary, at the actual code of ConjunctionScorer. Regards, Paul
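
How skipTo() helps with required (AND-like) clauses can be illustrated with two sorted doc id lists. This is a simplified sketch, not the ConjunctionScorer code: posting lists are plain sorted int arrays without duplicates, and skipTo here is a hypothetical helper built on Arrays.binarySearch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ConjunctionSketch {
    // Index of the first entry >= target, searching docs[from..end).
    static int skipTo(int[] docs, int from, int target) {
        int pos = Arrays.binarySearch(docs, from, docs.length, target);
        return pos >= 0 ? pos : -pos - 1; // not found: use the insertion point
    }

    // Docs present in both lists; whichever list is behind skips to the other's doc.
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i = skipTo(a, i, b[j]);
            } else {
                j = skipTo(b, j, a[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] termA = {1, 4, 7, 9, 20};
        int[] termB = {4, 9, 10, 20, 30};
        System.out.println(intersect(termA, termB)); // prints [4, 9, 20]
    }
}
```

Each required clause acts as a filter on the others in this scheme: a sparse clause lets the dense ones jump over most of their docs.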

Re: Why Lucene has to rewrite queries prior to actual searching?

2008-04-07 Thread Paul Elschot
Itamar, Have a look here: http://lucene.apache.org/java/2_3_1/scoring.html Regards, Paul Elschot On Tuesday 08 April 2008 00:34:48 Itamar Syn-Hershko wrote: Paul and John, Thanks for your quick reply. The problem with query rewriting is the beforementioned MaxClauseException. Instead

Re: Improving Index Search Performance

2008-03-26 Thread Paul Elschot
data to the lucene index that can be used to reduce the number of results to be fetched. Regards, Paul Elschot On Wednesday 26 March 2008 13:51:24 Shailendra Mudgal wrote: The bottom line is that reading fields from docs is expensive. FieldCache will, I believe, load fields for all

Re: Improving Index Search Performance

2008-03-25 Thread Paul Elschot
reason, retrieving docs is best done in doc id order, but that is unlikely to go wrong as doc ids are normally collected in increasing order. Regards, Paul Elschot On Tuesday 25 March 2008 13:43:18 Shailendra Mudgal wrote: Hi Everyone, We are using Lucene to search on an index of around 20G

Re: Call Lucene default command line Search from PHP script

2008-03-21 Thread Paul Elschot
that, you'll probably want to use the PHP/Java extension to avoid initializing a JVM for each call to lucene. Try this: http://www.google.nl/search?q=php+java+org+apache+lucene&ie=UTF-8&oe=UTF-8 This was one of the results: http://www.idimmu.net/index.php?blog%5Bpagenum%5D=3 Regards, Paul Elschot On Friday

Re: Call Lucene default command line Search from PHP script

2008-03-21 Thread Paul Elschot
On Saturday 22 March 2008 00:32:32 Paul Elschot wrote: Milu, This is a PHP problem, not a Lucene one, so you might get better response at a PHP mailing list. The easy way around your problem is probably by invoking a shell script from php that exports the class path as you indicated, so

Re: HELP: how to list term score inside some document?

2008-03-14 Thread Paul Elschot
at Searcher.explain() Regards, Paul Elschot

Re: MultiFieldQueryParser - BooleanClause.Occur

2008-02-29 Thread Paul Elschot
, BooleanClause.Occur.MUST): There is no need for a QueryParser in this case when using a TermQuery instead of a Query for q1, q2, q3 and q4: TermQuery q1 = new TermQuery(new Term(title, term1)); Regards, Paul Elschot Donna Gresh JensBurkhardt [EMAIL PROTECTED] wrote on 02/29/2008 10:46:51 AM: Hey everybody

Re: How to pass additional information into Similarity.scorePayload(...)

2008-02-15 Thread Paul Elschot
requirement for that case? SpanNotQuery can be used to prevent matches over paragraph borders when these are indexed as such, but I would not expect that you would need those, given the fuzziness of the [10/5/2]. Regards, Paul Elschot On Friday 15 February 2008 09:45:58 Cedric Ho wrote: Hi Paul, Do

Re: How to pass additional information into Similarity.scorePayload(...)

2008-02-14 Thread Paul Elschot
another field for different position info. Regards, Paul Elschot On Thursday 14 February 2008 09:44:40 Cedric Ho wrote: Hi Paul, Sorry I am not sure I understand your solution. Because I would need to apply this scoring logic to all the different types of Queries. A search may consist

Re: How to pass additional information into Similarity.scorePayload(...)

2008-02-14 Thread Paul Elschot
, one could index only the stem and use a payload for the actual inflected form (singular/plural, past/present, first/second/third person, etc). Regards, Paul Elschot Cedric On Fri, Feb 15, 2008 at 7:15 AM, Paul Elschot [EMAIL PROTECTED] wrote: I have no idea what the [10/5/2] means, so

Re: How to pass additional information into Similarity.scorePayload(...)

2008-02-13 Thread Paul Elschot
will probably need https://issues.apache.org/jira/browse/LUCENE-1093 . This will be somewhat slower than using a payload, because the search will be done in two separate fields, but it will work. Regards, Paul Elschot - To unsubscribe, e

Re: recall/precision with lucene

2008-02-09 Thread Paul Elschot
a precision/recall graph for the query by considering the total results higher than a given score. When a lot of such computations are needed, you may also want to cache the values of a unique identifier field for all indexed docs, have a look at FieldCache for this. Regards, Paul Elschot
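
One point of such a precision/recall graph can be computed as in the sketch below, which treats everything scoring higher than a threshold as retrieved. The relevance judgments, the class name PrecisionRecall and the method atThreshold are assumptions made for illustration; nothing here is taken from Lucene.

```java
class PrecisionRecall {
    // Returns {precision, recall} over the results whose score is higher than threshold.
    // relevant[i] says whether result i was judged relevant; totalRelevant counts all
    // relevant docs in the collection, retrieved or not.
    static double[] atThreshold(float[] scores, boolean[] relevant,
                                float threshold, int totalRelevant) {
        int retrieved = 0;
        int relevantRetrieved = 0;
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] > threshold) {
                retrieved++;
                if (relevant[i]) {
                    relevantRetrieved++;
                }
            }
        }
        double precision = retrieved == 0 ? 0.0 : (double) relevantRetrieved / retrieved;
        double recall = totalRelevant == 0 ? 0.0 : (double) relevantRetrieved / totalRelevant;
        return new double[] {precision, recall};
    }

    public static void main(String[] args) {
        float[] scores = {0.9f, 0.8f, 0.5f, 0.2f};
        boolean[] relevant = {true, false, true, true};
        double[] pr = atThreshold(scores, relevant, 0.4f, 4);
        System.out.println("precision=" + pr[0] + " recall=" + pr[1]);
    }
}
```

Sweeping the threshold over the observed score values yields one (recall, precision) point per threshold, which is the graph the message refers to.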

Re: Lucene syntax query matched against a string content

2008-02-08 Thread Paul Elschot
. Regards, Paul Elschot On Friday 08 February 2008 05:48:08 Nilesh Bansal wrote: Hi, I want to create a function, which takes in a query string (in lucene syntax), and a string as content and returns back if the query matches the content or not. This would mean, query = +(apache

Re: Lucene to index OCR text

2008-01-29 Thread Paul Elschot
On Tuesday 29 January 2008 03:32:08 Daniel Noll wrote: On Friday 25 January 2008 19:26:44 Paul Elschot wrote: There is no way to do exact phrase matching on OCR data, because no correction of OCR data will be perfect. Otherwise the OCR would have made the correction... snip suggestion

Re: Lucene to index OCR text

2008-01-25 Thread Paul Elschot
also be a start for investigating. It all depends on how good the OCR was, but in some cases (think old paper) it's just not possible to do good OCR. Regards, Paul Elschot

Re: Lucene Performance

2008-01-19 Thread Paul Elschot
will be used during query search. The query rewrite could in principle do this, but it might affect the score values. Regards, Paul Elschot

Re: Self Join Query

2008-01-10 Thread Paul Elschot
to allow retrieval for filtering in another index. Retrieving stored fields is normally a performance bottleneck, so a FieldCache might be handy. Regards, Paul Elschot On Thursday 10 January 2008 12:58:44 sachin wrote: Here are more details about my issue. I have two tables in database. A row

Re: Query processing with Lucene

2008-01-09 Thread Paul Elschot
that offsets were meant to be positions within a document. It is also possible that offsets were meant in the sense of using skipTo(doc) instead of next() on a Scorer. This is done during query search when at least one term is required. Regards, Paul Elschot Doron On Jan 8, 2008 11:24 PM

Re: Can I do boosting based on term postions?

2007-12-18 Thread Paul Elschot
On Tuesday 18 December 2007 14:59:45 Peter Keegan wrote: Should I open a Jira issue? What shall I say? http://www.apache.org/foundation/how-it-works.html Regards, Paul Elschot

Re: Field weights

2007-12-14 Thread Paul Elschot
Karl, This might work for you: https://issues.apache.org/jira/browse/LUCENE-293 Regards, Paul Elschot On Friday 14 December 2007 18:06:01 Karl Wettin wrote: I have an index that contains three sorts of documents: Car brand Tire brand Tire pressure (Please bear with me, the real index

Re: Scoring for all the documents in the index relative to a query

2007-11-19 Thread Paul Elschot
Gentlefolk, Well, the javadocs as patched at LUCENE-584 try to change all the cases of zero scoring to 'non matching'. I'm happily bracing for a minor conflict with that patch. In case someone wants to take another look at the javadocs as patched there, don't let me stop you... Regards, Paul

Re: Search performance using BooleanQueries in BooleanQueries

2007-11-06 Thread Paul Elschot
On Tuesday 06 November 2007 23:14:01 Mike Klaas wrote: On 29-Oct-07, at 9:43 AM, Paul Elschot wrote: On Friday 26 October 2007 09:36:58 Ard Schrijvers wrote: +prop1:a +prop2:b +prop3:c +prop4:d +prop5:e is much faster than (+(+(+(+prop1:a +prop2:b) +prop3:c) +prop4:d) +prop5:e

Re: 2/3 of terms matched + coverage filter

2007-10-31 Thread Paul Elschot
, you'll run into the fact that the field norm (the inverse square root of the field length) is encoded in only 8 bits, which is rather coarse. Regards, Paul Elschot
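
Why a coarsely encoded norm blurs nearby field lengths can be seen with a toy quantizer. The sketch computes 1/sqrt(length) as in the message, then keeps only the sign, exponent and top three mantissa bits of the float. That truncation is an assumption chosen for illustration and is not claimed to be Lucene's actual norm encoding; CoarseNorm, lengthNorm and quantize are hypothetical names.

```java
class CoarseNorm {
    // Inverse square root of the field length, as described in the message.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Illustrative quantizer (assumption): zero all but the top 3 mantissa bits.
    static float quantize(float f) {
        return Float.intBitsToFloat(Float.floatToIntBits(f) & 0xFFF00000);
    }

    public static void main(String[] args) {
        for (int length = 10; length <= 13; length++) {
            float exact = lengthNorm(length);
            System.out.println(length + " terms: exact=" + exact
                    + " quantized=" + quantize(exact));
        }
    }
}
```

With this quantizer, fields of 11 and 12 terms end up with the same stored value, so any coverage-style scoring that depends on the norm cannot tell them apart.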

Re: Looking for Exact match but no other terms... how to express it?

2007-10-30 Thread Paul Elschot
= . Regards, Paul Elschot

Re: Search performance using BooleanQueries in BooleanQueries

2007-10-29 Thread Paul Elschot
. Regards, Paul Elschot thanks for any help, Regards Ard

Re: Cache BitSet or doc number?

2007-10-27 Thread Paul Elschot
and SortedVIntList. Regards, Paul Elschot On Saturday 27 October 2007 02:15:48 Yonik Seeley wrote: On 10/26/07, John Patterson [EMAIL PROTECTED] wrote: Thom Nelson wrote: Check out the HashDocSet from Solr, this is the best way to cache small sets of search results. In general, the Solr

Re: Adding support for NOT NEAR construct?

2007-10-17 Thread Paul Elschot
not work for this because it works on doc level and not within the matching text of a field. Regards, Paul Elschot On Wednesday 17 October 2007 17:57:21 Dave Golombek wrote: We've run into a situation where having NOT NEAR queries would really help. I haven't been able to find any discussion

Re: Scoring a single document from a corpus based on a given query

2007-10-10 Thread Paul Elschot
. You can try this: Explanation e = indexSearcher.explain(query, documentId); and get the score value from the explanation. Have a look at the code of any Scorer.explain() method on how to get the score value only. There really is no need to filter in this case. Regards, Paul Elschot

Re: Scorer skipTo() expectations?

2007-10-04 Thread Paul Elschot
. The reason for that is performance, BooleanScorer uses a faster data structure than a priority queue, but BooleanScorer does not implement skipTo(). Regards, Paul Elschot On Thursday 04 October 2007 09:12, Dan Rich wrote: Hi, I have a custom Query class that provides a long list of lucene

Re: a query for a special AND?

2007-10-01 Thread Paul Elschot
As for suggestions on how to do this, I have no other than to make sure that you can create the queries necessary to obtain the required output. Regards, Paul Elschot On Sunday 30 September 2007 09:20, Mohammad Norouzi wrote: Hi Paul, thanks, I got your idea, now I am planning to implement

Re: Translating Lucene Query Syntax to Traditional Boolean Syntax

2007-09-25 Thread Paul Elschot
the TestBoolean* classes. Regards, Paul Elschot P.S. When documents may be scored out of order, for some disjunctions (OR), BooleanScorer is used for performance.

Re: a query for a special AND?

2007-09-20 Thread Paul Elschot
patient and service forms the relational result. However, for a text search engine it is usual to denormalize the relational records into indexed documents, depending on the required output. Regards, Paul Elschot On 9/20/07, Mohammad Norouzi [EMAIL PROTECTED] wrote: Hi Paul, would

Re: a query for a special AND?

2007-09-20 Thread Paul Elschot
a primary key, but even that you will need to program yourself. Regards, Paul Elschot thank you so much Paul On 9/20/07, Paul Elschot [EMAIL PROTECTED] wrote: On Thursday 20 September 2007 07:29, Mohammad Norouzi wrote: Sorry Paul I just hurried in replying ;) I read the documents

Re: a query for a special AND?

2007-09-17 Thread Paul Elschot
with two different value to clarify consider I have this query: field1:val* (field2:myValue1 XOR field2:myValue2) Did you try this: +field1:val* +field2:myValue1 +field2:myValue2 Regards, Paul Elschot now I want this result: field1 field2 val1myValue1 val1

Re: Span queries and complex scoring

2007-09-11 Thread Paul Elschot
in the trunk. Regards, Paul Elschot On Tuesday 11 September 2007 16:17, melix wrote: Hi, I'm working on an application which requires a complex scoring (based on semantics analysis). The scoring must be highly configurable, and I've found ways to do that, but I'm facing a discrete
