-Original Message- From: Peter Keegan
Sent: Thursday, November 6, 2014 3:21 PM
To: java-user
Subject: Exceptions during batch indexing
How are folks handling Solr exceptions that occur during batch indexing?
Solr (4.6) stops parsing the docs stream when an error occurs (e.g. a doc
with a missing mandatory field) and stops indexing. The bad document is
not identified, so it would be hard for the client to recover by skipping
over it.
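One common recovery strategy (an assumption on my part, not anything Solr provides) is to bisect a failed batch: retry each half, recursing until the offending documents are isolated. A minimal stand-alone sketch, with `indexBatch` as a hypothetical stand-in for posting a batch to Solr:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class BatchIndexer {
    // indexBatch is a stand-in for posting a list of docs to the server;
    // it returns false when the server rejects the batch.
    public static <T> List<T> findBadDocs(List<T> docs, Predicate<List<T>> indexBatch) {
        List<T> bad = new ArrayList<>();
        if (docs.isEmpty() || indexBatch.test(docs)) {
            return bad; // whole batch indexed fine
        }
        if (docs.size() == 1) {
            bad.add(docs.get(0)); // isolated an offending document
            return bad;
        }
        // Split and retry each half.
        int mid = docs.size() / 2;
        bad.addAll(findBadDocs(docs.subList(0, mid), indexBatch));
        bad.addAll(findBadDocs(docs.subList(mid, docs.size()), indexBatch));
        return bad;
    }
}
```

The cost is O(k log n) extra submissions for k bad docs, versus falling back to one-doc-at-a-time indexing for the whole batch.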
. Then I used a per-field
codec with DiskDocValuesFormat; it works like DirectSource in 4.0.0, but
I'm not confident about this usage. Can anyone say more about the
removal of the DirectSource API?
On 2013-3-26, at 22:59, Peter Keegan peterlkee...@gmail.com wrote:
Inspired by this presentation of DocValues:
http://www.slideshare.net/lucenerevolution/willnauer-simon-doc-values-column-stride-fields-in-lucene
I decided to try them out in 4.2. I created a 1M document index with one
DocValues field:
BinaryDocValuesField conceptsDV = new
AveragePayloadFunction is just what it sounds like:
return numPayloadsSeen > 0 ? (payloadScore / numPayloadsSeen) : 1;
What values are you seeing returned from PayloadHelper.decodeFloat ?
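For reference, here is the arithmetic stand-alone (a sketch mimicking what Lucene's PayloadHelper.decodeFloat and AveragePayloadFunction compute, not the Lucene classes themselves):

```java
public class PayloadMath {
    // Decode a big-endian 4-byte float, as PayloadHelper.decodeFloat does.
    public static float decodeFloat(byte[] b, int off) {
        int bits = ((b[off] & 0xFF) << 24) | ((b[off + 1] & 0xFF) << 16)
                 | ((b[off + 2] & 0xFF) << 8) | (b[off + 3] & 0xFF);
        return Float.intBitsToFloat(bits);
    }

    // The matching encoder, for writing payloads at index time.
    public static byte[] encodeFloat(float f) {
        int bits = Float.floatToIntBits(f);
        return new byte[] { (byte) (bits >>> 24), (byte) (bits >>> 16),
                            (byte) (bits >>> 8), (byte) bits };
    }

    // The average-payload score: mean payload if any were seen, else 1.
    public static float averageScore(float payloadScore, int numPayloadsSeen) {
        return numPayloadsSeen > 0 ? (payloadScore / numPayloadsSeen) : 1;
    }
}
```

If decodeFloat returns garbage, the usual culprit is a payload that wasn't written as a 4-byte big-endian float in the first place.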
Peter
On Fri, Feb 3, 2012 at 4:13 AM, shyama shyamasree_s...@yahoo.com wrote:
Hi Peter
I have checked
All term queries, including payload queries, deal only with words from the
query that exist in a document. They don't know what other terms are in a
matching document, due to the inverted nature of the index.
Peter
On Fri, Feb 3, 2012 at 11:50 AM, shyama shyamasree_s...@yahoo.com wrote:
Hi
I don't quite follow what you're doing, but is it possible that your
payloads are not on the desired terms when you indexed them? The first
explanation shows that the matching document contained luteinizing
hormone in both fields 'AbstractText' and 'AbstractTitle'. The average
payload value was
that will work for 3.2.
On Jul 21, 2011, at 4:25 PM, Mark Miller wrote:
Yeah, it's off trunk - I'll submit a 3X patch in a bit - just have to
change that to an IndexReader I believe.
- Mark
On Jul 21, 2011, at 4:01 PM, Peter Keegan wrote:
Does this patch require the trunk version? I'm
(field, text));
}
public TermQuery makeTermQuery(String text) {
return new TermQuery(new Term(field, text));
}
}
Peter
On Wed, Jul 20, 2011 at 9:22 PM, Mark Miller markrmil...@gmail.com wrote:
On Jul 20, 2011, at 7:44 PM, Mark Miller wrote:
On Jul 20, 2011, at 11:27 AM, Peter Keegan wrote
https://issues.apache.org/jira/browse/LUCENE-777
Further tests may be needed though.
- Mark
On Jul 21, 2011, at 9:28 AM, Peter Keegan wrote:
Hi Mark,
Here is a unit test using a version of 'SpanWithinQuery' modified for 3.2
('getTerms' removed) . The last test fails (search for 1 and 3
I have browsed many suggestions on how to implement 'search within a
sentence', but all seem to have drawbacks. For example, from
http://lucene.472066.n3.nabble.com/Issue-with-sentence-specific-search-td1644352.html#a1645072
Steve Rowe writes:
--
One common technique, instead of using a
into sentences and put those in a multi-valued field
and then search that.
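The pre-processing step Steve describes can be sketched with the JDK's BreakIterator (an assumption; any sentence splitter would do): split the text into sentences, then add each sentence as one value of a multi-valued field so position gaps keep matches from crossing sentence boundaries.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSplitter {
    // Split text into sentences; each element would become one value
    // of the multi-valued field at index time.
    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) sentences.add(s);
        }
        return sentences;
    }
}
```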
On Wed, 20 Jul 2011 11:27:38 -0400, Peter Keegan peterlkee...@gmail.com
wrote:
I have browsed many suggestions on how to implement 'search within a
sentence', but all seem to have drawbacks. For example, from
http://lucene
running eclipse with -Xmx2G parameter.
This only affects the Eclipse JVM, not the JVM launched by Eclipse to run
your application.
Did you add -Xmx2G to the 'VM arguments' of your Debug or Run configuration?
Peter
On Thu, Oct 21, 2010 at 3:26 PM, Sahin Buyrukbilen
sahin.buyrukbi...@gmail.com
relevant? How formal was that
process?
-Grant
On May 3, 2010, at 11:08 AM, Peter Keegan wrote:
We discovered very soon after going to production that Lucene's scores
were
often 'too precise'. For example, a page of 25 results may have several
different score values, and all within 15
http://www.thetaphi.de
eMail: u...@thetaphi.de
-Original Message-
From: Peter Keegan [mailto:peterlkee...@gmail.com]
Sent: Thursday, March 11, 2010 9:41 PM
To: java-user@lucene.apache.org
Subject: Re: Combining TopFieldCollector with custom Collector
Yes, but none
Is it possible to issue a single search that combines a TopFieldCollector
(MultiComparatorScoringMaxScoreCollector) with a custom Collector? The
custom Collector just collects the doc IDs into a BitSet (or DocIdSet). The
collect() methods of the various TopFieldCollectors cannot be overridden.
Yes. Could you give me a hint on how to delegate?
On Thu, Mar 11, 2010 at 2:50 PM, Michael McCandless
luc...@mikemccandless.com wrote:
Can you make your own collector and then just delegate internally to TFC?
Mike
On Thu, Mar 11, 2010 at 2:30 PM, Peter Keegan peterlkee...@gmail.com
wrote
of Collectors methods that you implement, do your own
stuff (setting the bit) but also then call tfc.XXX (eg tfc.collect).
That should work?
Mike
On Thu, Mar 11, 2010 at 2:57 PM, Peter Keegan peterlkee...@gmail.com
wrote:
Yes. Could you give me a hint on how to delegate?
On Thu, Mar 11, 2010
I want the TFC to do all the cool things it does like custom sorting, saving
the field values, max score, etc. I suppose the custom Collector could
explicitly delegate all TFC's methods, but this doesn't seem right.
Peter
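The delegation Mike suggests can be sketched like this. This is a simplified stand-in, not the real Lucene 2.9 API (the real TopFieldCollector would also need setScorer and setNextReader forwarded): do your own bookkeeping, then forward every call so the TFC still does its sorting and max-score tracking.

```java
import java.util.BitSet;

public class BitSetCollector {
    // Simplified stand-in for Lucene's Collector contract.
    interface Collector { void collect(int doc); }

    private final BitSet bits = new BitSet();
    private final Collector delegate; // e.g. the TopFieldCollector

    public BitSetCollector(Collector delegate) { this.delegate = delegate; }

    public void collect(int doc) {
        bits.set(doc);         // our own bookkeeping
        delegate.collect(doc); // then forward, so the TFC still sorts, saves
                               // field values, tracks max score, etc.
    }

    public BitSet getBits() { return bits; }
}
```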
On Thu, Mar 11, 2010 at 3:40 PM, Peter Keegan peterlkee...@gmail.com wrote
, but IW.close does (by default), this means you'll pick up an
extra version whenever a merge is running when you call close.
Mike
On Thu, Feb 25, 2010 at 2:52 PM, Peter Keegan peterlkee...@gmail.com
wrote:
I'm pretty sure this output occurred when the version number skipped +1.
The line
(), then close
open the writer, I think (but you better test to be sure!) the next
.getReader().getVersion() should always match.
Mike
On Fri, Feb 26, 2010 at 2:40 PM, Peter Keegan peterlkee...@gmail.com
wrote:
Is there a way for the application to wait for the BG commit to finish
before
Can IW.waitForMerges be called between 'prepareCommit' and 'commit'? That's
when the app calls 'getReader' to create external data.
Peter
On Fri, Feb 26, 2010 at 3:15 PM, Peter Keegan peterlkee...@gmail.com wrote:
Great, I'll give it a try.
Thanks!
On Fri, Feb 26, 2010 at 3:11 PM, Michael
I've reproduced this and I have a bunch of infoStream log files. Since the
messages have no timestamps, it's hard to tell where the relevant entries
are. What should I be looking for?
Peter
On Mon, Feb 22, 2010 at 3:58 PM, Peter Keegan peterlkee...@gmail.com wrote:
I'm pretty sure
you got a reader
with the wrong (unexplained extra +1) version? If so, can you post
the infoStream output up to that point?
Mike
On Thu, Feb 25, 2010 at 10:22 AM, Peter Keegan peterlkee...@gmail.com
wrote:
I've reproduced this and I have a bunch of infoStream log files. Since
Patch is in JIRA: LUCENE-2272
On Wed, Feb 17, 2010 at 8:40 PM, Peter Keegan peterlkee...@gmail.com wrote:
Yes, I will provide a patch. Our new proxy server has broken my access to
the svn repository, though :-(
On Tue, Feb 16, 2010 at 1:12 PM, Grant Ingersoll gsing...@apache.org wrote
Using Lucene 2.9.1, I have the following pseudocode which gets repeated at
regular intervals:
1. FSDirectory dir = FSDirectory.open(java.io.File);
2. dir.setLockFactory(new SingleInstanceLockFactory());
3. IndexWriter writer = new IndexWriter(dir, Analyzer, false, maxFieldLen)
4.
on prepareCommit (or, commit, if you didn't first prepare,
since that will call prepareCommit internally) that this version
should increase.
Is there only 1 thread doing this?
Oh, and, are you passing false for autoCommit?
Mike
On Mon, Feb 22, 2010 at 11:43 AM, Peter Keegan peterlkee...@gmail.com
then. The version should only increment on commit.
Can you make it all happen when infoStream is on, and post back?
Mike
On Mon, Feb 22, 2010 at 12:35 PM, Peter Keegan peterlkee...@gmail.com
wrote:
Only one writer thread and one writer process.
I'm calling IndexWriter(Directory d
Yes, I will provide a patch. Our new proxy server has broken my access to
the svn repository, though :-(
On Tue, Feb 16, 2010 at 1:12 PM, Grant Ingersoll gsing...@apache.org wrote:
That sounds reasonable. Patch?
On Feb 15, 2010, at 10:29 AM, Peter Keegan wrote:
The 'explain' method in PayloadNearSpanScorer assumes the
AveragePayloadFunction was used. I don't see an easy way to override this
because 'payloadsSeen' and 'payloadScore' are private/protected. It seems
like the 'PayloadFunction' interface should have an 'explain' method that
the Scorer could
Same experience here as Tom. Disk I/O becomes bottleneck with large indexes
(or multiple shards per server) with less memory. Frequent updates to
indexes can make the I/O bottleneck worse.
Peter
On Mon, Feb 15, 2010 at 2:17 PM, Tom Burton-West tburtonw...@gmail.com wrote:
Hi Chris,
In our
I'm having a problem with 'searchWithFilter' on Lucene 2.9.1. The Filter
wraps a simple BitSet. When doing a 'MatchAllDocs' query with this filter, I
get only a subset of the expected results, even accounting for deletes. The
index has 10 segments. In IndexSearcher.searchWithFilter, it looks like
is...
Can you boil it down to a smallish test case?
Mike
On Fri, Dec 4, 2009 at 10:32 AM, Peter Keegan peterlkee...@gmail.com
wrote:
I'm having a problem with 'searchWithFilter' on Lucene 2.9.1. The Filter
wraps a simple BitSet. When doing a 'MatchAllDocs' query with this
filter, I
get
:
Peter, which filter do you use, do you respect the IndexReaders
maxDoc() and the docBase?
simon
On Fri, Dec 4, 2009 at 4:47 PM, Peter Keegan peterlkee...@gmail.com
wrote:
I think the Filter's docIdSetIterator is using the top level reader for
each
segment, because the cardinality
,
Peter
On Tue, Nov 17, 2009 at 5:49 AM, Michael McCandless
luc...@mikemccandless.com wrote:
On Mon, Nov 16, 2009 at 6:38 PM, Peter Keegan peterlkee...@gmail.com
wrote:
Can you remap your external data to be per segment?
That would provide the tightest integration but would require a major
when the
custom scorer is created? No need to access the map for every doc this way.
Peter
On Tue, Nov 17, 2009 at 8:58 AM, Peter Keegan peterlkee...@gmail.com wrote:
The external data is just an array of fixed-length records, one for each
Lucene document. Indexes are updated at regular intervals
17, 2009 at 11:51 AM, Michael McCandless
luc...@mikemccandless.com wrote:
On Tue, Nov 17, 2009 at 8:58 AM, Peter Keegan peterlkee...@gmail.com
wrote:
The external data is just an array of fixed-length records, one for each
Lucene document. Indexes are updated at regular intervals in one jvm
I have a custom query object whose scorer uses the 'AllTermDocs' to get all
non-deleted documents. AllTermDocs returns the docId relative to the
segment, but I need the absolute (index-wide) docId to access external data.
What's the best way to get the unique, non-deleted docId?
Thanks,
Peter
I forgot to mention that this is with V2.9.1
On Mon, Nov 16, 2009 at 1:39 PM, Peter Keegan peterlkee...@gmail.com wrote:
I have a custom query object whose scorer uses the 'AllTermDocs' to get all
non-deleted documents. AllTermDocs returns the docId relative to the
segment, but I need
The same thing is occurring in my custom sort comparator. The ScoreDocs
passed to the 'compare' method have docIds that seem to be relative to the
segment. Is there any way to translate these into index-wide docIds?
Peter
On Mon, Nov 16, 2009 at 2:06 PM, Peter Keegan peterlkee...@gmail.com wrote
the maxDoc. Then, in your search, you can lookup
the SegmentReader you're working on to get the docBase?
Mike
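The docBase arithmetic Mike describes can be sketched stand-alone: the index-wide docId is the sum of maxDoc() over all earlier segments (the docBase) plus the segment-relative docId.

```java
public class DocBase {
    // maxDocs[i] is maxDoc() of segment i, in index order.
    // Returns the docBase of each segment (a prefix sum).
    public static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int sum = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = sum;
            sum += maxDocs[i];
        }
        return bases;
    }

    // Translate a segment-relative docId into an index-wide one.
    public static int globalDocId(int[] bases, int segment, int segmentDocId) {
        return bases[segment] + segmentDocId;
    }
}
```

This also covers the custom-sort-comparator case above: the same translation applies before indexing into any external per-document array.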
On Mon, Nov 16, 2009 at 2:50 PM, Peter Keegan peterlkee...@gmail.com
wrote:
The same thing is occurring in my custom sort comparator. The ScoreDocs
passed to the 'compare' method have
I know this has been asked before, but I couldn't find the thread.
The jar file produced from a build of 2.9.0 is 'lucene-core-2.9.jar'. For
2.9.1, it is 'lucene-core-2.9.1-dev.jar'. When does the '-dev' get removed?
Peter
-Dversion=2.9.1
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
-Original Message-
From: Peter Keegan [mailto:peterlkee...@gmail.com]
Sent: Tuesday, November 10, 2009 12:38 AM
To: java-user
Subject: building lucene-core
formula is always in flux - we likely hard coded the
change in 2.9.0 when releasing - we likely won't again in the future.
Some discussion about it came up recently on the list.
--
- Mark
http://www.lucidimagination.com
Peter Keegan wrote:
OK. I just downloaded the 2.9.0 sources from
source, it doesn't mean you will create something
identical to the official jars that were released.
--
- Mark
http://www.lucidimagination.com
Peter Keegan wrote:
The -dev version is confusing when it's the target of a build from an
official release.
A build with patches from an official
:
Hmm... for step 4 you should have gotten true back from isCurrent.
You're sure there were no intervening calls to IndexWriter.commit?
Are you using Lucene 2.9? If not, you have to make sure autoCommit
is false when opening the IndexWriter.
Mike
On Fri, Nov 6, 2009 at 2:46 PM, Peter Keegan
Are you using Lucene 2.9?
Yes
Peter
On Sun, Nov 8, 2009 at 6:23 PM, Peter Keegan peterlkee...@gmail.com wrote:
Here is some stand-alone code that reproduces the problem. There are 2
classes. jvm1 creates the index, jvm2 reads the index. The system console
input is used to synchronize the 4
? It will
produce an enormous amount of output, but if you can excise the few
lines around when that warning comes out post back that'd be great.
Mike
On Wed, Oct 28, 2009 at 12:23 PM, Peter Keegan peterlkee...@gmail.com
wrote:
Just to be safe, I ran with the official jar file from one of the mirrors
Btw, this 2.9 indexer is fast! I indexed 4Gb (1.07 million docs) with
optimization in just under 30 min.
I used setRAMBufferSizeMB=1.9G
Peter
On Thu, Oct 29, 2009 at 3:46 PM, Peter Keegan peterlkee...@gmail.com wrote:
A handful of the source documents did contain the U+ character
it starts to page and the performance gets hit.
I'd love to see what kind of benefit you see going from around a gig to
just under 2.
Peter Keegan wrote:
Btw, this 2.9 indexer is fast! I indexed 4Gb (1.07 million docs) with
optimization in just under 30 min.
I used setRAMBufferSizeMB=1.9G
:49 PM, Mark Miller markrmil...@gmail.com wrote:
Thanks a lot Peter! Really appreciate it.
Peter Keegan wrote:
Mark,
With 1.9G, I had to increase the JVM heap significantly (to 8G) to avoid
paging and GC hits. Here is a table comparing indexing times, optimizing
times and peak memory
My last post got truncated - probably exceeded max msg size. Let me know if
you want to see more of the IndexWriter log.
Peter
yet, thanks.
Mike
On Wed, Oct 28, 2009 at 10:21 AM, Peter Keegan peterlkee...@gmail.com
wrote:
Yes, I used JDK 1.6.0_16 when running CheckIndex and it reported the same
problems when run multiple times.
Also, what does "Lucene version 2.9 exported - 2009-10-27 15:31:52" mean
.
Peter
On Wed, Oct 28, 2009 at 11:29 AM, Michael McCandless
luc...@mikemccandless.com wrote:
On Wed, Oct 28, 2009 at 10:58 AM, Peter Keegan peterlkee...@gmail.com
wrote:
The only change I made to the source code was the patch for
PayloadNearQuery
(LUCENE-1986).
That patch certainly
_0.prx
IFD [Indexer]: delete _0.fdt
Peter
On Mon, Oct 26, 2009 at 3:59 PM, Peter Keegan peterlkee...@gmail.com wrote:
On Mon, Oct 26, 2009 at 3:00 PM, Michael McCandless
luc...@mikemccandless.com wrote:
On Mon, Oct 26, 2009 at 2:55 PM, Peter Keegan peterlkee...@gmail.com
wrote:
On Mon
CHANCE TO CTRL+C!
5...
4...
3...
2...
1...
Writing...
OK
Wrote new segments file segments_5
Peter
On Tue, Oct 27, 2009 at 10:00 AM, Peter Keegan peterlkee...@gmail.com wrote:
After rebuilding the corrupted indexes, the low disk space exception is now
occurring as expected. Sorry
Clarification: this CheckIndex is on the index from which the merge/optimize
failed.
Peter
On Tue, Oct 27, 2009 at 10:07 AM, Peter Keegan peterlkee...@gmail.com wrote:
Running CheckIndex after the IOException did produce an error in a term
frequency:
Opening index @ D:\mnsavs\lresumes3
stayed at _03
Thanks.
Mike
On Tue, Oct 27, 2009 at 10:00 AM, Peter Keegan peterlkee...@gmail.com
wrote:
After rebuilding the corrupted indexes, the low disk space exception is
now
occurring as expected. Sorry for the distraction.
fyi, here are the details:
java.io.IOException
: done
IW 0 [Indexer]: at close: _7:C1077025-_0
I see no errors.
Peter
On Tue, Oct 27, 2009 at 10:44 AM, Peter Keegan peterlkee...@gmail.com wrote:
On Tue, Oct 27, 2009 at 10:37 AM, Michael McCandless
luc...@mikemccandless.com wrote:
OK that exception looks more reasonable, for a disk full
:
This is odd -- is it reproducible?
Can you narrow it down to a small set of docs that when indexed
produce a corrupted index?
If you attempt to optimize the index, does it fail?
Mike
On Tue, Oct 27, 2009 at 1:40 PM, Peter Keegan peterlkee...@gmail.com
wrote:
It seems the index is corrupted
) detected
WARNING: would write new segments file, and 663862 documents would be lost,
if -fix were specified
Do the unit tests create multi-segment indexes?
Peter
On Tue, Oct 27, 2009 at 3:08 PM, Peter Keegan peterlkee...@gmail.com wrote:
It's reproducible with a large no. of docs (1 million
)
at
org.apache.lucene.index.IndexWriter.addIndexesNoOptimize(IndexWriter.java:3695)
I guess this is just the nature of a low disk space condition on Windows. I
expected to see a 'no space left on device' IO exception.
Peter
On Sun, Oct 25, 2009 at 8:54 PM, Peter Keegan peterlkee...@gmail.com wrote
On Mon, Oct 26, 2009 at 2:50 PM, Michael McCandless
luc...@mikemccandless.com wrote:
On Mon, Oct 26, 2009 at 10:44 AM, Peter Keegan peterlkee...@gmail.com
wrote:
Even running in console mode, the exception is difficult to interpret.
Here's an exception that I think occurred during an add
On Mon, Oct 26, 2009 at 3:00 PM, Michael McCandless
luc...@mikemccandless.com wrote:
On Mon, Oct 26, 2009 at 2:55 PM, Peter Keegan peterlkee...@gmail.com
wrote:
On Mon, Oct 26, 2009 at 2:50 PM, Michael McCandless
luc...@mikemccandless.com wrote:
On Mon, Oct 26, 2009 at 10:44 AM, Peter
include one
traceback into Lucene's optimized method, and then another (under
caused by) showing the exception from the BG merge thread.
Did you see any BG thread exceptions on wherever your System.err is
directed to?
Mike
On Sat, Oct 24, 2009 at 5:21 PM, Peter Keegan peterlkee...@gmail.com
, Peter Keegan peterlkee...@gmail.com
wrote:
Did you get any traceback printed at all?
no, only what I reported.
Did you see any BG thread exceptions on wherever your System.err is
directed to?
The jvm was running as a windows service, so output to System.err may
have
gone to the bit
I'm sometimes seeing the following exception from an operation that does a
merge and optimize:
java.io.IOException: background merge hit exception: _0:C1082866 _1:C79
into _2 [optimize] [mergeDocStores]
I'm pretty sure that it's caused by a temporary low disk space condition,
but I'd like to be
btw, this is with Lucene 2.9
On Sat, Oct 24, 2009 at 5:20 PM, Peter Keegan peterlkee...@gmail.com wrote:
I'm sometimes seeing the following exception from an operation that does a
merge and optimize:
java.io.IOException: background merge hit exception: _0:C1082866 _1:C79
into _2 [optimize
15, 2009, at 1:28 PM, Peter Keegan wrote:
The query is:
+payloadNear([spanNear([contents:insurance, contents:agent], 1,
false),
spanNear([contents:winston, contents:salem], 1, false)], 10, false)
It's using the default payload function scorer (average value)
It doesn't happen on all
I can reproduce this with a unit test - will post to JIRA shortly.
Peter
On Fri, Oct 16, 2009 at 8:06 AM, Peter Keegan peterlkee...@gmail.com wrote:
next() is called in PayloadNearQuery.setFreqCurrentDoc:
super.setFreqCurrentDoc();
But, I think it should be called before 'getPayloads
I'm using Lucene 2.9 and sometimes get a NPE in NearSpansUnordered:
java.lang.NullPointerException
at
org.apache.lucene.search.spans.NearSpansUnordered.start(NearSpansUnordered.java:219)
at
this happened on) would be greatly appreciated.
-Yonik
http://www.lucidimagination.com
On Thu, Oct 15, 2009 at 1:17 PM, Peter Keegan peterlkee...@gmail.com
wrote:
I'm using Lucene 2.9 and sometimes get a NPE in NearSpansUnordered:
java.lang.NullPointerException
I've been testing 2.9 RC2 lately and comparing query performance to 2.3.2.
I'm seeing a huge increase in throughput (2x-10x) on an index that was built
with 2.3.2. The queries have a lot of BoostingTermQuerys and boolean clauses
containing a custom scorer. Using JProfiler, I observe that the
IndexSearcher.search is calling my custom scorer's 'next' and 'doc' methods
64% fewer times. I see no 'advance' method in any of the hot spots'. I am
getting the same number of hits from the custom scorer.
Has the BooleanScorer2 logic changed?
Peter
On Wed, Sep 9, 2009 at 9:17 AM, Yonik Seeley
, but I think now it uses what's best by default? And pairs with
the collector? I didn't follow any of that closely though.
- Mark
Peter Keegan wrote:
IndexSearcher.search is calling my custom scorer's 'next' and 'doc'
methods
64% fewer times. I see no 'advance' method in any of the hot
http://svn.apache.org/viewvc?view=rev&revision=630698
This may be it. The scorer is sparse and usually in a conjuction with a
dense scorer.
Does the index format matter? I haven't yet built it with 2.9.
Peter
On Wed, Sep 9, 2009 at 10:17 AM, Yonik Seeley yo...@lucidimagination.com wrote:
On
Or you could try this patch:
LUCENE-1316: https://issues.apache.org/jira/browse/LUCENE-1316
Peter
On Thu, Aug 6, 2009 at 8:51 AM, Michael McCandless
luc...@mikemccandless.com wrote:
Opening your IndexReader with readOnly=true should also fix it, I think.
Mike
On Thu, Aug 6, 2009 at
There is a similar discussion on this topic here:
http://www.gossamer-threads.com/lists/lucene/java-user/42824?search_string=Lucene%20search%20performance%3A%20linear%3F;#42824
or: http://tinyurl.com/lpp3hf
On Wed, Jun 17, 2009 at 1:18 PM, Teruhiko Kurosaka k...@basistech.com wrote:
Thank
Sorry, here's the example I meant to show. Doc 1 and doc 2 both contain the
terms "hey look, the quick brown fox jumped very high", but in Doc 1 all the
terms are indexed at the same position. In doc 2, the terms are indexed in
adjacent positions (normal way). For the query "the quick brown fox", doc
I suppose SpanTermQuery could override the weight/scorer methods so that
it behaved more like a TermQuery if it was executed directly ... but
that's really not what it's intended for.
This is currently the only way to boost a term via payloads.
BoostingTermQuery extends SpanTermQuery.
if
, Mar 3, 2009 at 2:42 PM, Peter Keegan peterlkee...@gmail.com wrote:
The DefaultSimilarity class defines sloppyFreq as:
public float sloppyFreq(int distance) {
return 1.0f / (distance + 1);
}
For a 'SpanNearQuery', this reduces the effect of the term frequency on the
score as the number
The DefaultSimilarity class defines sloppyFreq as:
public float sloppyFreq(int distance) {
return 1.0f / (distance + 1);
}
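The formula above, stand-alone: an exact match (distance 0) contributes fully, and each extra position of slop halves, thirds, etc. the contribution, which is why wider spans dilute the effective term frequency.

```java
public class Sloppy {
    // DefaultSimilarity.sloppyFreq: contribution decays with edit distance.
    public static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }
}
```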
For a 'SpanNearQuery', this reduces the effect of the term frequency on the
score as the number of terms in the span increases. So, for a simple phrase
query (using
On Sun, Mar 1, 2009 at 8:57 PM, Peter Keegan peterlkee...@gmail.com
wrote:
As suggested, I added a query-time boost of 0.0f to the 'literals' field
(with index-time boost still there) and I did get the same scores for
both
queries :) (there is a subtlety between index-time and query-time
no effect on the
score, when combined with the above. This seems ok in this example since
the matching terms had boost = 0.
Thanks Yonik,
Peter
On Sat, Feb 28, 2009 at 6:02 PM, Yonik Seeley yo...@lucidimagination.com wrote:
On Sat, Feb 28, 2009 at 3:02 PM, Peter Keegan peterlkee...@gmail.com
in situations where you deal with simple query types, and matching query
structures, the queryNorm
*can* be used to make scores semi-comparable.
Hmm. My example used matching query structures. The only difference was a
single term in a field with zero weight that didn't exist in the matching
Any comments about this? Is this just the way queryNorm works or is this a
bug?
Thanks,
Peter
On Fri, Feb 20, 2009 at 4:03 PM, Peter Keegan peterlkee...@gmail.com wrote:
The explanation of scores from the same document returned from 2 similar
queries differ in an unexpected way. There are 2
Got it. This is another example of why scores can't be compared between
(even similar) queries.
(we don't)
Thanks.
On Fri, Feb 27, 2009 at 11:39 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
On Fri, Feb 27, 2009 at 9:15 AM, Peter Keegan peterlkee...@gmail.com
wrote:
Any comments about
The explanation of scores from the same document returned from 2 similar
queries differ in an unexpected way. There are 2 fields involved, 'contents'
and 'literals'. The 'literals' field has setBoost = 0. As you can see from
the explanations below, the total weight of the matching terms from the
Hi Karl,
I use payloads for weight only, too, with BoostingTermQuery (see:
http://www.nabble.com/BoostingTermQuery-scoring-td20323615.html#a20323615)
A custom tokenizer looks for the reserved character '\b' followed by a 2
byte 'boost' value. It then creates a special Token type for a custom
If you sort first by score, keep in mind that the raw scores are very
precise and you could see many unique values in the result set. The
secondary sort field would only be used to break equal scores. We had to use
a custom comparator to 'smooth out' the scores to allow the second field to
take
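One way to 'smooth out' scores before the secondary sort (a sketch under my own assumptions, not our production comparator; the bucket width is arbitrary) is to quantize them, so near-ties compare equal and fall through to the second field:

```java
public class ScoreSmoother {
    // Round a score down to the nearest bucket so that scores within
    // bucketWidth of each other compare equal in the primary sort.
    public static float smooth(float score, float bucketWidth) {
        return (float) Math.floor(score / bucketWidth) * bucketWidth;
    }
}
```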
performance? (I haven't
tried it yet).
Thanks,
Peter
On Thu, Nov 6, 2008 at 6:56 PM, Steven A Rowe [EMAIL PROTECTED] wrote:
Hi Peter,
On 11/06/2008 at 4:25 PM, Peter Keegan wrote:
I've discovered another flaw in using this technique:
(+contents:petroleum +contents:engineer +contents:refinery
:
Not sure, but it sounds like you are interested in a higher level Query,
kind of like the BooleanQuery, but then part of it sounds like it is per
document, right? Is it that you want to deal with multiple payloads in a
document, or multiple BTQs in a bigger query?
On Nov 4, 2008, at 9:42 AM, Peter
that doc. Yet another
reason to use BoostingTermQuery.
Peter
On Thu, Nov 6, 2008 at 1:08 PM, Peter Keegan [EMAIL PROTECTED] wrote:
Let me give some background on the problem behind my question.
Our index contains many fields (title, body, date, city, etc). Most queries
search all fields
I'm using BoostingTermQuery to boost the score of documents with terms
containing payloads (boost value 1). I'd like to change the scoring
behavior such that if a query contains multiple BoostingTermQuery terms
(either required or optional), documents containing more matching terms with
payloads
at it :)
Peter
On Thu, Jul 10, 2008 at 2:09 PM, Peter Keegan [EMAIL PROTECTED]
wrote:
I may take a crack at this. Any more thoughts you may have on the
implementation are welcome, but I don't want to distract you too much.
Thanks,
Peter
On Thu, Jul 10, 2008 at 1:30 PM, Grant Ingersoll [EMAIL
Ingersoll [EMAIL PROTECTED]
wrote:
I'm not fully following what you want. Can you explain a bit more?
Thanks,
Grant
On Jul 9, 2008, at 2:55 PM, Peter Keegan wrote:
If a SpanQuery is constructed from one or more BoostingTermQuery(s), the
payloads on the terms are never processed
PayloadNearQuery, see http://wiki.apache.org/lucene-java/Payload_Planning
I think it would make sense to develop these and I would be happy to help
shepherd a patch through, but am not in a position to generate said patch at
this moment in time.
On Jul 10, 2008, at 9:59 AM, Peter Keegan wrote
If a SpanQuery is constructed from one or more BoostingTermQuery(s), the
payloads on the terms are never processed by the SpanScorer. It seems to me
that you would want the SpanScorer to score the document both on the spans
distance and the payload score. So, either the SpanScorer would have to
Is it possible to compute a theoretical maximum score for a given query if
constraints are placed on 'tf' and 'lengthNorm'? If so, scores could be
compared to a 'perfect score' (a feature request from our customers)
Here are some related threads on this:
In this thread:
Sridhar,
We have been using approach 2 in our production system with good results. We
have separate processes for indexing and searching. The main issue that came
up was in deleting old indexes (see: http://tinyurl.com/32q8c4). Most of
our production problems occur during indexing, and we are