-Original Message- From: Peter Keegan
Sent: Thursday, November 6, 2014 3:21 PM
To: java-user
Subject: Exceptions during batch indexing
How are folks handling Solr exceptions that occur during batch indexing?
Solr (4.6) stops parsing the docs stream when an error occurs (e.g. a doc
with a missing mandatory field) and stops indexing. The bad document is
not identified, so it would be hard for the client to recover by skipping
over it.
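One common recovery strategy (an assumption on my part, not anything Solr provides) is to bisect a failed batch: retry each half, recursing until the offending documents are isolated. A minimal stand-alone sketch, with `indexBatch` as a hypothetical stand-in for posting a batch to Solr:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class BatchIndexer {
    // indexBatch is a stand-in for posting a list of docs to the server;
    // it returns false when the server rejects the batch.
    public static <T> List<T> findBadDocs(List<T> docs, Predicate<List<T>> indexBatch) {
        List<T> bad = new ArrayList<>();
        if (docs.isEmpty() || indexBatch.test(docs)) {
            return bad; // whole batch indexed fine
        }
        if (docs.size() == 1) {
            bad.add(docs.get(0)); // isolated an offending document
            return bad;
        }
        // Split and retry each half.
        int mid = docs.size() / 2;
        bad.addAll(findBadDocs(docs.subList(0, mid), indexBatch));
        bad.addAll(findBadDocs(docs.subList(mid, docs.size()), indexBatch));
        return bad;
    }
}
```

The cost is O(k log n) extra submissions for k bad docs, versus falling back to one-doc-at-a-time indexing for the whole batch.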
. Then I used a per-field
codec with DiskDocValuesFormat; it works like DirectSource in 4.0.0, but
I'm not confident about this usage. Can anyone say more about the
removal of the DirectSource API?
On 2013-3-26, at 22:59, Peter Keegan peterlkee...@gmail.com wrote:
Inspired by this presentation of DocValues:
http://www.slideshare.net/lucenerevolution/willnauer-simon-doc-values-column-stride-fields-in-lucene
I decided to try them out in 4.2. I created a 1M document index with one
DocValues field:
BinaryDocValuesField conceptsDV = new
AveragePayloadFunction is just what it sounds like:
return numPayloadsSeen > 0 ? (payloadScore / numPayloadsSeen) : 1;
What values are you seeing returned from PayloadHelper.decodeFloat ?
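For reference, here is the arithmetic stand-alone (a sketch mimicking what Lucene's PayloadHelper.decodeFloat and AveragePayloadFunction compute, not the Lucene classes themselves):

```java
public class PayloadMath {
    // Decode a big-endian 4-byte float, as PayloadHelper.decodeFloat does.
    public static float decodeFloat(byte[] b, int off) {
        int bits = ((b[off] & 0xFF) << 24) | ((b[off + 1] & 0xFF) << 16)
                 | ((b[off + 2] & 0xFF) << 8) | (b[off + 3] & 0xFF);
        return Float.intBitsToFloat(bits);
    }

    // The matching encoder, for writing payloads at index time.
    public static byte[] encodeFloat(float f) {
        int bits = Float.floatToIntBits(f);
        return new byte[] { (byte) (bits >>> 24), (byte) (bits >>> 16),
                            (byte) (bits >>> 8), (byte) bits };
    }

    // The average-payload score: mean payload if any were seen, else 1.
    public static float averageScore(float payloadScore, int numPayloadsSeen) {
        return numPayloadsSeen > 0 ? (payloadScore / numPayloadsSeen) : 1;
    }
}
```

If decodeFloat returns garbage, the usual culprit is a payload that wasn't written as a 4-byte big-endian float in the first place.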
Peter
On Fri, Feb 3, 2012 at 4:13 AM, shyama shyamasree_s...@yahoo.com wrote:
Hi Peter
I have checked
All term queries, including payload queries, deal only with words from the
query that exist in a document. They don't know what other terms are in a
matching document, due to the inverted nature of the index.
Peter
On Fri, Feb 3, 2012 at 11:50 AM, shyama shyamasree_s...@yahoo.com wrote:
Hi
I don't quite follow what you're doing, but is it possible that your
payloads are not on the desired terms when you indexed them? The first
explanation shows that the matching document contained luteinizing
hormone in both fields 'AbstractText' and 'AbstractTitle'. The average
payload value was
that will work for 3.2.
On Jul 21, 2011, at 4:25 PM, Mark Miller wrote:
Yeah, it's off trunk - I'll submit a 3X patch in a bit - just have to
change that to an IndexReader I believe.
- Mark
On Jul 21, 2011, at 4:01 PM, Peter Keegan wrote:
Does this patch require the trunk version? I'm
(field, text));
}
public TermQuery makeTermQuery(String text) {
return new TermQuery(new Term(field, text));
}
}
Peter
On Wed, Jul 20, 2011 at 9:22 PM, Mark Miller markrmil...@gmail.com wrote:
On Jul 20, 2011, at 7:44 PM, Mark Miller wrote:
On Jul 20, 2011, at 11:27 AM, Peter Keegan wrote
https://issues.apache.org/jira/browse/LUCENE-777
Further tests may be needed though.
- Mark
On Jul 21, 2011, at 9:28 AM, Peter Keegan wrote:
Hi Mark,
Here is a unit test using a version of 'SpanWithinQuery' modified for 3.2
('getTerms' removed) . The last test fails (search for 1 and 3
I have browsed many suggestions on how to implement 'search within a
sentence', but all seem to have drawbacks. For example, from
http://lucene.472066.n3.nabble.com/Issue-with-sentence-specific-search-td1644352.html#a1645072
Steve Rowe writes:
--
One common technique, instead of using a
into sentences and put those in a multi-valued field
and then search that.
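The pre-processing step Steve describes can be sketched with the JDK's BreakIterator (an assumption; any sentence splitter would do): split the text into sentences, then add each sentence as one value of a multi-valued field so position gaps keep matches from crossing sentence boundaries.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSplitter {
    // Split text into sentences; each element would become one value
    // of the multi-valued field at index time.
    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) sentences.add(s);
        }
        return sentences;
    }
}
```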
On Wed, 20 Jul 2011 11:27:38 -0400, Peter Keegan peterlkee...@gmail.com
wrote:
I have browsed many suggestions on how to implement 'search within a
sentence', but all seem to have drawbacks. For example, from
http://lucene
running eclipse with -Xmx2G parameter.
This only affects the Eclipse JVM, not the JVM launched by Eclipse to run
your application.
Did you add -Xmx2G to the 'VM arguments' of your Debug or Run configuration?
Peter
On Thu, Oct 21, 2010 at 3:26 PM, Sahin Buyrukbilen
sahin.buyrukbi...@gmail.com
relevant? How formal was that
process?
-Grant
On May 3, 2010, at 11:08 AM, Peter Keegan wrote:
We discovered very soon after going to production that Lucene's scores
were
often 'too precise'. For example, a page of 25 results may have several
different score values, and all within 15
http://www.thetaphi.de
eMail: u...@thetaphi.de
-Original Message-
From: Peter Keegan [mailto:peterlkee...@gmail.com]
Sent: Thursday, March 11, 2010 9:41 PM
To: java-user@lucene.apache.org
Subject: Re: Combining TopFieldCollector with custom Collector
Yes, but none
Is it possible to issue a single search that combines a TopFieldCollector
(MultiComparatorScoringMaxScoreCollector) with a custom Collector? The
custom Collector just collects the doc IDs into a BitSet (or DocIdSet). The
collect() methods of the various TopFieldCollectors cannot be overridden.
Yes. Could you give me a hint on how to delegate?
On Thu, Mar 11, 2010 at 2:50 PM, Michael McCandless
luc...@mikemccandless.com wrote:
Can you make your own collector and then just delegate internally to TFC?
Mike
On Thu, Mar 11, 2010 at 2:30 PM, Peter Keegan peterlkee...@gmail.com
wrote
of Collectors methods that you implement, do your own
stuff (setting the bit) but also then call tfc.XXX (eg tfc.collect).
That should work?
Mike
On Thu, Mar 11, 2010 at 2:57 PM, Peter Keegan peterlkee...@gmail.com
wrote:
Yes. Could you give me a hint on how to delegate?
On Thu, Mar 11, 2010
I want the TFC to do all the cool things it does like custom sorting, saving
the field values, max score, etc. I suppose the custom Collector could
explicitly delegate all TFC's methods, but this doesn't seem right.
Peter
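The delegation Mike suggests can be sketched like this. This is a simplified stand-in, not the real Lucene 2.9 API (the real TopFieldCollector would also need setScorer and setNextReader forwarded): do your own bookkeeping, then forward every call so the TFC still does its sorting and max-score tracking.

```java
import java.util.BitSet;

public class BitSetCollector {
    // Simplified stand-in for Lucene's Collector contract.
    interface Collector { void collect(int doc); }

    private final BitSet bits = new BitSet();
    private final Collector delegate; // e.g. the TopFieldCollector

    public BitSetCollector(Collector delegate) { this.delegate = delegate; }

    public void collect(int doc) {
        bits.set(doc);         // our own bookkeeping
        delegate.collect(doc); // then forward, so the TFC still sorts, saves
                               // field values, tracks max score, etc.
    }

    public BitSet getBits() { return bits; }
}
```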
On Thu, Mar 11, 2010 at 3:40 PM, Peter Keegan peterlkee...@gmail.com wrote
, but IW.close does (by default), this means you'll pick up an
extra version whenever a merge is running when you call close.
Mike
On Thu, Feb 25, 2010 at 2:52 PM, Peter Keegan peterlkee...@gmail.com
wrote:
I'm pretty sure this output occurred when the version number skipped +1.
The line
(), then close
open the writer, I think (but you better test to be sure!) the next
.getReader().getVersion() should always match.
Mike
On Fri, Feb 26, 2010 at 2:40 PM, Peter Keegan peterlkee...@gmail.com
wrote:
Is there a way for the application to wait for the BG commit to finish
before
Can IW.waitForMerges be called between 'prepareCommit' and 'commit'? That's
when the app calls 'getReader' to create external data.
Peter
On Fri, Feb 26, 2010 at 3:15 PM, Peter Keegan peterlkee...@gmail.com wrote:
Great, I'll give it a try.
Thanks!
On Fri, Feb 26, 2010 at 3:11 PM, Michael
I've reproduced this and I have a bunch of infoStream log files. Since the
messages have no timestamps, it's hard to tell where the relevant entries
are. What should I be looking for?
Peter
On Mon, Feb 22, 2010 at 3:58 PM, Peter Keegan peterlkee...@gmail.com wrote:
I'm pretty sure
you got a reader
with the wrong (unexplained extra +1) version? If so, can you post
the infoStream output up to that point?
Mike
On Thu, Feb 25, 2010 at 10:22 AM, Peter Keegan peterlkee...@gmail.com
wrote:
I've reproduced this and I have a bunch of infoStream log files. Since
Patch is in JIRA: LUCENE-2272
On Wed, Feb 17, 2010 at 8:40 PM, Peter Keegan peterlkee...@gmail.com wrote:
Yes, I will provide a patch. Our new proxy server has broken my access to
the svn repository, though :-(
On Tue, Feb 16, 2010 at 1:12 PM, Grant Ingersoll gsing...@apache.org wrote
Using Lucene 2.9.1, I have the following pseudocode which gets repeated at
regular intervals:
1. FSDirectory dir = FSDirectory.open(java.io.File);
2. dir.setLockFactory(new SingleInstanceLockFactory());
3. IndexWriter writer = new IndexWriter(dir, Analyzer, false, maxFieldLen)
4.
on prepareCommit (or, commit, if you didn't first prepare,
since that will call prepareCommit internally) that this version
should increase.
Is there only 1 thread doing this?
Oh, and, are you passing false for autoCommit?
Mike
On Mon, Feb 22, 2010 at 11:43 AM, Peter Keegan peterlkee...@gmail.com
then. The version should only increment on commit.
Can you make it all happen when infoStream is on, and post back?
Mike
On Mon, Feb 22, 2010 at 12:35 PM, Peter Keegan peterlkee...@gmail.com
wrote:
Only one writer thread and one writer process.
I'm calling IndexWriter(Directory d
Yes, I will provide a patch. Our new proxy server has broken my access to
the svn repository, though :-(
On Tue, Feb 16, 2010 at 1:12 PM, Grant Ingersoll gsing...@apache.org wrote:
That sounds reasonable. Patch?
On Feb 15, 2010, at 10:29 AM, Peter Keegan wrote:
The 'explain' method in PayloadNearSpanScorer assumes the
AveragePayloadFunction was used. I don't see an easy way to override this
because 'payloadsSeen' and 'payloadScore' are private/protected. It seems
like the 'PayloadFunction' interface should have an 'explain' method that
the Scorer could
Same experience here as Tom. Disk I/O becomes bottleneck with large indexes
(or multiple shards per server) with less memory. Frequent updates to
indexes can make the I/O bottleneck worse.
Peter
On Mon, Feb 15, 2010 at 2:17 PM, Tom Burton-West tburtonw...@gmail.com wrote:
Hi Chris,
In our
I'm having a problem with 'searchWithFilter' on Lucene 2.9.1. The Filter
wraps a simple BitSet. When doing a 'MatchAllDocs' query with this filter, I
get only a subset of the expected results, even accounting for deletes. The
index has 10 segments. In IndexSearcher.searchWithFilter, it looks like
is...
Can you boil it down to a smallish test case?
Mike
On Fri, Dec 4, 2009 at 10:32 AM, Peter Keegan peterlkee...@gmail.com
wrote:
I'm having a problem with 'searchWithFilter' on Lucene 2.9.1. The Filter
wraps a simple BitSet. When doing a 'MatchAllDocs' query with this
filter, I
get
:
Peter, which filter do you use, do you respect the IndexReaders
maxDoc() and the docBase?
simon
On Fri, Dec 4, 2009 at 4:47 PM, Peter Keegan peterlkee...@gmail.com
wrote:
I think the Filter's docIdSetIterator is using the top level reader for
each
segment, because the cardinality
,
Peter
On Tue, Nov 17, 2009 at 5:49 AM, Michael McCandless
luc...@mikemccandless.com wrote:
On Mon, Nov 16, 2009 at 6:38 PM, Peter Keegan peterlkee...@gmail.com
wrote:
Can you remap your external data to be per segment?
That would provide the tightest integration but would require a major
when the
custom scorer is created? No need to access the map for every doc this way.
Peter
On Tue, Nov 17, 2009 at 8:58 AM, Peter Keegan peterlkee...@gmail.com wrote:
The external data is just an array of fixed-length records, one for each
Lucene document. Indexes are updated at regular intervals
17, 2009 at 11:51 AM, Michael McCandless
luc...@mikemccandless.com wrote:
On Tue, Nov 17, 2009 at 8:58 AM, Peter Keegan peterlkee...@gmail.com
wrote:
The external data is just an array of fixed-length records, one for each
Lucene document. Indexes are updated at regular intervals in one jvm
I have a custom query object whose scorer uses the 'AllTermDocs' to get all
non-deleted documents. AllTermDocs returns the docId relative to the
segment, but I need the absolute (index-wide) docId to access external data.
What's the best way to get the unique, non-deleted docId?
Thanks,
Peter
I forgot to mention that this is with V2.9.1
On Mon, Nov 16, 2009 at 1:39 PM, Peter Keegan peterlkee...@gmail.com wrote:
I have a custom query object whose scorer uses the 'AllTermDocs' to get all
non-deleted documents. AllTermDocs returns the docId relative to the
segment, but I need
The same thing is occurring in my custom sort comparator. The ScoreDocs
passed to the 'compare' method have docIds that seem to be relative to the
segment. Is there any way to translate these into index-wide docIds?
Peter
On Mon, Nov 16, 2009 at 2:06 PM, Peter Keegan peterlkee...@gmail.com wrote
the maxDoc. Then, in your search, you can lookup
the SegmentReader you're working on to get the docBase?
Mike
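The docBase arithmetic Mike describes can be sketched stand-alone: the index-wide docId is the sum of maxDoc() over all earlier segments (the docBase) plus the segment-relative docId.

```java
public class DocBase {
    // maxDocs[i] is maxDoc() of segment i, in index order.
    // Returns the docBase of each segment (a prefix sum).
    public static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int sum = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = sum;
            sum += maxDocs[i];
        }
        return bases;
    }

    // Translate a segment-relative docId into an index-wide one.
    public static int globalDocId(int[] bases, int segment, int segmentDocId) {
        return bases[segment] + segmentDocId;
    }
}
```

This also covers the custom-sort-comparator case above: the same translation applies before indexing into any external per-document array.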
On Mon, Nov 16, 2009 at 2:50 PM, Peter Keegan peterlkee...@gmail.com
wrote:
The same thing is occurring in my custom sort comparator. The ScoreDocs
passed to the 'compare' method have
I know this has been asked before, but I couldn't find the thread.
The jar file produced from a build of 2.9.0 is 'lucene-core-2.9.jar'. For
2.9.1, it is 'lucene-core-2.9.1-dev.jar'. When does the '-dev' get removed?
Peter
-Dversion=2.9.1
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
-Original Message-
From: Peter Keegan [mailto:peterlkee...@gmail.com]
Sent: Tuesday, November 10, 2009 12:38 AM
To: java-user
Subject: building lucene-core
formula is always in flux - we likely hard coded the
change in 2.9.0 when releasing - we likely won't again in the future.
Some discussion about it came up recently on the list.
--
- Mark
http://www.lucidimagination.com
Peter Keegan wrote:
OK. I just downloaded the 2.9.0 sources from
source, it doesn't mean you will create something
identical to the official jars that were released.
--
- Mark
http://www.lucidimagination.com
Peter Keegan wrote:
The -dev version is confusing when it's the target of a build from an
official release.
A build with patches from an official
:
Hmm... for step 4 you should have gotten true back from isCurrent.
You're sure there were no intervening calls to IndexWriter.commit?
Are you using Lucene 2.9? If not, you have to make sure autoCommit
is false when opening the IndexWriter.
Mike
On Fri, Nov 6, 2009 at 2:46 PM, Peter Keegan
Are you using Lucene 2.9?
Yes
Peter
On Sun, Nov 8, 2009 at 6:23 PM, Peter Keegan peterlkee...@gmail.com wrote:
Here is some stand-alone code that reproduces the problem. There are 2
classes. jvm1 creates the index, jvm2 reads the index. The system console
input is used to synchronize the 4
? It will
produce an enormous amount of output, but if you can excise the few
lines around when that warning comes out post back that'd be great.
Mike
On Wed, Oct 28, 2009 at 12:23 PM, Peter Keegan peterlkee...@gmail.com
wrote:
Just to be safe, I ran with the official jar file from one of the mirrors
Btw, this 2.9 indexer is fast! I indexed 4Gb (1.07 million docs) with
optimization in just under 30 min.
I used setRAMBufferSizeMB=1.9G
Peter
On Thu, Oct 29, 2009 at 3:46 PM, Peter Keegan peterlkee...@gmail.com wrote:
A handful of the source documents did contain the U+ character
it starts to page and the performance gets hit.
I'd love to see what kind of benefit you see going from around a gig to
just under 2.
Peter Keegan wrote:
Btw, this 2.9 indexer is fast! I indexed 4Gb (1.07 million docs) with
optimization in just under 30 min.
I used setRAMBufferSizeMB=1.9G
:49 PM, Mark Miller markrmil...@gmail.com wrote:
Thanks a lot Peter! Really appreciate it.
Peter Keegan wrote:
Mark,
With 1.9G, I had to increase the JVM heap significantly (to 8G) to avoid
paging and GC hits. Here is a table comparing indexing times, optimizing
times and peak memory
My last post got truncated - probably exceeded max msg size. Let me know if
you want to see more of the IndexWriter log.
Peter
yet, thanks.
Mike
On Wed, Oct 28, 2009 at 10:21 AM, Peter Keegan peterlkee...@gmail.com
wrote:
Yes, I used JDK 1.6.0_16 when running CheckIndex and it reported the same
problems when run multiple times.
Also, what does "Lucene version 2.9 exported - 2009-10-27 15:31:52" mean
.
Peter
On Wed, Oct 28, 2009 at 11:29 AM, Michael McCandless
luc...@mikemccandless.com wrote:
On Wed, Oct 28, 2009 at 10:58 AM, Peter Keegan peterlkee...@gmail.com
wrote:
The only change I made to the source code was the patch for
PayloadNearQuery
(LUCENE-1986).
That patch certainly
_0.prx
IFD [Indexer]: delete _0.fdt
Peter
On Mon, Oct 26, 2009 at 3:59 PM, Peter Keegan peterlkee...@gmail.com wrote:
On Mon, Oct 26, 2009 at 3:00 PM, Michael McCandless
luc...@mikemccandless.com wrote:
On Mon, Oct 26, 2009 at 2:55 PM, Peter Keegan peterlkee...@gmail.com
wrote:
On Mon
CHANCE TO CTRL+C!
5...
4...
3...
2...
1...
Writing...
OK
Wrote new segments file segments_5
Peter
On Tue, Oct 27, 2009 at 10:00 AM, Peter Keegan peterlkee...@gmail.com wrote:
After rebuilding the corrupted indexes, the low disk space exception is now
occurring as expected. Sorry
Clarification: this CheckIndex is on the index from which the merge/optimize
failed.
Peter
On Tue, Oct 27, 2009 at 10:07 AM, Peter Keegan peterlkee...@gmail.com wrote:
Running CheckIndex after the IOException did produce an error in a term
frequency:
Opening index @ D:\mnsavs\lresumes3
stayed at _03
Thanks.
Mike
On Tue, Oct 27, 2009 at 10:00 AM, Peter Keegan peterlkee...@gmail.com
wrote:
After rebuilding the corrupted indexes, the low disk space exception is
now
occurring as expected. Sorry for the distraction.
fyi, here are the details:
java.io.IOException
: done
IW 0 [Indexer]: at close: _7:C1077025-_0
I see no errors.
Peter
On Tue, Oct 27, 2009 at 10:44 AM, Peter Keegan peterlkee...@gmail.com wrote:
On Tue, Oct 27, 2009 at 10:37 AM, Michael McCandless
luc...@mikemccandless.com wrote:
OK that exception looks more reasonable, for a disk full
:
This is odd -- is it reproducible?
Can you narrow it down to a small set of docs that when indexed
produce a corrupted index?
If you attempt to optimize the index, does it fail?
Mike
On Tue, Oct 27, 2009 at 1:40 PM, Peter Keegan peterlkee...@gmail.com
wrote:
It seems the index is corrupted
) detected
WARNING: would write new segments file, and 663862 documents would be lost,
if -fix were specified
Do the unit tests create multi-segment indexes?
Peter
On Tue, Oct 27, 2009 at 3:08 PM, Peter Keegan peterlkee...@gmail.com wrote:
It's reproducible with a large no. of docs (1 million
)
at
org.apache.lucene.index.IndexWriter.addIndexesNoOptimize(IndexWriter.java:3695)
I guess this is just the nature of a low disk space condition on Windows. I
expected to see a 'no space left on device' IO exception.
Peter
On Sun, Oct 25, 2009 at 8:54 PM, Peter Keegan peterlkee...@gmail.com wrote
On Mon, Oct 26, 2009 at 2:50 PM, Michael McCandless
luc...@mikemccandless.com wrote:
On Mon, Oct 26, 2009 at 10:44 AM, Peter Keegan peterlkee...@gmail.com
wrote:
Even running in console mode, the exception is difficult to interpret.
Here's an exception that I think occurred during an add
On Mon, Oct 26, 2009 at 3:00 PM, Michael McCandless
luc...@mikemccandless.com wrote:
On Mon, Oct 26, 2009 at 2:55 PM, Peter Keegan peterlkee...@gmail.com
wrote:
On Mon, Oct 26, 2009 at 2:50 PM, Michael McCandless
luc...@mikemccandless.com wrote:
On Mon, Oct 26, 2009 at 10:44 AM, Peter
include one
traceback into Lucene's optimized method, and then another (under
caused by) showing the exception from the BG merge thread.
Did you see any BG thread exceptions on wherever your System.err is
directed to?
Mike
On Sat, Oct 24, 2009 at 5:21 PM, Peter Keegan peterlkee...@gmail.com
, Peter Keegan peterlkee...@gmail.com
wrote:
Did you get any traceback printed at all?
no, only what I reported.
Did you see any BG thread exceptions on wherever your System.err is
directed to?
The jvm was running as a windows service, so output to System.err may
have
gone to the bit
I'm sometimes seeing the following exception from an operation that does a
merge and optimize:
java.io.IOException: background merge hit exception: _0:C1082866 _1:C79
into _2 [optimize] [mergeDocStores]
I'm pretty sure that it's caused by a temporary low disk space condition,
but I'd like to be
btw, this is with Lucene 2.9
On Sat, Oct 24, 2009 at 5:20 PM, Peter Keegan peterlkee...@gmail.com wrote:
I'm sometimes seeing the following exception from an operation that does a
merge and optimize:
java.io.IOException: background merge hit exception: _0:C1082866 _1:C79
into _2 [optimize
15, 2009, at 1:28 PM, Peter Keegan wrote:
The query is:
+payloadNear([spanNear([contents:insurance, contents:agent], 1,
false),
spanNear([contents:winston, contents:salem], 1, false)], 10, false)
It's using the default payload function scorer (average value)
It doesn't happen on all
I can reproduce this with a unit test - will post to JIRA shortly.
Peter
On Fri, Oct 16, 2009 at 8:06 AM, Peter Keegan peterlkee...@gmail.com wrote:
next() is called in PayloadNearQuery.setFreqCurrentDoc:
super.setFreqCurrentDoc();
But, I think it should be called before 'getPayloads
I'm using Lucene 2.9 and sometimes get a NPE in NearSpansUnordered:
java.lang.NullPointerException
at
org.apache.lucene.search.spans.NearSpansUnordered.start(NearSpansUnordered.java:219)
at
this happened on) would be greatly appreciated.
-Yonik
http://www.lucidimagination.com
On Thu, Oct 15, 2009 at 1:17 PM, Peter Keegan peterlkee...@gmail.com
wrote:
I'm using Lucene 2.9 and sometimes get a NPE in NearSpansUnordered:
java.lang.NullPointerException
I've been testing 2.9 RC2 lately and comparing query performance to 2.3.2.
I'm seeing a huge increase in throughput (2x-10x) on an index that was built
with 2.3.2. The queries have a lot of BoostingTermQuerys and boolean clauses
containing a custom scorer. Using JProfiler, I observe that the
IndexSearcher.search is calling my custom scorer's 'next' and 'doc' methods
64% fewer times. I see no 'advance' method in any of the hot spots'. I am
getting the same number of hits from the custom scorer.
Has the BooleanScorer2 logic changed?
Peter
On Wed, Sep 9, 2009 at 9:17 AM, Yonik Seeley
, but I think now it uses what's best by default? And pairs with
the collector? I didn't follow any of that closely though.
- Mark
Peter Keegan wrote:
IndexSearcher.search is calling my custom scorer's 'next' and 'doc'
methods
64% fewer times. I see no 'advance' method in any of the hot
http://svn.apache.org/viewvc?view=rev&revision=630698
This may be it. The scorer is sparse and usually in a conjuction with a
dense scorer.
Does the index format matter? I haven't yet built it with 2.9.
Peter
On Wed, Sep 9, 2009 at 10:17 AM, Yonik Seeley yo...@lucidimagination.com wrote:
On
Or you could try this patch:
LUCENE-1316: https://issues.apache.org/jira/browse/LUCENE-1316
Peter
On Thu, Aug 6, 2009 at 8:51 AM, Michael McCandless
luc...@mikemccandless.com wrote:
Opening your IndexReader with readOnly=true should also fix it, I think.
Mike
On Thu, Aug 6, 2009 at
There is a similar discussion on this topic here:
http://www.gossamer-threads.com/lists/lucene/java-user/42824?search_string=Lucene%20search%20performance%3A%20linear%3F;#42824
or: http://tinyurl.com/lpp3hf
On Wed, Jun 17, 2009 at 1:18 PM, Teruhiko Kurosaka k...@basistech.com wrote:
Thank
Sorry, here's the example I meant to show. Doc 1 and doc 2 both contain the
terms "hey look, the quick brown fox jumped very high", but in Doc 1 all the
terms are indexed at the same position. In doc 2, the terms are indexed in
adjacent positions (normal way). For the query "the quick brown fox", doc
I suppose SpanTermQuery could override the weight/scorer methods so that
it behaved more like a TermQuery if it was executed directly ... but
that's really not what it's intended for.
This is currently the only way to boost a term via payloads.
BoostingTermQuery extends SpanTermQuery.
if
, Mar 3, 2009 at 2:42 PM, Peter Keegan peterlkee...@gmail.com wrote:
The DefaultSimilarity class defines sloppyFreq as:
public float sloppyFreq(int distance) {
return 1.0f / (distance + 1);
}
For a 'SpanNearQuery', this reduces the effect of the term frequency on the
score as the number
The DefaultSimilarity class defines sloppyFreq as:
public float sloppyFreq(int distance) {
return 1.0f / (distance + 1);
}
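The formula above, stand-alone: an exact match (distance 0) contributes fully, and each extra position of slop halves, thirds, etc. the contribution, which is why wider spans dilute the effective term frequency.

```java
public class Sloppy {
    // DefaultSimilarity.sloppyFreq: contribution decays with edit distance.
    public static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }
}
```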
For a 'SpanNearQuery', this reduces the effect of the term frequency on the
score as the number of terms in the span increases. So, for a simple phrase
query (using
On Sun, Mar 1, 2009 at 8:57 PM, Peter Keegan peterlkee...@gmail.com
wrote:
As suggested, I added a query-time boost of 0.0f to the 'literals' field
(with index-time boost still there) and I did get the same scores for
both
queries :) (there is a subtlety between index-time and query-time
no effect on the
score, when combined with the above. This seems ok in this example since
the matching terms had boost = 0.
Thanks Yonik,
Peter
On Sat, Feb 28, 2009 at 6:02 PM, Yonik Seeley yo...@lucidimagination.com wrote:
On Sat, Feb 28, 2009 at 3:02 PM, Peter Keegan peterlkee...@gmail.com
in situations where you deal with simple query types, and matching query
structures, the queryNorm
*can* be used to make scores semi-comparable.
Hmm. My example used matching query structures. The only difference was a
single term in a field with zero weight that didn't exist in the matching
Any comments about this? Is this just the way queryNorm works or is this a
bug?
Thanks,
Peter
On Fri, Feb 20, 2009 at 4:03 PM, Peter Keegan peterlkee...@gmail.com wrote:
The explanation of scores from the same document returned from 2 similar
queries differ in an unexpected way. There are 2
Got it. This is another example of why scores can't be compared between
(even similar) queries.
(we don't)
Thanks.
On Fri, Feb 27, 2009 at 11:39 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
On Fri, Feb 27, 2009 at 9:15 AM, Peter Keegan peterlkee...@gmail.com
wrote:
Any comments about
The explanation of scores from the same document returned from 2 similar
queries differ in an unexpected way. There are 2 fields involved, 'contents'
and 'literals'. The 'literals' field has setBoost = 0. As you can see from
the explanations below, the total weight of the matching terms from the
Hi Karl,
I use payloads for weight only, too, with BoostingTermQuery (see:
http://www.nabble.com/BoostingTermQuery-scoring-td20323615.html#a20323615)
A custom tokenizer looks for the reserved character '\b' followed by a 2
byte 'boost' value. It then creates a special Token type for a custom
If you sort first by score, keep in mind that the raw scores are very
precise and you could see many unique values in the result set. The
secondary sort field would only be used to break equal scores. We had to use
a custom comparator to 'smooth out' the scores to allow the second field to
take
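One way to 'smooth out' scores before the secondary sort (a sketch under my own assumptions, not our production comparator; the bucket width is arbitrary) is to quantize them, so near-ties compare equal and fall through to the second field:

```java
public class ScoreSmoother {
    // Round a score down to the nearest bucket so that scores within
    // bucketWidth of each other compare equal in the primary sort.
    public static float smooth(float score, float bucketWidth) {
        return (float) Math.floor(score / bucketWidth) * bucketWidth;
    }
}
```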
performance? (I haven't
tried it yet).
Thanks,
Peter
On Thu, Nov 6, 2008 at 6:56 PM, Steven A Rowe [EMAIL PROTECTED] wrote:
Hi Peter,
On 11/06/2008 at 4:25 PM, Peter Keegan wrote:
I've discovered another flaw in using this technique:
(+contents:petroleum +contents:engineer +contents:refinery
:
Not sure, but it sounds like you are interested in a higher level Query,
kind of like the BooleanQuery, but then part of it sounds like it is per
document, right? Is it that you want to deal with multiple payloads in a
document, or multiple BTQs in a bigger query?
On Nov 4, 2008, at 9:42 AM, Peter
that doc. Yet another
reason to use BoostingTermQuery.
Peter
On Thu, Nov 6, 2008 at 1:08 PM, Peter Keegan [EMAIL PROTECTED] wrote:
Let me give some background on the problem behind my question.
Our index contains many fields (title, body, date, city, etc). Most queries
search all fields
I'm using BoostingTermQuery to boost the score of documents with terms
containing payloads (boost value 1). I'd like to change the scoring
behavior such that if a query contains multiple BoostingTermQuery terms
(either required or optional), documents containing more matching terms with
payloads
at it :)
Peter
On Thu, Jul 10, 2008 at 2:09 PM, Peter Keegan [EMAIL PROTECTED]
wrote:
I may take a crack at this. Any more thoughts you may have on the
implementation are welcome, but I don't want to distract you too much.
Thanks,
Peter
On Thu, Jul 10, 2008 at 1:30 PM, Grant Ingersoll [EMAIL
Ingersoll [EMAIL PROTECTED]
wrote:
I'm not fully following what you want. Can you explain a bit more?
Thanks,
Grant
On Jul 9, 2008, at 2:55 PM, Peter Keegan wrote:
If a SpanQuery is constructed from one or more BoostingTermQuery(s), the
payloads on the terms are never processed
PayloadNearQuery, see http://wiki.apache.org/lucene-java/Payload_Planning
I think it would make sense to develop these and I would be happy to help
shepherd a patch through, but am not in a position to generate said patch at
this moment in time.
On Jul 10, 2008, at 9:59 AM, Peter Keegan wrote
If a SpanQuery is constructed from one or more BoostingTermQuery(s), the
payloads on the terms are never processed by the SpanScorer. It seems to me
that you would want the SpanScorer to score the document both on the spans
distance and the payload score. So, either the SpanScorer would have to
Is it possible to compute a theoretical maximum score for a given query if
constraints are placed on 'tf' and 'lengthNorm'? If so, scores could be
compared to a 'perfect score' (a feature request from our customers)
Here are some related threads on this:
In this thread:
Sridhar,
We have been using approach 2 in our production system with good results. We
have separate processes for indexing and searching. The main issue that came
up was in deleting old indexes (see: http://tinyurl.com/32q8c4). Most of
our production problems occur during indexing, and we are