Re: sloppyFreq question

2009-03-09 Thread Peter Keegan
, Mar 3, 2009 at 2:42 PM, Peter Keegan peterlkee...@gmail.com wrote: The DefaultSimilarity class defines sloppyFreq as: public float sloppyFreq(int distance) { return 1.0f / (distance + 1); } For a 'SpanNearQuery', this reduces the effect of the term frequency on the score as the number
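[Editor's note] The sloppyFreq formula quoted in this message can be sanity-checked outside Lucene. This is a plain-Java sketch of the quoted one-liner, showing how the factor decays as edit distance grows; it prints raw 1/(distance+1) values, not full Lucene scores.

```java
// Sketch of the DefaultSimilarity.sloppyFreq quoted above: 1.0f / (distance + 1).
// Pure Java, no Lucene dependency; just shows how the sloppiness factor decays.
public class SloppyFreqDemo {
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        // An exact match (distance 0) gets factor 1.0; each extra position halves,
        // thirds, quarters it, etc.
        for (int d = 0; d <= 3; d++) {
            System.out.println(d + " -> " + sloppyFreq(d));
        }
    }
}
```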

Re: sloppyFreq question

2009-03-11 Thread Peter Keegan
I suppose SpanTermQuery could override the weight/scorer methods so that it behaved more like a TermQuery if it was executed directly ... but that's really not what it's intended for. This is currently the only way to boost a term via payloads. BoostingTermQuery extends SpanTermQuery. if

Re: sloppyFreq question

2009-03-20 Thread Peter Keegan
Sorry, here's the example I meant to show. Doc 1 and doc 2 both contain the terms hey look, the quick brown fox jumped very high, but in Doc 1 all the terms are indexed at the same position. In doc 2, the terms are indexed in adjacent positions (normal way). For the query the quick brown fox, doc

Re: Lucene performance: is search time linear to the index size?

2009-06-17 Thread Peter Keegan
There is a similar discussion on this topic here: http://www.gossamer-threads.com/lists/lucene/java-user/42824?search_string=Lucene%20search%20performance%3A%20linear%3F;#42824 or: http://tinyurl.com/lpp3hf On Wed, Jun 17, 2009 at 1:18 PM, Teruhiko Kurosaka k...@basistech.com wrote: Thank

Re: MatchAllDocsQuery concurrency issue

2009-08-06 Thread Peter Keegan
Or you could try this patch: LUCENE-1316 (https://issues.apache.org/jira/browse/LUCENE-1316) Peter On Thu, Aug 6, 2009 at 8:51 AM, Michael McCandless luc...@mikemccandless.com wrote: Opening your IndexReader with readOnly=true should also fix it, I think. Mike On Thu, Aug 6, 2009 at

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Peter Keegan
I've been testing 2.9 RC2 lately and comparing query performance to 2.3.2. I'm seeing a huge increase in throughput (2x-10x) on an index that was built with 2.3.2. The queries have a lot of BoostingTermQuerys and boolean clauses containing a custom scorer. Using JProfiler, I observe that the

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Peter Keegan
IndexSearcher.search is calling my custom scorer's 'next' and 'doc' methods 64% fewer times. I see no 'advance' method in any of the hot spots. I am getting the same number of hits from the custom scorer. Has the BooleanScorer2 logic changed? Peter On Wed, Sep 9, 2009 at 9:17 AM, Yonik Seeley

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Peter Keegan
, but I think now it uses what's best by default? And pairs with the collector? I didn't follow any of that closely though. - Mark Peter Keegan wrote: IndexSearcher.search is calling my custom scorer's 'next' and 'doc' methods 64% fewer times. I see no 'advance' method in any of the hot

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Peter Keegan
http://svn.apache.org/viewvc?view=rev&revision=630698 This may be it. The scorer is sparse and usually in a conjunction with a dense scorer. Does the index format matter? I haven't yet built it with 2.9. Peter On Wed, Sep 9, 2009 at 10:17 AM, Yonik Seeley yo...@lucidimagination.com wrote: On

NPE in NearSpansUnordered

2009-10-15 Thread Peter Keegan
I'm using Lucene 2.9 and sometimes get a NPE in NearSpansUnordered: java.lang.NullPointerException at org.apache.lucene.search.spans.NearSpansUnordered.start(NearSpansUnordered.java:219) at

Re: NPE in NearSpansUnordered

2009-10-15 Thread Peter Keegan
this happened on) would be greatly appreciated. -Yonik http://www.lucidimagination.com On Thu, Oct 15, 2009 at 1:17 PM, Peter Keegan peterlkee...@gmail.com wrote: I'm using Lucene 2.9 and sometimes get a NPE in NearSpansUnordered: java.lang.NullPointerException

Re: NPE in NearSpansUnordered

2009-10-16 Thread Peter Keegan
15, 2009, at 1:28 PM, Peter Keegan wrote: The query is: +payloadNear([spanNear([contents:insurance, contents:agent], 1, false), spanNear([contents:winston, contents:salem], 1, false)], 10, false) It's using the default payload function scorer (average value) It doesn't happen on all

Re: NPE in NearSpansUnordered

2009-10-16 Thread Peter Keegan
I can reproduce this with a unit test - will post to JIRA shortly. Peter On Fri, Oct 16, 2009 at 8:06 AM, Peter Keegan peterlkee...@gmail.com wrote: next() is called in PayloadNearQuery-setFreqCurrentDoc: super.setFreqCurrentDoc(); But, I think it should be called before 'getPayloads

IO exception during merge/optimize

2009-10-24 Thread Peter Keegan
I'm sometimes seeing the following exception from an operation that does a merge and optimize: java.io.IOException: background merge hit exception: _0:C1082866 _1:C79 into _2 [optimize] [mergeDocStores] I'm pretty sure that it's caused by a temporary low disk space condition, but I'd like to be

Re: IO exception during merge/optimize

2009-10-24 Thread Peter Keegan
btw, this is with Lucene 2.9 On Sat, Oct 24, 2009 at 5:20 PM, Peter Keegan peterlkee...@gmail.com wrote: I'm sometimes seeing the following exception from an operation that does a merge and optimize: java.io.IOException: background merge hit exception: _0:C1082866 _1:C79 into _2 [optimize

Re: IO exception during merge/optimize

2009-10-25 Thread Peter Keegan
include one traceback into Lucene's optimized method, and then another (under caused by) showing the exception from the BG merge thread. Did you see any BG thread exceptions on wherever your System.err is directed to? Mike On Sat, Oct 24, 2009 at 5:21 PM, Peter Keegan peterlkee...@gmail.com

Re: IO exception during merge/optimize

2009-10-25 Thread Peter Keegan
, Peter Keegan peterlkee...@gmail.com wrote: Did you get any traceback printed at all? no, only what I reported. Did you see any BG thread exceptions on wherever your System.err is directed to? The jvm was running as a windows service, so output to System.err may have gone to the bit

Re: IO exception during merge/optimize

2009-10-26 Thread Peter Keegan
) at org.apache.lucene.index.IndexWriter.addIndexesNoOptimize(IndexWriter.java:3695) I guess this is just the nature of a low disk space condition on Windows. I expected to see a 'no space left on device' IO exception. Peter On Sun, Oct 25, 2009 at 8:54 PM, Peter Keegan peterlkee...@gmail.com wrote

Re: IO exception during merge/optimize

2009-10-26 Thread Peter Keegan
On Mon, Oct 26, 2009 at 2:50 PM, Michael McCandless luc...@mikemccandless.com wrote: On Mon, Oct 26, 2009 at 10:44 AM, Peter Keegan peterlkee...@gmail.com wrote: Even running in console mode, the exception is difficult to interpret. Here's an exception that I think occurred during an add

Re: IO exception during merge/optimize

2009-10-26 Thread Peter Keegan
On Mon, Oct 26, 2009 at 3:00 PM, Michael McCandless luc...@mikemccandless.com wrote: On Mon, Oct 26, 2009 at 2:55 PM, Peter Keegan peterlkee...@gmail.com wrote: On Mon, Oct 26, 2009 at 2:50 PM, Michael McCandless luc...@mikemccandless.com wrote: On Mon, Oct 26, 2009 at 10:44 AM, Peter

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
_0.prx IFD [Indexer]: delete _0.fdt Peter On Mon, Oct 26, 2009 at 3:59 PM, Peter Keegan peterlkee...@gmail.comwrote: On Mon, Oct 26, 2009 at 3:00 PM, Michael McCandless luc...@mikemccandless.com wrote: On Mon, Oct 26, 2009 at 2:55 PM, Peter Keegan peterlkee...@gmail.com wrote: On Mon

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
CHANCE TO CTRL+C! 5... 4... 3... 2... 1... Writing... OK Wrote new segments file segments_5 Peter On Tue, Oct 27, 2009 at 10:00 AM, Peter Keegan peterlkee...@gmail.com wrote: After rebuilding the corrupted indexes, the low disk space exception is now occurring as expected. Sorry

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
Clarification: this CheckIndex is on the index from which the merge/optimize failed. Peter On Tue, Oct 27, 2009 at 10:07 AM, Peter Keegan peterlkee...@gmail.com wrote: Running CheckIndex after the IOException did produce an error in a term frequency: Opening index @ D:\mnsavs\lresumes3

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
stayed at _03 Thanks. Mike On Tue, Oct 27, 2009 at 10:00 AM, Peter Keegan peterlkee...@gmail.com wrote: After rebuilding the corrupted indexes, the low disk space exception is now occurring as expected. Sorry for the distraction. fyi, here are the details: java.io.IOException

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
: done IW 0 [Indexer]: at close: _7:C1077025-_0 I see no errors. Peter On Tue, Oct 27, 2009 at 10:44 AM, Peter Keegan peterlkee...@gmail.com wrote: On Tue, Oct 27, 2009 at 10:37 AM, Michael McCandless luc...@mikemccandless.com wrote: OK that exception looks more reasonable, for a disk full

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
: This is odd -- is it reproducible? Can you narrow it down to a small set of docs that when indexed produce a corrupted index? If you attempt to optimize the index, does it fail? Mike On Tue, Oct 27, 2009 at 1:40 PM, Peter Keegan peterlkee...@gmail.com wrote: It seems the index is corrupted

Re: IO exception during merge/optimize

2009-10-27 Thread Peter Keegan
) detected WARNING: would write new segments file, and 663862 documents would be lost, if -fix were specified Do the unit tests create multi-segment indexes? Peter On Tue, Oct 27, 2009 at 3:08 PM, Peter Keegan peterlkee...@gmail.com wrote: It's reproducible with a large no. of docs (1 million

Re: IO exception during merge/optimize

2009-10-28 Thread Peter Keegan
My last post got truncated - probably exceeded max msg size. Let me know if you want to see more of the IndexWriter log. Peter

Re: IO exception during merge/optimize

2009-10-28 Thread Peter Keegan
yet, thanks. Mike On Wed, Oct 28, 2009 at 10:21 AM, Peter Keegan peterlkee...@gmail.com wrote: Yes, I used JDK 1.6.0_16 when running CheckIndex and it reported the same problems when run multiple times. Also, what does Lucene version 2.9 exported - 2009-10-27 15:31:52 mean

Re: IO exception during merge/optimize

2009-10-28 Thread Peter Keegan
. Peter On Wed, Oct 28, 2009 at 11:29 AM, Michael McCandless luc...@mikemccandless.com wrote: On Wed, Oct 28, 2009 at 10:58 AM, Peter Keegan peterlkee...@gmail.com wrote: The only change I made to the source code was the patch for PayloadNearQuery (LUCENE-1986). That patch certainly

Re: IO exception during merge/optimize

2009-10-29 Thread Peter Keegan
? It will produce an enormous amount of output, but if you can excise the few lines around when that warning comes out post back that'd be great. Mike On Wed, Oct 28, 2009 at 12:23 PM, Peter Keegan peterlkee...@gmail.com wrote: Just to be safe, I ran with the official jar file from one of the mirrors

Re: IO exception during merge/optimize

2009-10-29 Thread Peter Keegan
Btw, this 2.9 indexer is fast! I indexed 4Gb (1.07 million docs) with optimization in just under 30 min. I used setRAMBufferSizeMB=1.9G Peter On Thu, Oct 29, 2009 at 3:46 PM, Peter Keegan peterlkee...@gmail.com wrote: A handful of the source documents did contain the U+ character

Re: IO exception during merge/optimize

2009-10-29 Thread Peter Keegan
it starts to page and the performance gets hit. I'd love to see what kind of benefit you see going from around a gig to just under 2. Peter Keegan wrote: Btw, this 2.9 indexer is fast! I indexed 4Gb (1.07 million docs) with optimization in just under 30 min. I used setRAMBufferSizeMB=1.9G

Re: IO exception during merge/optimize

2009-10-29 Thread Peter Keegan
:49 PM, Mark Miller markrmil...@gmail.com wrote: Thanks a lot Peter! Really appreciate it. Peter Keegan wrote: Mark, With 1.9G, I had to increase the JVM heap significantly (to 8G) to avoid paging and GC hits. Here is a table comparing indexing times, optimizing times and peak memory

Re: 2 phase commit with external data

2009-11-08 Thread Peter Keegan
: Hmm... for step 4 you should have gotten true back from isCurrent. You're sure there were no intervening calls to IndexWriter.commit? Are you using Lucene 2.9? If not, you have to make sure autoCommit is false when opening the IndexWriter. Mike On Fri, Nov 6, 2009 at 2:46 PM, Peter Keegan

Re: 2 phase commit with external data

2009-11-08 Thread Peter Keegan
Are you using Lucene 2.9? Yes Peter On Sun, Nov 8, 2009 at 6:23 PM, Peter Keegan peterlkee...@gmail.com wrote: Here is some stand-alone code that reproduces the problem. There are 2 classes. jvm1 creates the index, jvm2 reads the index. The system console input is used to synchronize the 4

building lucene-core from source

2009-11-09 Thread Peter Keegan
I know this has been asked before, but I couldn't find the thread. The jar file produced from a build of 2.9.0 is 'lucene-core-2.9.jar'. For 2.9.1, it is 'lucene-core-2.9.1-dev.jar'. When does the '-dev' get removed? Peter

Re: building lucene-core from source

2009-11-09 Thread Peter Keegan
-Dversion=2.9.1 Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Peter Keegan [mailto:peterlkee...@gmail.com] Sent: Tuesday, November 10, 2009 12:38 AM To: java-user Subject: building lucene-core

Re: building lucene-core from source

2009-11-09 Thread Peter Keegan
formula is always in flux - we likely hard coded the change in 2.9.0 when releasing - we likely won't again in the future. Some discussion about it came up recently on the list. -- - Mark http://www.lucidimagination.com Peter Keegan wrote: OK. I just downloaded the 2.9.0 sources from

Re: building lucene-core from source

2009-11-09 Thread Peter Keegan
source, it doesn't mean you will create something identical to the official jars that were released. -- - Mark http://www.lucidimagination.com Peter Keegan wrote: The -dev version is confusing when it's the target of a build from an official release. A build with patches from an official

Use of AllTermDocs with custom scorer

2009-11-16 Thread Peter Keegan
I have a custom query object whose scorer uses the 'AllTermDocs' to get all non-deleted documents. AllTermDocs returns the docId relative to the segment, but I need the absolute (index-wide) docId to access external data. What's the best way to get the unique, non-deleted docId? Thanks, Peter

Re: Use of AllTermDocs with custom scorer

2009-11-16 Thread Peter Keegan
I forgot to mention that this is with V2.9.1 On Mon, Nov 16, 2009 at 1:39 PM, Peter Keegan peterlkee...@gmail.com wrote: I have a custom query object whose scorer uses the 'AllTermDocs' to get all non-deleted documents. AllTermDocs returns the docId relative to the segment, but I need

Re: Use of AllTermDocs with custom scorer

2009-11-16 Thread Peter Keegan
The same thing is occurring in my custom sort comparator. The ScoreDocs passed to the 'compare' method have docIds that seem to be relative to the segment. Is there any way to translate these into index-wide docIds? Peter On Mon, Nov 16, 2009 at 2:06 PM, Peter Keegan peterlkee...@gmail.com wrote

Re: Use of AllTermDocs with custom scorer

2009-11-16 Thread Peter Keegan
the maxDoc. Then, in your search, you can lookup the SegmentReader you're working on to get the docBase? Mike On Mon, Nov 16, 2009 at 2:50 PM, Peter Keegan peterlkee...@gmail.com wrote: The same thing is occurring in my custom sort comparator. The ScoreDocs passed to the 'compare' method have
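[Editor's note] The docBase arithmetic suggested above can be sketched without Lucene: an index-wide docId is the segment-relative docId plus the sum of maxDoc of all earlier segments. The segment sizes and method names below are hypothetical, not a real Lucene API.

```java
// Hypothetical sketch of per-segment -> index-wide docId mapping.
// Each segment's docBase is the running total of prior segments' maxDoc.
public class DocBaseDemo {
    static int[] docBases(int[] segmentMaxDocs) {
        int[] bases = new int[segmentMaxDocs.length];
        int base = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            bases[i] = base;          // docBase of segment i
            base += segmentMaxDocs[i];
        }
        return bases;
    }

    static int globalDocId(int[] bases, int segment, int localDocId) {
        return bases[segment] + localDocId;
    }

    public static void main(String[] args) {
        // Made-up segment sizes: 1000, 500, 250 docs.
        int[] bases = docBases(new int[] {1000, 500, 250});
        // Local doc 10 in the third segment maps to 1000 + 500 + 10 = 1510.
        System.out.println(globalDocId(bases, 2, 10));
    }
}
```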

Re: Use of AllTermDocs with custom scorer

2009-11-17 Thread Peter Keegan
, Peter On Tue, Nov 17, 2009 at 5:49 AM, Michael McCandless luc...@mikemccandless.com wrote: On Mon, Nov 16, 2009 at 6:38 PM, Peter Keegan peterlkee...@gmail.com wrote: Can you remap your external data to be per segment? That would provide the tightest integration but would require a major

Re: Use of AllTermDocs with custom scorer

2009-11-17 Thread Peter Keegan
when the custom scorer is created? No need to access the map for every doc this way. Peter On Tue, Nov 17, 2009 at 8:58 AM, Peter Keegan peterlkee...@gmail.com wrote: The external data is just an array of fixed-length records, one for each Lucene document. Indexes are updated at regular intervals

Re: Use of AllTermDocs with custom scorer

2009-11-17 Thread Peter Keegan
17, 2009 at 11:51 AM, Michael McCandless luc...@mikemccandless.com wrote: On Tue, Nov 17, 2009 at 8:58 AM, Peter Keegan peterlkee...@gmail.com wrote: The external data is just an array of fixed-length records, one for each Lucene document. Indexes are updated at regular intervals in one jvm

searchWithFilter bug?

2009-12-04 Thread Peter Keegan
I'm having a problem with 'searchWithFilter' on Lucene 2.9.1. The Filter wraps a simple BitSet. When doing a 'MatchAllDocs' query with this filter, I get only a subset of the expected results, even accounting for deletes. The index has 10 segments. In IndexSearcher-searchWithFilter, it looks like

Re: searchWithFilter bug?

2009-12-04 Thread Peter Keegan
is... Can you boil it down to a smallish test case? Mike On Fri, Dec 4, 2009 at 10:32 AM, Peter Keegan peterlkee...@gmail.com wrote: I'm having a problem with 'searchWithFilter' on Lucene 2.9.1. The Filter wraps a simple BitSet. When doing a 'MatchAllDocs' query with this filter, I get

Re: searchWithFilter bug?

2009-12-04 Thread Peter Keegan
: Peter, which filter do you use, do you respect the IndexReaders maxDoc() and the docBase? simon On Fri, Dec 4, 2009 at 4:47 PM, Peter Keegan peterlkee...@gmail.com wrote: I think the Filter's docIdSetIterator is using the top level reader for each segment, because the cardinality

PayloadNearSpanScorer explain method

2010-02-15 Thread Peter Keegan
The 'explain' method in PayloadNearSpanScorer assumes the AveragePayloadFunction was used. I don't see an easy way to override this because 'payloadsSeen' and 'payloadScore' are private/protected. It seems like the 'PayloadFunction' interface should have an 'explain' method that the Scorer could

Re: Can you use reduced sized test indexes to predict performance gains for a larger index?

2010-02-15 Thread Peter Keegan
Same experience here as Tom. Disk I/O becomes bottleneck with large indexes (or multiple shards per server) with less memory. Frequent updates to indexes can make the I/O bottleneck worse. Peter On Mon, Feb 15, 2010 at 2:17 PM, Tom Burton-West tburtonw...@gmail.comwrote: Hi Chris, In our

Re: PayloadNearSpanScorer explain method

2010-02-17 Thread Peter Keegan
Yes, I will provide a patch. Our new proxy server has broken my access to the svn repository, though :-( On Tue, Feb 16, 2010 at 1:12 PM, Grant Ingersoll gsing...@apache.org wrote: That sounds reasonable. Patch? On Feb 15, 2010, at 10:29 AM, Peter Keegan wrote: The 'explain' method

Re: PayloadNearSpanScorer explain method

2010-02-22 Thread Peter Keegan
Patch is in JIRA: LUCENE-2272 On Wed, Feb 17, 2010 at 8:40 PM, Peter Keegan peterlkee...@gmail.com wrote: Yes, I will provide a patch. Our new proxy server has broken my access to the svn repository, though :-( On Tue, Feb 16, 2010 at 1:12 PM, Grant Ingersoll gsing...@apache.org wrote

IndexWriter.getReader.getVersion behavior

2010-02-22 Thread Peter Keegan
Using Lucene 2.9.1, I have the following pseudocode which gets repeated at regular intervals: 1. FSDirectory dir = FSDirectory.open(java.io.File); 2. dir.setLockFactory(new SingleInstanceLockFactory()); 3. IndexWriter writer = new IndexWriter(dir, Analyzer, false, maxFieldLen) 4.

Re: IndexWriter.getReader.getVersion behavior

2010-02-22 Thread Peter Keegan
on prepareCommit (or, commit, if you didn't first prepare, since that will call prepareCommit internally) that this version should increase. Is there only 1 thread doing this? Oh, and, are you passing false for autoCommit? Mike On Mon, Feb 22, 2010 at 11:43 AM, Peter Keegan peterlkee...@gmail.com

Re: IndexWriter.getReader.getVersion behavior

2010-02-22 Thread Peter Keegan
then. The version should only increment on commit. Can you make it all happen when infoStream is on, and post back? Mike On Mon, Feb 22, 2010 at 12:35 PM, Peter Keegan peterlkee...@gmail.com wrote: Only one writer thread and one writer process. I'm calling IndexWriter(Directory d

Re: IndexWriter.getReader.getVersion behavior

2010-02-25 Thread Peter Keegan
I've reproduced this and I have a bunch of infoStream log files. Since the messages have no timestamps, it's hard to tell where the relevant entries are. What should I be looking for? Peter On Mon, Feb 22, 2010 at 3:58 PM, Peter Keegan peterlkee...@gmail.com wrote: I'm pretty sure

Re: IndexWriter.getReader.getVersion behavior

2010-02-25 Thread Peter Keegan
you got a reader with the wrong (unexplained extra +1) version? If so, can you post the infoStream output up to that point? Mike On Thu, Feb 25, 2010 at 10:22 AM, Peter Keegan peterlkee...@gmail.com wrote: I've reproduced this and I have a bunch of infoStream log files. Since

Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Peter Keegan
, but IW.close does (by default), this means you'll pick up an extra version whenever a merge is running when you call close. Mike On Thu, Feb 25, 2010 at 2:52 PM, Peter Keegan peterlkee...@gmail.com wrote: I'm pretty sure this output occurred when the version number skipped +1. The line

Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Peter Keegan
(), then close open the writer, I think (but you better test to be sure!) the next .getReader().getVersion() should always match. Mike On Fri, Feb 26, 2010 at 2:40 PM, Peter Keegan peterlkee...@gmail.com wrote: Is there a way for the application to wait for the BG commit to finish before

Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Peter Keegan
Can IW.waitForMerges be called between 'prepareCommit' and 'commit'? That's when the app calls 'getReader' to create external data. Peter On Fri, Feb 26, 2010 at 3:15 PM, Peter Keegan peterlkee...@gmail.com wrote: Great, I'll give it a try. Thanks! On Fri, Feb 26, 2010 at 3:11 PM, Michael

Combining TopFieldCollector with custom Collector

2010-03-11 Thread Peter Keegan
Is it possible to issue a single search that combines a TopFieldCollector (MultiComparatorScoringMaxScoreCollector) with a custom Collector? The custom Collector just collects the doc IDs into a BitSet (or DocIdSet). The collect() methods of the various TopFieldCollectors cannot be overridden.

Re: Combining TopFieldCollector with custom Collector

2010-03-11 Thread Peter Keegan
Yes. Could you give me a hint on how to delegate? On Thu, Mar 11, 2010 at 2:50 PM, Michael McCandless luc...@mikemccandless.com wrote: Can you make your own collector and then just delegate internally to TFC? Mike On Thu, Mar 11, 2010 at 2:30 PM, Peter Keegan peterlkee...@gmail.com wrote

Re: Combining TopFieldCollector with custom Collector

2010-03-11 Thread Peter Keegan
of Collectors methods that you implement, do your own stuff (setting the bit) but also then call tfc.XXX (eg tfc.collect). That should work? Mike On Thu, Mar 11, 2010 at 2:57 PM, Peter Keegan peterlkee...@gmail.com wrote: Yes. Could you give me a hint on how to delegate? On Thu, Mar 11, 2010
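[Editor's note] The delegation pattern suggested above can be sketched without the real Lucene classes. `SimpleCollector` here is a hypothetical stand-in for Lucene's Collector API: the wrapper records each docId in a BitSet, then forwards the call to the wrapped collector (a TopFieldCollector, in the thread).

```java
import java.util.BitSet;

// Hypothetical delegating-collector sketch; not the real Lucene Collector API.
public class DelegatingCollectorDemo {
    interface SimpleCollector {
        void collect(int doc);
    }

    static class BitSetCollector implements SimpleCollector {
        final BitSet bits = new BitSet();
        final SimpleCollector delegate;

        BitSetCollector(SimpleCollector delegate) {
            this.delegate = delegate;
        }

        @Override
        public void collect(int doc) {
            bits.set(doc);         // our own bookkeeping (the BitSet the thread wants)
            delegate.collect(doc); // then delegate, e.g. to a TopFieldCollector
        }
    }

    public static void main(String[] args) {
        // Wrap a no-op delegate standing in for the TFC.
        BitSetCollector c = new BitSetCollector(doc -> {});
        c.collect(3);
        c.collect(7);
        System.out.println(c.bits); // the docIds seen so far
    }
}
```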

Re: Combining TopFieldCollector with custom Collector

2010-03-11 Thread Peter Keegan
I want the TFC to do all the cool things it does like custom sorting, saving the field values, max score, etc. I suppose the custom Collector could explicitly delegate all TFC's methods, but this doesn't seem right. Peter On Thu, Mar 11, 2010 at 3:40 PM, Peter Keegan peterlkee...@gmail.com wrote

Re: Combining TopFieldCollector with custom Collector

2010-03-12 Thread Peter Keegan
http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Peter Keegan [mailto:peterlkee...@gmail.com] Sent: Thursday, March 11, 2010 9:41 PM To: java-user@lucene.apache.org Subject: Re: Combining TopFieldCollector with custom Collector Yes, but none

Re: Relevancy Practices

2010-05-05 Thread Peter Keegan
relevant? How formal was that process? -Grant On May 3, 2010, at 11:08 AM, Peter Keegan wrote: We discovered very soon after going to production that Lucene's scores were often 'too precise'. For example, a page of 25 results may have several different score values, and all within 15

Re: how to index large number of files?

2010-10-22 Thread Peter Keegan
running eclipse with -Xmx2G parameter. This only affects the Eclipse JVM, not the JVM launched by Eclipse to run your application. Did you add -Xmx2G to the 'VM arguments' of your Debug or Run configuration? Peter On Thu, Oct 21, 2010 at 3:26 PM, Sahin Buyrukbilen sahin.buyrukbi...@gmail.com

Search within a sentence (revisited)

2011-07-20 Thread Peter Keegan
I have browsed many suggestions on how to implement 'search within a sentence', but all seem to have drawbacks. For example, from http://lucene.472066.n3.nabble.com/Issue-with-sentence-specific-search-td1644352.html#a1645072 Steve Rowe writes: -- One common technique, instead of using a
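[Editor's note] The core constraint in this thread, that a span match must not cross a sentence boundary, can be sketched independently of Lucene. Given sorted sentence-start positions, a match is accepted only if its start and end fall in the same sentence. Positions here are made-up numbers, not output of any real Spans API.

```java
import java.util.Arrays;

// Sketch of the "search within a sentence" constraint; not a Lucene SpanQuery.
public class SpanWithinSentenceDemo {
    // Returns the index of the sentence containing this token position,
    // given sorted sentence-start positions.
    static int sentenceOf(int[] sentenceStarts, int position) {
        int idx = Arrays.binarySearch(sentenceStarts, position);
        return idx >= 0 ? idx : -idx - 2;
    }

    static boolean withinOneSentence(int[] sentenceStarts, int spanStart, int spanEnd) {
        return sentenceOf(sentenceStarts, spanStart) == sentenceOf(sentenceStarts, spanEnd);
    }

    public static void main(String[] args) {
        int[] starts = {0, 10, 25}; // three sentences starting at these token positions
        System.out.println(withinOneSentence(starts, 2, 8));  // both in sentence 0: accept
        System.out.println(withinOneSentence(starts, 8, 12)); // crosses a boundary: reject
    }
}
```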

Re: Search within a sentence (revisited)

2011-07-20 Thread Peter Keegan
into sentences and put those in a multi-valued field and then search that. On Wed, 20 Jul 2011 11:27:38 -0400, Peter Keegan peterlkee...@gmail.com wrote: I have browsed many suggestions on how to implement 'search within a sentence', but all seem to have drawbacks. For example, from http://lucene

Re: Search within a sentence (revisited)

2011-07-21 Thread Peter Keegan
(field, text)); } public TermQuery makeTermQuery(String text) { return new TermQuery(new Term(field, text)); } } Peter On Wed, Jul 20, 2011 at 9:22 PM, Mark Miller markrmil...@gmail.com wrote: On Jul 20, 2011, at 7:44 PM, Mark Miller wrote: On Jul 20, 2011, at 11:27 AM, Peter Keegan wrote

Re: Search within a sentence (revisited)

2011-07-21 Thread Peter Keegan
://issues.apache.org/jira/browse/LUCENE-777 Further tests may be needed though. - Mark On Jul 21, 2011, at 9:28 AM, Peter Keegan wrote: Hi Mark, Here is a unit test using a version of 'SpanWithinQuery' modified for 3.2 ('getTerms' removed) . The last test fails (search for 1 and 3

Re: Search within a sentence (revisited)

2011-07-25 Thread Peter Keegan
that will work for 3.2. On Jul 21, 2011, at 4:25 PM, Mark Miller wrote: Yeah, it's off trunk - I'll submit a 3X patch in a bit - just have to change that to an IndexReader I believe. - Mark On Jul 21, 2011, at 4:01 PM, Peter Keegan wrote: Does this patch require the trunk version? I'm

Re: PayloadNearQuery and AveragePayloadFunction

2012-02-02 Thread Peter Keegan
I don't quite follow what you're doing, but is it possible that your payloads are not on the desired terms when you indexed them? The first explanation shows that the matching document contained luteinizing hormone in both fields 'AbstractText' and 'AbstractTitle'. The average payload value was

Re: PayloadNearQuery and AveragePayloadFunction

2012-02-03 Thread Peter Keegan
AveragePayloadFunction is just what it sounds like: return numPayloadsSeen > 0 ? (payloadScore / numPayloadsSeen) : 1; What values are you seeing returned from PayloadHelper.decodeFloat? Peter On Fri, Feb 3, 2012 at 4:13 AM, shyama shyamasree_s...@yahoo.com wrote: Hi Peter I have checked
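[Editor's note] The one-liner quoted above can be checked in isolation. This is plain Java restating the arithmetic (mean of the payloads seen, or a default of 1 if none were seen), not the Lucene class itself.

```java
// Plain-Java restatement of the average-payload arithmetic quoted above.
public class AvgPayloadDemo {
    static float averagePayload(float payloadScore, int numPayloadsSeen) {
        // payloadScore is assumed to be the running sum of decoded payload values.
        return numPayloadsSeen > 0 ? (payloadScore / numPayloadsSeen) : 1;
    }

    public static void main(String[] args) {
        System.out.println(averagePayload(6.0f, 3)); // mean of three payloads summing to 6
        System.out.println(averagePayload(0.0f, 0)); // no payloads seen: default 1
    }
}
```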

Re: PayloadNearQuery and AveragePayloadFunction

2012-02-03 Thread Peter Keegan
All term queries, including payload queries, deal only with words from the query that exist in a document. They don't know what other terms are in a matching document, due to the inverted nature of the index. Peter On Fri, Feb 3, 2012 at 11:50 AM, shyama shyamasree_s...@yahoo.com wrote: Hi

DocValues memory usage

2013-03-26 Thread Peter Keegan
Inspired by this presentation of DocValues: http://www.slideshare.net/lucenerevolution/willnauer-simon-doc-values-column-stride-fields-in-lucene I decided to try them out in 4.2. I created a 1M document index with one DocValues field: BinaryDocValuesField conceptsDV = new

Re: DocValues memory usage

2013-03-28 Thread Peter Keegan
. Then I used per-field codec with DiskDocValuesFormat, it works like DirectSource in 4.0.0, but I'm not feeling confident with this usage. Anyone can say more about removing DirectSource API? On 2013-3-26, at 22:59, Peter Keegan peterlkee...@gmail.com wrote: Inspired by this presentation

Exceptions during batch indexing

2014-11-06 Thread Peter Keegan
How are folks handling Solr exceptions that occur during batch indexing? Solr (4.6) stops parsing the docs stream when an error occurs (e.g. a doc with a missing mandatory field), and stops indexing. The bad document is not identified, so it would be hard for the client to recover by skipping over

Re: Exceptions during batch indexing

2014-11-10 Thread Peter Keegan
-Original Message- From: Peter Keegan Sent: Thursday, November 6, 2014 3:21 PM To: java-user Subject: Exceptions during batch indexing How are folks handling Solr exceptions that occur during batch indexing? Solr (4.6) stops parsing the docs stream when an error occurs (e.g. a doc
