Re: Lucene in action

2023-06-10 Thread Mark Miller
Nature abhors being anything but an author by name on a second tech book. The ruse is up after one when you have the inputs crystalized and the hourly wage in hand. Hard to find anything but executive producers after that. I’d shoot for a persuasive crowdfunding attempt.

[ANNOUNCE] Apache Lucene 4.10.3 released

2014-12-29 Thread Mark Miller
. If that is the case, please try another mirror. This also goes for Maven access. Happy Holidays, Mark Miller http://www.about.me/markrmiller - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java

[ANNOUNCE] Apache Lucene 4.5.1 released.

2013-10-24 Thread Mark Miller
October 2013, Apache Lucene™ 4.5.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 4.5.1 Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable

[ANNOUNCE] Apache Lucene 4.2.1 released

2013-04-03 Thread Mark Miller
April 2013, Apache Lucene™ 4.2.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 4.2.1. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires

Re: Luke?

2013-03-15 Thread Mark Miller
If anyone is able to donate some effort, a nice future scenario could be that Luke comes fully up to date with every Lucene release: https://issues.apache.org/jira/browse/LUCENE-2562 - Mark On Mar 15, 2013, at 5:58 AM, Eric Charles e...@apache.org wrote: For the record, I happily use Luke

Re: read past EOF when merge

2012-11-03 Thread Mark Miller
Can you file a JIRA Markus? This is probably related to the new code that uses Directory for replication. - Mark On Nov 2, 2012, at 6:53 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, For what it's worth, we have seen similar issues with Lucene/Solr from this week's trunk. The

Re: Lucene 4.0 Index Format Finalization Timetable

2011-12-08 Thread Mark Miller
While we are in constant sync due to the merge, Lucene would still be updated multiple times before a Solr 4 release, and it would be subject to happen at any time - so it's really not any different. On Wednesday, December 7, 2011, Jamie Johnson jej2...@gmail.com wrote: Yeah, biggest issue for us

Re: ElasticSearch

2011-11-17 Thread Mark Miller
The XML query parser can map to Lucene one to one as well - hasn't seemed to pick up enough steam to be included with Solr yet, but there has been some commotion so it's likely to go in at some point. Not enough demand yet I guess. https://issues.apache.org/jira/browse/SOLR-839 XML Query Parser

Re: optimize with num segments 1 index keeps growing

2011-09-12 Thread Mark Miller
for expungeDeletes here I think: so that it's more consistent with the javadocs for optimize? Requests an expunge operation... ? +1 - it's a documentation bug now. - Mark Miller lucidimagination.com 2011.lucene-eurocon.org | Oct 17-20 | Barcelona

Re: Search within a sentence (revisited)

2011-07-26 Thread Mark Miller
like I likely should try if I was going to commit this thing. - Mark Miller lucidimagination.com On Jul 26, 2011, at 8:56 AM, Peter Keegan wrote: Thanks Mark! The new patch is working fine with the tests and a few more. If you have particular test cases in mind, I'd be happy to add them

Re: implicit closing of an IndexWriter

2011-07-26 Thread Mark Miller
On Jul 26, 2011, at 9:52 AM, Clemens Wyss wrote: Side note: I am using threads when writing and these threads are (by design) interrupted (from time to time). Perhaps you are seeing this: https://issues.apache.org/jira/browse/LUCENE-2239 - Mark Miller lucidimagination.com
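Mark only suggests LUCENE-2239 as a possible cause. As background, here is a minimal stdlib-only sketch (not Lucene code; `InterruptClosesChannel` is a hypothetical name) of the underlying NIO behaviour: a `FileChannel` is an interruptible channel, so if the calling thread's interrupt flag is set during an I/O call, the channel is closed for every thread sharing it and `ClosedByInterruptException` is thrown. NIOFSDirectory reads through `FileChannel`, which is why interrupting indexing or search threads can matter.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class InterruptClosesChannel {
    /** Reads with the interrupt flag set; returns true if that closed the channel. */
    static boolean readWhileInterrupted() {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("segment", ".bin");
            Files.write(tmp, new byte[] {1, 2, 3, 4});
            try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.READ)) {
                Thread.currentThread().interrupt(); // a worker thread interrupted "by design"
                try {
                    channel.read(ByteBuffer.allocate(4), 0);
                    return false; // the read was not affected
                } catch (ClosedByInterruptException e) {
                    return !channel.isOpen(); // now closed for every thread sharing it
                } finally {
                    Thread.interrupted(); // clear the flag so the caller is not left interrupted
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                try {
                    Files.deleteIfExists(tmp);
                } catch (IOException ignored) {
                    // best-effort cleanup of the temp file
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("channel closed by interrupt: " + readWhileInterrupted());
    }
}
```

A shared channel closed this way stays closed, so later reads from other threads fail too; that is the general hazard the JIRA issue deals with.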

Re: Search within a sentence (revisited)

2011-07-21 Thread Mark Miller
makeSpanTermQuery(String text) { return new SpanTermQuery(new Term(field, text)); } public TermQuery makeTermQuery(String text) { return new TermQuery(new Term(field, text)); } } Peter On Wed, Jul 20, 2011 at 9:22 PM, Mark Miller markrmil...@gmail.com wrote: On Jul 20, 2011, at 7:44 PM, Mark

Re: Search within a sentence (revisited)

2011-07-21 Thread Mark Miller
, 2011 at 3:07 PM, Mark Miller markrmil...@gmail.com wrote: Hey Peter, Getting sucked back into Spans... That test should pass now - I uploaded a new patch to https://issues.apache.org/jira/browse/LUCENE-777 Further tests may be needed though. - Mark On Jul 21, 2011, at 9:28 AM

Re: Search within a sentence (revisited)

2011-07-21 Thread Mark Miller
I just uploaded a patch for 3X that will work for 3.2. On Jul 21, 2011, at 4:25 PM, Mark Miller wrote: Yeah, it's off trunk - I'll submit a 3X patch in a bit - just have to change that to an IndexReader I believe. - Mark On Jul 21, 2011, at 4:01 PM, Peter Keegan wrote: Does

Re: Search within a sentence (revisited)

2011-07-20 Thread Mark Miller
that I ate was that the word could belong to both its true sentence and the one after it. - Mark Miller lucidimagination.com

Re: Search within a sentence (revisited)

2011-07-20 Thread Mark Miller
On Jul 20, 2011, at 7:44 PM, Mark Miller wrote: On Jul 20, 2011, at 11:27 AM, Peter Keegan wrote: Mark Miller's 'SpanWithinQuery' patch seems to have the same issue. If I remember right (it's been more than a couple of years), I did index the sentence markers at the same position

Re: Questions on index Writer

2011-07-16 Thread Mark Miller
My advice: Don't close the IndexWriter - just call commit. Don't worry about forcing merges - let them happen as they do when you call commit. If you are going to use the IndexWriter again, you generally do not want to close it. Calling commit is the preferred option. - Mark Miller

[Announce] Lucene-Eurocon Call for Participation Closes Friday, JULY 15

2011-07-12 Thread Mark Miller
EuroCon 2011 is presented by Lucid Imagination, the commercial entity for Apache Solr/Lucene Open Source Search; proceeds of the conference benefit The Apache Software Foundation. Lucene and Apache Solr are trademarks of the Apache Software Foundation. - Mark Miller lucidimagination.com

Re: Extracting span terms using WeightedSpanTermExtractor

2011-07-08 Thread Mark Miller
On Jul 8, 2011, at 5:43 AM, Jahangir Anwari wrote: I don't think this is the best solution, am open to other alternatives. Could also make it static public where it is? Either way. - Mark Miller lucidimagination.com

Re: Extracting span terms using WeightedSpanTermExtractor

2011-07-07 Thread Mark Miller
); + extractWeightedSpanTerms(terms, new SpanTermQuery(((TermQuery)query).getTerm())); } else if (query instanceof SpanQuery) { extractWeightedSpanTerms(terms, (SpanQuery) query); } else if (query instanceof FilteredQuery) { - Mark Miller lucidimagination.com

Re: Extracting span terms using WeightedSpanTermExtractor

2011-07-06 Thread Mark Miller
than 0 for now. Feel free to create a JIRA issue and we can give it its own default greater than 0. - Mark Miller lucidimagination.com On Jul 6, 2011, at 5:34 PM, Jahangir Anwari wrote: I have a CustomHighlighter that extends the SolrHighlighter and overrides the doHighlighting() method

Re: Difference between regular Highlighter and Fast Vector Highlighter ?

2011-04-11 Thread Mark Miller
if you do. FVH: works with fewer query types and requires that you store term vectors - but scales better than the std Highlighter to very large documents - Mark Miller lucidimagination.com Lucene/Solr User Conference May 25-26, San Francisco www.lucenerevolution.org On Apr 1, 2011, at 8:32 AM

Re: NRT consistency

2011-04-11 Thread Mark Miller
- Mark Miller lucidimagination.com Lucene/Solr User Conference May 25-26, San Francisco

Re: NRT consistency

2011-04-11 Thread Mark Miller
- Mark Miller lucidimagination.com Lucene/Solr User Conference May 25-26, San Francisco www.lucenerevolution.org

Re: NRT consistency

2011-04-11 Thread Mark Miller
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Mark Miller markrmil...@gmail.com To: java-user@lucene.apache.org Sent: Mon, April 11, 2011 11:52:05 AM Subject: Re: NRT consistency

[ANN] Free technical webinar: Mastering the Lucene Index: Wednesday, August 11, 2010 11:00 AM PST / 2:00 PM EST / 20:00 CET

2010-08-09 Thread Mark Miller
Hey all - apologize for the quick cross post - just to let you know, Andrzej is giving a free webinar this wed. His presentations are always fantastic, so check it out: Lucid Imagination Presents a free technical webinar: Mastering the Lucene Index Wednesday, August 11, 2010 11:00 AM PST / 2:00

Re: NumericField API

2010-06-01 Thread Mark Miller
On 6/1/10 9:34 AM, Mindaugas Žakšauskas wrote: It's just an early observation as historically Lucene has been doing an amazing job in terms of API stability. Yes it has :) Get ready for even more change in that area though :) -- - Mark http://www.lucidimagination.com

[ANN] Lucene/Solr Meetup in NYC on May 11th

2010-05-08 Thread Mark Miller
If you haven't heard, there is a Lucene/Solr meetup in New York next week: http://www.meetup.com/NYC-Apache-Lucene-Solr-Meetup/calendar/13325754/ The scheduled talks are (in addition to lightning talks): Solr 1.5 and Beyond: Yonik Seeley, author of Solr, co-founder, Lucid Imagination Topics

Re: Batch Indexing - best practice?

2010-03-15 Thread Mark Miller
On 03/15/2010 10:41 AM, Murdoch, Paul wrote: Hi, I'm using Lucene 2.9.2. Currently, when creating my index, I'm calling indexWriter.addDocument(doc) for each Document I want to index. The Documents aren't large and I'm averaging indexing about 500 documents every 90 seconds. I'd like to

Re: Batch Indexing - best practice?

2010-03-15 Thread Mark Miller
)...just to give me an idea of what to shoot for? Paul -Original Message- From: java-user-return-45433-paul.b.murdoch=saic@lucene.apache.org [mailto:java-user-return-45433-paul.b.murdoch=saic@lucene.apache.org ] On Behalf Of Mark Miller Sent: Monday, March 15, 2010 10:48 AM To: java

Re: File descriptor leak in ParallelReader.reopen()

2010-03-04 Thread Mark Miller
On 03/04/2010 06:52 PM, Justin wrote: Hi Mike and others, I have a test case for you (attached) that exhibits a file descriptor leak in ParallelReader.reopen(). I listed the OS, JDK, and snapshot of Lucene that I'm using in the source code. A loop adds just over 4000 documents to an index,

Re: If you could have one feature in Lucene...

2010-02-25 Thread Mark Miller
Hahaha - you have a sly humor. I totally agree though. Features are long overdue, and the committers are lazy. I call for a cancellation of all of their paychecks and a stern warning about slacking off in Lucene land. There are dozens of features that are just taking way too long - whatever

Re: If you could have one feature in Lucene...

2010-02-25 Thread Mark Miller
Colonel Walter E. Kurtz? Intuitively perhaps people expect the committers to drive the project? When they don't see this are they less likely to contribute? On Thu, Feb 25, 2010 at 10:33 AM, Mark Miller markrmil...@gmail.com wrote: Hahaha - you have a sly humor. I totally agree though. Features

Re: Where to download Mark Miller's Qsol Parser?

2010-02-04 Thread Mark Miller
Chris Harris wrote: The QSol query parser (brief overview here: http://www.lucidimagination.com/blog/2009/02/22/exploring-query-parsers/) used to be available at http://myhardshadow.com/qsol.php (there was documentation as well as a link to a SVN server) but it looks like the

Re: Search for more than one term

2010-01-27 Thread Mark Miller
ctorresl wrote: Hello: I'm working with Lucene for my thesis, please I need answers to these questions: 1. How can I tell Lucene to search for more than one term??? (for example: the query house garden computer will return documents in which at least one of the terms appears) What classes I
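In Lucene, "at least one of the terms" is usually expressed as a BooleanQuery whose clauses all use `BooleanClause.Occur.SHOULD`, which is also what QueryParser produces for `house garden computer` with its default OR operator. The toy sketch below uses plain Java collections and no Lucene classes (`AnyTermSearch` is a hypothetical name); it only models those matching semantics on a tiny inverted index.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class AnyTermSearch {
    // term -> ids of the documents containing it (a toy inverted index)
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** OR semantics: a document matches if it contains at least one query term. */
    SortedSet<Integer> searchAny(String query) {
        SortedSet<Integer> hits = new TreeSet<>();
        for (String term : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            hits.addAll(postings.getOrDefault(term, Collections.emptySortedSet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        AnyTermSearch index = new AnyTermSearch();
        index.add(0, "A house by the sea");
        index.add(1, "Garden tools");
        index.add(2, "My computer");
        index.add(3, "Nothing relevant here");
        System.out.println(index.searchAny("house garden computer")); // [0, 1, 2]
    }
}
```

Real Lucene also ranks the union, so documents containing more of the terms usually score higher; the sketch only computes which documents match.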

Re: Highlighter doesn't highlight wildcard queries after updating to 2.9.1/3.0.0

2009-12-30 Thread Mark Miller
Mohsen Saboorian wrote: After updating to 2.9.x or 3.0, highlighter doesn't work on wildcard queries like abc*. I thought that it would be because of scoring, so I also set myIndexSearcher.setDefaultFieldSortScoring(true, true) before searching. I tested with both QueryScorer and

Re: Tokenized fields in Lucene 3.0.0

2009-12-15 Thread Mark Miller
Any more info to share? In 2.9, Tokenized literally == Analyzed. /** @deprecated this has been renamed to {@link #ANALYZED} */ public static final Index TOKENIZED = ANALYZED; Michel Nadeau wrote: Hi, I just realized that since I upgraded from Lucene 2.x to 3.0.0 (and removed all

Re: org.apache.lucene.search.RemoteSearchable missing

2009-12-08 Thread Mark Miller
Weiwei Wang wrote: Hi all, I cannot find this class in the downloaded jar and I can't figure out what's wrong. Does anybody here know how to fix it? It's now in the remote contrib. -- - Mark http://www.lucidimagination.com

Re: NearSpansUnordered payloads

2009-11-25 Thread Mark Miller
Grant Ingersoll wrote: On Nov 20, 2009, at 6:49 PM, Jason Rutherglen wrote: I'm interested in getting the payload information from the matching span, however it's unclear from the javadocs why NearSpansUnordered is different than NearSpansOrdered in this regard. NearSpansUnordered

Re: SpanQuery for Terms at same position

2009-11-23 Thread Mark Miller
You're trying -1 with ordered, right? Try it with non-ordered. Christopher Tignor wrote: A slop of -1 doesn't work either. I get no results returned. This would be a *really* helpful feature for me if someone might suggest an implementation as I would really like to be able to do arbitrary span

Re: Lucene Java 3.0.0 RC1 now available for testing

2009-11-17 Thread Mark Miller
Here is some of the history: https://issues.apache.org/jira/browse/LUCENE-652 https://issues.apache.org/jira/browse/LUCENE-1960 Glen Newton wrote: Could someone send me where the rationale for the removal of COMPRESSED fields is? I've looked at

Re: building lucene-core from source

2009-11-09 Thread Mark Miller
The build/release formula is always in flux - we likely hard coded the change in 2.9.0 when releasing - we likely won't again in the future. Some discussion about it came up recently on the list. -- - Mark http://www.lucidimagination.com Peter Keegan wrote: OK. I just downloaded the 2.9.0

Re: Questions about SEN patch submissions

2009-11-09 Thread Mark Miller
Marvin Humphrey wrote: On Mon, Nov 09, 2009 at 04:07:55PM -0500, Robert Muir wrote: Mark, I think my concern is that Sen itself is LGPL ( https://sen.dev.java.net/). this lucene-ja is just a lucene interface to this LGPL library. I think this dependency might be a problem, but I am not

Re: building lucene-core from source

2009-11-09 Thread Mark Miller
an official release might warrant a '-dev' version, I suppose. (just my 2 cents.) Peter On Mon, Nov 9, 2009 at 7:57 PM, Mark Miller markrmil...@gmail.com wrote: The build/release formula is always in flux - we likely hard coded the change in 2.9.0 when releasing - we likely won't again

Re: building lucene-core from source

2009-11-09 Thread Mark Miller
is inconsistent with 2.9.1. I guess that's the flux you referred to. Peter On Mon, Nov 9, 2009 at 8:13 PM, Mark Miller markrmil...@gmail.com wrote: Yeah - it's a debatable point. You can have issues when building though - did you build with Java 1.5? Then it's not like the official build. This keeps

Re: ComplexPhraseQueryParser highlight problem

2009-11-03 Thread Mark Miller
AHMET ARSLAN wrote: Looks like it's because the query coming in is a ComplexPhraseQuery and the Highlighter doesn't currently know how to handle that type. It would need to be rewritten first barring the special handling it needs - but unfortunately, that will break multi-term query

Re: ComplexPhraseQueryParser highlight problem

2009-11-02 Thread Mark Miller
Yes - please share your test programs and I can investigate (ApacheCon this week, so I'm not sure when). And it's best to keep communications on the list - that allows others with similar issues (now or in the future) to benefit from whatever goes on. You will also reach a wider pool of people

Re: ComplexPhraseQueryParser highlight problem

2009-11-02 Thread Mark Miller
Looks like it's because the query coming in is a ComplexPhraseQuery and the Highlighter doesn't currently know how to handle that type. It would need to be rewritten first barring the special handling it needs - but unfortunately, that will break multi-term query highlighting unless you use boolean

Re: IO exception during merge/optimize

2009-10-29 Thread Mark Miller
Any chance I could get you to try that again with a buffer of like 800MB to a gig and do a comparison? I've been investigating the returns you get with a larger buffer size. It appears to be pretty diminishing returns over 100MB or so - at higher than that, I've gotten both slower speeds for some

Re: IO exception during merge/optimize

2009-10-29 Thread Mark Miller
about 5 min. shorter because of some non-Lucene related delays after the last document. Peter On Thu, Oct 29, 2009 at 4:30 PM, Mark Miller markrmil...@gmail.com wrote: Any chance I could get you to try that again with a buffer of like 800MB to a gig and do a comparison? I've been

[ANN] New Technical White Paper on Apache Lucene 2.9 from Lucid Imagination

2009-10-28 Thread Mark Miller
With the recent release of Apache Lucene 2.9, Lucid Imagination has put together an in-depth technical white paper on the range of performance improvements and new features (per segment indexing, trierange numeric analysis, and more), along with recommendations for upgrading your Lucene

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-27 Thread Mark Miller
Luis Alves wrote: Mark Miller wrote: Mark Miller wrote: Michael Busch wrote: Why will just saying once again Hey, let's just release more often work now if it hasn't in the last two years? Mich I don't know that we need to release more often to take advantage

Re: 2.9 per segment searching/caching

2009-10-22 Thread Mark Miller
Bill Au wrote: Since Lucene 2.9 has per segment searching/caching, does query performance degrade less than before (2.9) as more segments are added to the index? Bill I think non sorting cases are actually faster now over multiple segments - though you will still see performance degrade

Re: How to loop through all the entries for a field

2009-10-22 Thread Mark Miller
But with Lucene 2.9 you would want to use StringHelper.intern right? adviner wrote: Thank you Uwe Schindler wrote: Use this one: String fieldname=BookTitle; fieldname = fieldname.intern(); // because of this we need no String.equals() TermEnum te = IndexReader.terms(new
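The quoted loop seeks a TermEnum to `(field, "")` and stops at the first term of a different field, comparing the interned field name by identity instead of `String.equals()`; Mark notes that 2.9 code would intern via Lucene's `StringHelper.intern`. Below is a stdlib-only sketch of that loop, with `String.intern()` standing in for `StringHelper.intern` and a `TreeSet` standing in for the term dictionary (`FieldTermWalk` and its `Term` class are hypothetical, not Lucene's classes).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

final class FieldTermWalk {
    /** A (field, text) pair, ordered by field and then text like a term dictionary. */
    static final class Term implements Comparable<Term> {
        final String field; // interned: equal field names are the same String object
        final String text;

        Term(String field, String text) {
            this.field = field.intern();
            this.text = text;
        }

        @Override
        public int compareTo(Term other) {
            int c = field.compareTo(other.field);
            return c != 0 ? c : text.compareTo(other.text);
        }
    }

    /** Seeks to (field, "") and walks forward until the field changes. */
    static List<String> termsOf(NavigableSet<Term> dictionary, String fieldName) {
        String field = fieldName.intern();
        List<String> texts = new ArrayList<>();
        for (Term t : dictionary.tailSet(new Term(field, ""), true)) {
            if (t.field != field) {
                break; // identity check is only valid because both sides are interned
            }
            texts.add(t.text);
        }
        return texts;
    }

    public static void main(String[] args) {
        NavigableSet<Term> dict = new TreeSet<>();
        dict.add(new Term("Author", "austen"));
        dict.add(new Term("BookTitle", "dune"));
        dict.add(new Term("BookTitle", "emma"));
        dict.add(new Term("Year", "1999"));
        System.out.println(termsOf(dict, new String("BookTitle"))); // [dune, emma]
    }
}
```

Passing `new String("BookTitle")` shows why the intern call matters: without it, the identity comparison would fail on the very first term.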

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-16 Thread Mark Miller
Jukka Zitting wrote: Hi, On Fri, Oct 16, 2009 at 10:23 AM, Danil ŢORIN torin...@gmail.com wrote: What about creating major version more often? +1 We're not going to run out of version numbers, so I don't see a reason not to upgrade the major version number when making

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-16 Thread Mark Miller
Steven A Rowe wrote: On 10/16/2009 at 2:58 AM, Michael Busch wrote: B) best effort drop-in back compatibility for the next minor version number only, and deprecations may be removed after one minor release (e.g. v3.3 will be compat with v3.2, but not v3.4) This is only true on a

Re: Difference between 2.4.1 and 2.9.0 (possible regression?)

2009-10-16 Thread Mark Miller
It was a bug and Mike fixed it. The bug was that exact matches were not being returned, as you state. It will be fixed in 2.9.1. stefcl wrote: Thanks, Even if you add to the example a document called giga, I'm not sure that searching giga~0.8 would return anything. It seems a bit weird

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-16 Thread Mark Miller
Michael Busch wrote: Why will just saying once again Hey, let's just release more often work now if it hasn't in the last two years? Mich I don't know that we need to release more often to take advantage of major numbers. 2.2 was released in 07 - we could have just released 2.9 right after

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-16 Thread Mark Miller
Mark Miller wrote: Michael Busch wrote: Why will just saying once again Hey, let's just release more often work now if it hasn't in the last two years? Mich I don't know that we need to release more often to take advantage of major numbers. 2.2 was released in 07 - we could

Re: Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-08 Thread Mark Miller
though. Thanks, Chris On Wed, Oct 7, 2009 at 4:02 PM, Mark Miller markrmil...@gmail.com wrote: Solr just copies them into the same directory - Lucene files are write once, so it's not much different than what happens locally. Nigel wrote: Right now we logically re-open an index
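Because Lucene index files are write-once (as Mark notes), bringing a local copy up to date amounts to copying the files it does not have yet into the same directory, after which the reader can be reopened there. A hedged, stdlib-only sketch of that file-selection step follows; `WriteOnceSync` and the file names are illustrative, it is not Solr's replication code, and it assumes every file really is write-once.

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

final class WriteOnceSync {
    /**
     * Files present in the master index but not yet in the local copy.
     * Assumes every index file is write-once: a name that already exists locally
     * is never rewritten, so it never needs to be copied again.
     */
    static SortedSet<String> filesToCopy(Set<String> localFiles, Set<String> remoteFiles) {
        SortedSet<String> missing = new TreeSet<>(remoteFiles);
        missing.removeAll(localFiles);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> local = Set.of("_0.cfs", "segments_1");
        Set<String> remote = Set.of("_0.cfs", "_1.cfs", "segments_2");
        System.out.println(filesToCopy(local, remote)); // [_1.cfs, segments_2]
    }
}
```

Old files are deliberately left in place here: a reader opened on the previous commit may still be using them.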

Re: Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-07 Thread Mark Miller
the old one. We don't use IndexReader.reopen() because the updated index is in a different directory (as opposed to being updated in-place). (Reading about some of the 2.9 changes motivated me to look into actually using reopen(). And Michael Busch and Mark Miller both pointed out that I

Re: Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-05 Thread Mark Miller
I keep considering a full response to this, but I just can't get over the hump and spend the time writing something up. Figured someone else would get to it - perhaps they still will. I will make a comment here though: Before Lucene 2.9, I don't think this made any difference, as (I think) the

Re: Error using multireader searcher in Lucene 2.9

2009-10-02 Thread Mark Miller
Sorry Raf - technically you're not allowed to use internal Lucene ids that way. It happened to work in the past if you didn't use MultiSearcher, but it's not promised by the API, and no longer works as you'd expect in 2.9. You have to figure out another approach that doesn't use the internal ids
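One reason internal ids are not a stable handle: in 2.9 searching happens per segment, and a top-level document id is just the segment's starting offset (its docBase) plus the document's id within that segment. The hypothetical arithmetic below (plain Java, `DocIdBases` is not a Lucene class) shows that if a merge drops deleted documents from an earlier segment, every later document's id shifts, even though the document itself did not change.

```java
final class DocIdBases {
    /** docBase of each segment: the sum of maxDoc over all segments before it. */
    static int[] docBases(int[] segmentMaxDocs) {
        int[] bases = new int[segmentMaxDocs.length];
        int base = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            bases[i] = base;
            base += segmentMaxDocs[i];
        }
        return bases;
    }

    /** Top-level id of a document = its segment's docBase + its id inside that segment. */
    static int globalId(int[] segmentMaxDocs, int segment, int segmentDoc) {
        return docBases(segmentMaxDocs)[segment] + segmentDoc;
    }

    public static void main(String[] args) {
        // The same document (segment 1, local id 1) before and after segment 0
        // shrinks from 3 to 1 documents because a merge dropped two deletions.
        System.out.println(globalId(new int[] {3, 5}, 1, 1)); // 4
        System.out.println(globalId(new int[] {1, 5}, 1, 1)); // 2
    }
}
```

A stored application-level key field is the usual way to refer to a document across reopens.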

Re: TimeLimitedCollector hang on, VM process doesn't die (TOMCAT)

2009-10-02 Thread Mark Miller
That thread will only be stopped if it's interrupted. So it would appear there is not a path that leads to it being interrupted ... why that is would be the next question ... -- - Mark http://www.lucidimagination.com Mani EZZAT wrote: Hello everyone. I'm using SolrJ for an application
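Mark's point is that such a timer thread runs until something interrupts it. Here is a generic, stdlib-only sketch of that pattern (`InterruptibleTimer` is a hypothetical class, not Lucene's TimeLimitedCollector timer). The loop exits only on interrupt, so whoever owns the thread has to interrupt it when the application shuts down.

```java
final class InterruptibleTimer extends Thread {
    private final long resolutionMs;
    private volatile long elapsedMs; // coarse clock, written only by this thread

    InterruptibleTimer(long resolutionMs) {
        super("hypothetical-timer-thread");
        this.resolutionMs = resolutionMs;
        setDaemon(true); // a daemon thread does not by itself keep the JVM running
    }

    @Override
    public void run() {
        while (!isInterrupted()) { // nothing but an interrupt ends this loop
            elapsedMs += resolutionMs;
            try {
                Thread.sleep(resolutionMs);
            } catch (InterruptedException e) {
                return; // sleep() clears the interrupt flag, so exit here explicitly
            }
        }
    }

    long elapsedMs() {
        return elapsedMs;
    }

    /** Starts a timer, interrupts it, and reports whether it really terminated. */
    static boolean stopsWhenInterrupted() {
        InterruptibleTimer timer = new InterruptibleTimer(10);
        timer.start();
        timer.interrupt(); // the owner (e.g. an application's shutdown code) must do this
        try {
            timer.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !timer.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("timer stopped: " + stopsWhenInterrupted());
    }
}
```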

Re: TimeLimitedCollector hang on, VM process doesn't die (TOMCAT)

2009-10-02 Thread Mark Miller
Mani EZZAT wrote: Mark Miller wrote: That thread will only be stopped if it's interrupted. So it would appear there is not a path that leads to it being interrupted ... why that is would be the next question ... I found someone (a Japanese user) who had the same problem http

Re: Lucene 2.9 and performance of readers per segment.

2009-10-01 Thread Mark Miller
Per segment over many segments is actually a bit faster for non-sort cases and many sort cases - but an optimized index will still be fastest - the speed benefit of many segments comes when reopening - so say for realtime search - in that case you may want to sacrifice optimized-index performance for a segment
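The reopen advantage comes from per-segment work: after a reopen, only segments that did not exist before need new per-segment data (such as 2.9's per-segment FieldCache entries), while unchanged segments keep theirs. Below is a hypothetical, stdlib-only model of that reuse (`PerSegmentCache` is illustrative, and an `int[]` stands in for loaded per-segment values).

```java
import java.util.HashMap;
import java.util.Map;

final class PerSegmentCache {
    // Segment files are write-once, so a cache entry keyed by segment name stays valid.
    private final Map<String, int[]> cache = new HashMap<>();
    private int loads; // how many segments actually had to be loaded

    /** Opens or reopens over the given segments; returns how many had to be loaded. */
    int reopen(Map<String, Integer> segmentMaxDocs) {
        int before = loads;
        cache.keySet().retainAll(segmentMaxDocs.keySet()); // drop segments merged away
        for (Map.Entry<String, Integer> e : segmentMaxDocs.entrySet()) {
            if (!cache.containsKey(e.getKey())) {
                cache.put(e.getKey(), new int[e.getValue()]); // stand-in for loading values
                loads++;
            }
        }
        return loads - before;
    }

    public static void main(String[] args) {
        PerSegmentCache c = new PerSegmentCache();
        System.out.println(c.reopen(Map.of("_0", 100_000, "_1", 50)));          // 2
        System.out.println(c.reopen(Map.of("_0", 100_000, "_1", 50, "_2", 5))); // 1
    }
}
```

With one optimized segment, any change produces a brand-new segment, so everything must be reloaded; that is the trade-off against query speed.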

Re: Implement SpanScorer on 2.9 lucene lib!

2009-10-01 Thread Mark Miller
PM, Mark Miller markrmil...@gmail.com wrote: Felipe Lobo wrote: Hi, I updated my Lucene lib to 2.9.0 and I'm trying to instantiate the SpanScorer but the constructor is protected. I looked in the javadoc of lucene and saw 2 subclasses of it (PayloadNearQuery.PayloadNearSpanScorer

Re: Implement SpanScorer on 2.9 lucene lib!

2009-10-01 Thread Mark Miller
that in the package)?? Sorry! That's what happens when I trust my memory ;) It's QueryTermScorer. Thanks. On Thu, Oct 1, 2009 at 10:44 AM, Mark Miller markrmil...@gmail.com wrote: Felipe Lobo wrote: Hi, thanks for the answer but it didn't work. I stopped rewriting the query

Re: TSDC, TopFieldCollector co

2009-09-30 Thread Mark Miller
If you want relevance sorting (Sort.Score not Sort.Relevance right?), I'd think you want to use TopScoreDocCollector, not TopFieldCollector. The only reason to use relevance with TopFieldCollector is if you are doing an nth sort with a field sort as well. You don't really need to worry about

Re: TopDocCollector limits

2009-09-30 Thread Mark Miller
? On Tue, Sep 29, 2009 at 7:40 PM, Mark Miller markrmil...@gmail.com wrote: Max Lynch wrote: Hi, I am developing a search system that doesn't do pagination (searches are run in the background and machine analyzed). However, TopDocCollector makes me put

Re: Implement SpanScorer on 2.9 lucene lib!

2009-09-30 Thread Mark Miller
Felipe Lobo wrote: Hi, I updated my Lucene lib to 2.9.0 and I'm trying to instantiate the SpanScorer but the constructor is protected. I looked in the javadoc of lucene and saw 2 subclasses of it (PayloadNearQuery.PayloadNearSpanScorer,

Re: Highlighting phrases in 2.9

2009-09-30 Thread Mark Miller
Scott Smith wrote: I've been looking at the changes I have to make in my code to go from 2.4.1 to 2.9. One of the features I have is to highlight query hits in documents which meet the search criteria. If the query has a phrase, then I need to highlight the phrase, but not isolated words

Re: TopDocCollector limits

2009-09-29 Thread Mark Miller
Max Lynch wrote: Hi, I am developing a search system that doesn't do pagination (searches are run in the background and machine analyzed). However, TopDocCollector makes me put a limit on how many results I want back. For my system, each result found is important. How can I make it collect
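Rather than giving TopDocCollector a large limit, 2.9 lets you subclass `Collector` (`setScorer`, `collect`, `setNextReader`, `acceptsDocsOutOfOrder`) and keep every hit, ignoring the scorer if scores are not needed. Here is a minimal stdlib-only sketch of that idea, using a hypothetical `SegmentCollector` interface that only mirrors the shape of those per-segment callbacks; it is not Lucene's API.

```java
import java.util.ArrayList;
import java.util.List;

final class CollectEverything {
    /** Hypothetical stand-in mirroring the per-segment callbacks of a Lucene 2.9 Collector. */
    interface SegmentCollector {
        void setNextSegment(int docBase);
        void collect(int segmentDoc);
    }

    /** Keeps every hit: no top-N limit and no priority queue, and scores are never requested. */
    static final class AllHits implements SegmentCollector {
        private final List<Integer> docs = new ArrayList<>();
        private int docBase;

        @Override
        public void setNextSegment(int docBase) {
            this.docBase = docBase;
        }

        @Override
        public void collect(int segmentDoc) {
            docs.add(docBase + segmentDoc);
        }

        List<Integer> docs() {
            return docs;
        }
    }

    /** Drives the collector segment by segment, the way a searcher does. */
    static List<Integer> search(int[][] matchesPerSegment, int[] segmentMaxDocs) {
        AllHits hits = new AllHits();
        int base = 0;
        for (int s = 0; s < matchesPerSegment.length; s++) {
            hits.setNextSegment(base);
            for (int doc : matchesPerSegment[s]) {
                hits.collect(doc);
            }
            base += segmentMaxDocs[s];
        }
        return hits.docs();
    }

    public static void main(String[] args) {
        int[][] matches = {{0, 2}, {1}};
        System.out.println(search(matches, new int[] {3, 4})); // [0, 2, 4]
    }
}
```

Memory grows with the number of hits, so for very large result sets a bit set may be a better container than a list.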

Re: PrefixQuery vs wildcardquery

2009-09-28 Thread Mark Miller
John Seer wrote: Hello, Is there any benefit of using one or other for start with query? Which one is faster? Regards Prefix query is a bit more efficient - not sure what that translates to in the real world, but prefix just checks if the terms start with the prefix - wildcard has a bit more
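Here is a stdlib-only sketch of the difference described above (`PrefixVsWildcard` is a hypothetical class, and a `TreeSet` stands in for the term dictionary). A prefix only needs a `startsWith` check while walking forward from the prefix, whereas a wildcard pattern needs a full pattern match per candidate term. Lucene's own WildcardQuery is smarter than this: it also seeks to the literal text before the first wildcard, so the naive full scan below exaggerates the gap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

final class PrefixVsWildcard {
    /** Prefix: seek to the prefix in the sorted terms, stop at the first term without it. */
    static List<String> prefixTerms(NavigableSet<String> terms, String prefix) {
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break; // sorted order: no later term can share the prefix
            }
            out.add(t);
        }
        return out;
    }

    /** Wildcard: '*' and '?' can appear anywhere, so each candidate needs a pattern match. */
    static List<String> wildcardTerms(NavigableSet<String> terms, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        Pattern p = Pattern.compile(regex.toString());
        List<String> out = new ArrayList<>();
        for (String t : terms) { // naive full scan, to keep the sketch short
            if (p.matcher(t).matches()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(List.of("garden", "home", "house", "mouse"));
        System.out.println(prefixTerms(terms, "ho"));      // [home, house]
        System.out.println(wildcardTerms(terms, "ho*"));   // [home, house]
        System.out.println(wildcardTerms(terms, "?ouse")); // [house, mouse]
    }
}
```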

Re: PrefixQuery vs wildcardquery

2009-09-28 Thread Mark Miller
Though in 2.9 this is not much of a concern - the multi term queries are smart - if it matches few enough terms it will rewrite to a constant score booleanquery - if it matches a lot of terms it will rewrite to a constantscore query - using a filter underneath. So maxclause issues should
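In 2.9, multi-term queries choose their rewrite by how many terms match. Below is a hypothetical sketch of that decision and of the filter path. `AutoRewriteSketch`, its enum, and the cutoff value are illustrative, not Lucene's classes or defaults.

```java
import java.util.BitSet;
import java.util.List;

final class AutoRewriteSketch {
    enum Rewrite { CONSTANT_SCORE_BOOLEAN_QUERY, CONSTANT_SCORE_FILTER }

    /**
     * Picks a rewrite from the number of matching terms. The cutoff is supplied by the
     * caller for illustration; Lucene has its own tunable limits.
     */
    static Rewrite choose(int matchingTermCount, int termCountCutoff) {
        return matchingTermCount <= termCountCutoff
                ? Rewrite.CONSTANT_SCORE_BOOLEAN_QUERY // one clause per term, few clauses
                : Rewrite.CONSTANT_SCORE_FILTER;       // no clauses, so no max-clause limit to hit
    }

    /** Filter path: OR every matching term's postings into one bit set; all hits share one score. */
    static BitSet filterBits(List<int[]> postingsPerTerm, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int[] docs : postingsPerTerm) {
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        System.out.println(choose(12, 100));    // CONSTANT_SCORE_BOOLEAN_QUERY
        System.out.println(choose(5_000, 100)); // CONSTANT_SCORE_FILTER
        BitSet bits = filterBits(List.of(new int[] {1, 3}, new int[] {3, 4}), 8);
        System.out.println(bits); // {1, 3, 4}
    }
}
```

Because the filter path never creates one clause per term, a query matching thousands of terms cannot trip the BooleanQuery clause limit.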

The Release of Lucene 2.9

2009-09-25 Thread Mark Miller
Release: The next release will be Lucene 3.0. This should come along shortly, and will remove all of the deprecated code in Lucene 2.9. Lucene 3.0 will also be the first release to move from Java 1.4 to Java 1.5 as a requirement. Thanks, Mark Miller

Re: Getting Payload data from BooleanQuery results

2009-09-24 Thread Mark Miller
I should beef up that spans extractor - it can actually work on the constantscore multi term queries (the base ones that now have a constant score mode in 2.9), just like the Highlighter does. That class really belongs in contrib probably. You can use the filter and the spanquery to get the

Lucene 2.9 RC5 now available for testing

2009-09-19 Thread Mark Miller
/CONTRIB-CHANGES.txt Download release candidate 5 here: http://people.apache.org/~markrmiller/staging-area/lucene2.9rc5/ Be sure to report back with any issues you find! Thanks, Mark Miller

Re: What would be the fastest BooleanQuery possible?

2009-09-16 Thread Mark Miller
With the new Collector API in Lucene 2.9, you no longer have to compute the score. Now a Collector is passed a Scorer if they want to use it, but you can just ignore it. -- - Mark http://www.lucidimagination.com Benjamin Pasero wrote: Hi, I am using Lucene not only for smart fulltext

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-16 Thread Mark Miller
are faster than NIOFS and the response times improved. But it's still slower than 2.4. I'll do some profiling now again and let you know the results. Thanks again for all the great support to all who've answered. Mark Miller wrote: Can you run the following test on your RAMDISK? http

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-16 Thread Mark Miller
and voting or proceed with the release of 2.9? We waited so long and for most people it is faster than slower! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Mark Miller [mailto:markrmil

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-16 Thread Mark Miller
/lucene_29_newapi_mmap_singlereq.png Have to verify that the last one is not by accident more than one request. Will do the run again and then post the required info. Mark Miller wrote: bq. I'll do some profiling now again and let you know the results. Great - it will be interesting to see

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-16 Thread Mark Miller
Ah - that explains a bit. Though if you divide by 2, the new one still appears to overcall each method in comparison to 2.4. - Mark Uwe Schindler wrote: http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png Have to verify that the last one is not by accident more than one request.

Re: What would be the fastest BooleanQuery possible?

2009-09-16 Thread Mark Miller
it for now). Anything in that version that could speed things up? On Wed, Sep 16, 2009 at 6:48 PM, Mark Miller markrmil...@gmail.com wrote: With the new Collector API in Lucene 2.9, you no longer have to compute the score. Now a Collector is passed a Scorer if they want to use

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-16 Thread Mark Miller
Something is very odd about this if they both cover the same search and the environ for both is identical. Even if one search was done twice, and we divide the numbers for the new api by 2 - its still *very* odd. With 2.4, ScorerDocQueue.topDoc is called half a million times. With 2.9, its called

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-16 Thread Mark Miller
Notice that while DisjunctionScorer.advance and DisjunctionScorer.advanceAfterCurrent appear to be called in 2.9, in 2.4, I am only seeing DisjunctionScorer.advanceAfterCurrent called. Can someone explain that? Mark Miller wrote: Something is very odd about this if they both cover the same search

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-16 Thread Mark Miller
That just doesn't jibe. Mark Miller wrote: Notice that while DisjunctionScorer.advance and DisjunctionScorer.advanceAfterCurrent appear to be called in 2.9, in 2.4, I am only seeing DisjunctionScorer.advanceAfterCurrent called. Can someone explain that? Mark Miller wrote: Something

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-16 Thread Mark Miller
the CachingWrapperFilter and QueryWrapperFilter, I think it explains this behaviour (and Thomas ran some warming queries before). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Mark Miller [mailto:markrmil

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
Hey Thomas - any chance you can do some quick profiling and grab the hotspots from the 3 configurations? Are your custom sorts doing anything tricky? -- - Mark http://www.lucidimagination.com Thomas Becker wrote: Urm and uploaded here: http://ankeschwarzer.de/tmp/graph.jpg Sorry.

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
Thomas Becker wrote: Hey Mark, thanks for your reply. Will do. Results will follow in a couple of minutes. Thanks, awesome. Also, how many segments (approx) are in your index? If there are a lot, have you/can you try the same tests on an optimized index? Don't want to get ahead of the

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
after each update. So this profiling is on an optimized index (eg a single segment) ? That would be odd indeed, and possibly point to some of the scoring changes. Mark Miller wrote: Thomas Becker wrote: Hey Mark, thanks for your reply. Will do. Results will follow

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
in with some ideas as well. Do confirm that those profiling results are on a single segment though. - Mark Mark Miller wrote: Thomas Becker wrote: Here's the results of profiling 10 different search requests: http://ankeschwarzer.de/tmp/lucene_24_oldapi.png http://ankeschwarzer.de/tmp

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
performance trap. -- - Mark http://www.lucidimagination.com Thanks a lot for your support! Cheers, Thomas Mark Miller wrote: A few quick notes - Lucene 2.9 old api doesn't appear much worse than Lucene 2.4? You save a lot with the new Intern impl, because that's not a hotspot

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
Can you run the following test on your RAMDISK? http://people.apache.org/~markrmiller/FileReadTest.java I've taken it from the following issue (in which NIOFSDirectory was developed): https://issues.apache.org/jira/browse/LUCENE-753 -- - Mark http://www.lucidimagination.com
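FileReadTest.java itself is not reproduced here. As a hedged illustration of the two basic access styles such a benchmark contrasts, the stdlib-only sketch below reads the same byte range in two ways. The first is seek-then-read on a shared `RandomAccessFile`, which has one file pointer, so concurrent readers must lock. The second is a `FileChannel` positional read, which leaves no shared pointer to protect and is the approach NIOFSDirectory is built on. `PositionalReads` is a hypothetical name, and the sketch does not reproduce any particular implementation in the test file.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

final class PositionalReads {
    /** Seek-then-read: one shared file pointer, so concurrent callers must serialize. */
    static byte[] seekAndRead(RandomAccessFile raf, long pos, int len) throws IOException {
        byte[] buf = new byte[len];
        synchronized (raf) {
            raf.seek(pos);
            raf.readFully(buf);
        }
        return buf;
    }

    /** Positional read: the channel's own position is untouched, so no lock is needed. */
    static byte[] positionalRead(FileChannel channel, long pos, int len) throws IOException {
        ByteBuffer bb = ByteBuffer.allocate(len);
        while (bb.hasRemaining()) {
            if (channel.read(bb, pos + bb.position()) < 0) {
                throw new EOFException("read past EOF");
            }
        }
        return bb.array();
    }

    /** Writes a small temp file and checks both styles return the same bytes. */
    static boolean sameBytes() {
        try {
            Path tmp = Files.createTempFile("filereadtest", ".bin");
            try {
                byte[] data = new byte[4096];
                for (int i = 0; i < data.length; i++) {
                    data[i] = (byte) i;
                }
                Files.write(tmp, data);
                try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "r");
                     FileChannel channel = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    return Arrays.equals(seekAndRead(raf, 1000, 64), positionalRead(channel, 1000, 64));
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("same bytes: " + sameBytes());
    }
}
```

Both return identical bytes. Any speed difference between them depends on the platform and storage (the thread below finds a ramdisk behaves very differently), so it has to be measured rather than assumed.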

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
...@thetaphi.de -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, September 15, 2009 5:30 PM To: java-user@lucene.apache.org Subject: Re: lucene 2.9.0RC4 slower than 2.4.1? Thomas Becker wrote: Hey Mark, yes. I'm running the app on unix. You see

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
Mark Miller wrote: Indeed - I just ran the FileReadTest on a Linux tmpfs ramdisk - with SeparateFile all 4 of my cores are immediately pinned and remain so. With ChannelFile, all 4 cores hover at 20-30%. It would appear it may not be a good idea to use NIOFSDirectory on ramdisks. Even still

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, September 15, 2009 7:15 PM To: java-user@lucene.apache.org Subject: Re: lucene 2.9.0RC4 slower than 2.4.1? Mark Miller wrote: Indeed - I just ran

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
test file? -Yonik http://www.lucidimagination.com On Tue, Sep 15, 2009 at 2:18 PM, Mark Miller markrmil...@gmail.com wrote: The results: config: impl=SeparateFile serial=false nThreads=4 iterations=100 bufsize=1024 poolsize=2 filelen=730554368 answer=-282295611, ms=173550, MB/sec

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
that benchmarker... is it OK that the answer is different? Did you use the same test file? -Yonik http://www.lucidimagination.com On Tue, Sep 15, 2009 at 2:18 PM, Mark Miller markrmil...@gmail.com wrote: The results: config: impl=SeparateFile serial=false nThreads=4 iterations=100 bufsize=1024

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
=2 filelen=730554368 answer=-282295361, ms=766340, MB/sec=381.3212767179059 Mark Miller wrote: Michael McCandless wrote: I don't like that the answer is different... but it's really really odd that it's different-yet-almost-the-same. Mark, were these 4 results on a normal (ext4
