Re: updating jakarta site

2005-03-01 Thread Doug Cutting
Henri Yandell wrote: Redirect of jakarta.apache.org/lucene to lucene.apache.org/java/docs/index.html I noticed there's a commented out redirect in the .htaccess, so after adding my own I deleted it again and left the redirect off for the moment. Unsure if there's a reason the commented out bit is

Re: updating jakarta site

2005-03-01 Thread Doug Cutting
Erik Hatcher wrote: When Doug is cool with re-enabling the redirect, it's fine with me. I'm cool with it if it works. Why not re-enable it, search for site:apache.org lucene on Google, Yahoo! and MSN, and click on the first few links. If these work, then I'm okay with the redirect. As we

Re: updating jakarta site

2005-02-28 Thread Doug Cutting
Henri Yandell wrote: Your download page is already separate, you're using the global closer.cgi file. So we need to: - rename Lucene Java's mailing lists, with forwards put into place. - add a mailing list page to Lucene Java's website, modelled after

Re: updating jakarta site

2005-02-28 Thread Doug Cutting
Garrett Rooney wrote: Actually, currently we've got both lucene4c and java commits going to [EMAIL PROTECTED], and there was some talk of just leaving it that way, since it isn't that much traffic and it encourages people to keep an eye on what's going on in other languages. I think that's a

Re: patch - DEFAULT_ vars in IndexWriter non-final and DEFAULT for useCompoundFile

2005-02-28 Thread Doug Cutting
Kevin A. Burton wrote: Wolf Siberski wrote: Kevin A. Burton wrote: I see following issues with your patch: - you changed the DEFAULT_... semantics from constant to modifiable, but didn't adjust the names according to Java conventions (default_...). Java doesn't have any naming conventions

Re: patch - DEFAULT_ vars in IndexWriter non-final and DEFAULT for useCompoundFile

2005-02-28 Thread Doug Cutting
Kevin A. Burton wrote: Doug Cutting wrote: Wolf Siberski wrote: So, if anything at all, I would rather opt for making these constants private :-). I agree. In general, fields should either be final, or private with accessor methods. So, we could change this to: private static int
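
The rule stated above (fields are either final, or private with accessor methods) can be sketched as follows. This is only an illustration: the class name WriterDefaults, the field name, the initial value and the validation are hypothetical and are not taken from the patch under discussion.

```java
// Hypothetical holder for a tunable default; not the actual IndexWriter code.
final class WriterDefaults {
    // Private and non-final: callers can only reach it through the accessors.
    private static int defaultMergeFactor = 10; // illustrative value

    private WriterDefaults() {}

    /** Returns the current default. */
    public static synchronized int getDefaultMergeFactor() {
        return defaultMergeFactor;
    }

    /** Changes the default; the setter is the one place to validate input. */
    public static synchronized void setDefaultMergeFactor(int value) {
        if (value < 2) {
            throw new IllegalArgumentException("merge factor must be >= 2");
        }
        defaultMergeFactor = value;
    }
}
```

Compared with a public non-final static field, the accessor pair leaves room to add checks or synchronization later without changing callers.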

read index terms lazily

2005-02-25 Thread Doug Cutting
Attached is a patch which delays reading of index terms until it is first accessed. The cost of this is another file descriptor, until the terms are accessed, when it is closed. The benefit is that operations that do not require access to index terms are much faster and use much less memory.

Re: Javadoc not available due to non-public classes?

2005-02-24 Thread Doug Cutting
Kevin A. Burton wrote: You know ... the javadoc on the site doesn't include non-public classes like TermInfosWriter. Confused me for a second. That's because it's not public. The javadoc on the site is to document the public api. This is not a bug, but a feature. Also.. the site doesn't

Re: Patch - IndexReader methods and MultiSearcher methods...

2005-02-24 Thread Doug Cutting
Kevin A. Burton wrote: Also, I assume that the reason you make the reader field protected is because getReader() is not sufficient, i.e., you want to set the reader. This would stylistically be better done with a setReader() method, no? Do you only change it at construction, or at runtime?

Re: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-02-22 Thread Doug Cutting
Wolf Siberski wrote: The price is an extension (or modification) of the Searchable interface. I've added corresponding search(Weight...) methods to the existing search(Query...) methods and deprecated the latter. I think this is the right solution. If Searchable is meant to be Lucene internal,

Re: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-02-21 Thread Doug Cutting
Wolf Siberski wrote: Now I found another solution which requires more changes, but IMHO is much cleaner: - when a query computes its Weight, it caches it in an attribute - a query can be 'frozen'. A frozen query always returns the cached Weight when calling Query.weight(). Originally there was no

Re: Into javadocs? [Bug 31841] - [PATCH] MultiSearcher problems with Similarity.docFreq()

2005-02-21 Thread Doug Cutting
Paul Elschot wrote: Would you mind if some pieces of your reply end up in the javadocs? Not at all. Doug - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: [VOTE] Incubate lucene4c?

2005-02-17 Thread Doug Cutting
+1 Doug

Re: [VOTE] Re: Incubating Lucene.Net

2005-02-17 Thread Doug Cutting
+1 Doug

Re: Incubating Lucene.Net

2005-02-17 Thread Doug Cutting
George Aroush wrote: Any thoughts on Lucene.Net/dotLucene package name are welcome. I agree that Lucene.Net is a better name. It's more consistent with Lucene Java and Lucene4c, the names for other ports of Lucene. I think it's okay to reclaim the name of an abandoned project, especially if

Re: removing the old FAQ

2005-02-16 Thread Doug Cutting
Daniel Naber wrote: could someone (Doug?) make me an administrator for the old Lucene project at sourceforge? Done. Doug

Re: lucene.apache.org

2005-02-15 Thread Doug Cutting
Henri Yandell wrote: On names, Lucene Java might hit trademark issues I guess. So potential worry there. Good point. Although I note that Apache already has projects called Xerces Java and Xalan Java. Sun says: http://www.sun.com/policies/trademarks/#20c So, technically, the full name of the

Re: lucene.apache.org

2005-02-14 Thread Doug Cutting
Erik Hatcher wrote: Doug - do you have your Forrest work handy? Or has anyone else stepped up to build the web site? I don't have anything reusable. I converted Nutch from a different (not Anakia) XML-based site to Forrest with little difficulty (mostly using string replace in Emacs). I

Re: lucene.apache.org

2005-02-14 Thread Doug Cutting
Erik Hatcher wrote: I have checked out our current site to the lucene.apache.org area, and I've also set up a redirect from the jakarta.apache.org/lucene area. Keep in mind, there are two projects here: 1. Porting Java Lucene's site to Forrest. This should be structured as a sub-project of

Re: [ANNOUNCE] lucene4c 0.02

2005-02-14 Thread Doug Cutting
Garrett Rooney wrote: Additionally it would be good to work on updating the disk format documentation, I've found several cases where the docs are quite out of date compared to the current code. It's hard to expect the various different ports to maintain compatibility when the formats are only

Re: lucene.apache.org

2005-02-14 Thread Doug Cutting
Garrett Rooney wrote: Agreed. Java Lucene is a subproject of the Lucene TLP, leaving the existing Java Lucene site there for the time being seems ok, just so we have something there, but we should endeavour to put up something more permanent ASAP. I think, for the present,

Re: What does [] do to a query and what's up with lucene.apache.org?

2005-02-14 Thread Doug Cutting
Erik Hatcher wrote: I'm really at the limit of my bandwidth - I've got the sandbox restructuring effort on my plate right now and would like it if someone could pick up the ball on the web site side of things. Then perhaps you shouldn't have redirected everything to lucene.apache.org... We

Re: lucene.apache.org

2005-02-14 Thread Doug Cutting
Erik Hatcher wrote: It also might be a good time to think about mailing list names. There was a request on infrastructure@ to move [EMAIL PROTECTED] to [EMAIL PROTECTED], would it make more sense to move it to [EMAIL PROTECTED] NOW you tell me :) I think until we have these elusive other

Re: lucene.apache.org

2005-02-14 Thread Doug Cutting
Doug Cutting wrote: And we also want to try not to break URLs when we move things. For this reason it's best to move things as few times as possible, so that we don't end up with a confusing set of redirects. More to the point, we also want to try not to break email addresses. So the fewer

Re: lucene.apache.org

2005-02-14 Thread Doug Cutting
Bernhard Messer wrote: Doug, you placed a copy of the website in the java directory. In both, the original and the java directory the api directory is missing. I can't copy it in because of the access rights :-( Argh. The group protection is 'lucene', as it should be, but you're not in

Re: lucene.apache.org

2005-02-14 Thread Doug Cutting
Erik Hatcher wrote: I've amended my request for e-mail lists here with Doug's preference: http://issues.apache.org/jira/browse/INFRA-195 Do others agree this is the best approach? I don't mean to be autocratic. Do we imagine different pools of users and developers for different Lucene

Re: Transactional Directories

2005-02-14 Thread Doug Cutting
Oscar Picasso wrote: Hi, I am currently implementing a Directory backed by a Berkeley DB that I am willing to release as an open source project. Besides the internal implementation, it differs from the one in the sandbox in that it is implemented with the Berkeley DB Java Edition. Using the Java

Re: Transactional Directories

2005-02-14 Thread Doug Cutting
[ Please ignore my previous message. I somehow hit Send before typing anything! ] Oscar Picasso wrote: However with a relatively high number of random insertions, the cost of the new IndexWriter / index.close() performed for each insertion is too high. Did you measure that? How much slower was

Re: Study Group (WAS Re: Normalized Scoring)

2005-02-07 Thread Doug Cutting
Paul Elschot wrote: I learned a lot by adding some javadocs to such classes. I suppose Doug added the Expert markings, but I don't know their precise purpose. The Expert declaration is meant to indicate that most users should not need to understand the feature. Lucene's API seeks to be both

Re: whither sandbox

2005-02-04 Thread Doug Cutting
Erik Hatcher wrote: Also, we should package a lucene-XX-all.zip/.tar.gz that includes all the contrib pieces also allowing someone to simply download Lucene and all the packaged contrib pieces at once. I'll go further: that should be the only download. We should avoid having a bunch of

Re: [PROPOSAL] Lucene to search.apache.org

2005-02-02 Thread Doug Cutting
Erik Hatcher wrote: Hmmm good point. I hadn't considered access control. A migration will be performed later today, and I think it will initially be a test migration for me to verify. I'll double-check with Justin, who's doing the conversion, on how access control will be initially

Re: Fwd: [PROPOSAL] Lucene to search.apache.org

2005-02-01 Thread Doug Cutting
Erik Hatcher wrote: The decision was a bit slow to get out, but Lucene has been approved for TLP. Thanks for pushing this through! I propose we simply import our two CVS repositories in with all of jakarta-lucene as the root of the repository and jakarta-lucene-sandbox under sandbox in the

Re: [PROPOSAL] Lucene to search.apache.org

2005-02-01 Thread Doug Cutting
Erik Hatcher wrote: On Feb 1, 2005, at 3:13 PM, Doug Cutting wrote: I think we want Java Lucene to be a sub-project of Lucene. So the repository should be something like: https://svn.apache.org/repos/asf/lucene/java I already put in the request for this initial svn structure: /asf/lucene

Re: URL to compare 2 Similarity's ready-- Re: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-31 Thread Doug Cutting
Doug Cutting wrote: It would translate a query t1 t2 given fields f1 and f2 into something like: +(f1:t1^b1 f2:t1^b2) +(f2:t1^b1 f2:t2^b2) Oops. The first term on that line should be f1:t2, not f2:t1: +(f1:t2^b1 f2:t2^b2) f1:"t1 t2"~s1^b3 f2:"t1 t2"~s2^b4 Doug
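
The corrected expansion can be sketched as a small string builder: one required clause per term that ORs the term across all fields, followed by one optional sloppy phrase per field. The class and field names (MultiFieldExpansion, FieldSpec) are hypothetical, and the phrase quoting assumes that the archive dropped the double quotes around the phrase clauses, since the ~slop operator applies to phrases.

```java
import java.util.List;

// Hypothetical helper that builds the expanded query as a query-parser string.
final class MultiFieldExpansion {

    /** Per-field settings: term boost (b1, b2), phrase slop (s1, s2), phrase boost (b3, b4). */
    static final class FieldSpec {
        final String name;
        final float termBoost;
        final int slop;
        final float phraseBoost;

        FieldSpec(String name, float termBoost, int slop, float phraseBoost) {
            this.name = name;
            this.termBoost = termBoost;
            this.slop = slop;
            this.phraseBoost = phraseBoost;
        }
    }

    static String expand(List<String> terms, List<FieldSpec> fields) {
        StringBuilder sb = new StringBuilder();
        // Required part: every term must match in at least one field.
        for (String term : terms) {
            sb.append("+(");
            for (int i = 0; i < fields.size(); i++) {
                FieldSpec f = fields.get(i);
                if (i > 0) sb.append(' ');
                sb.append(f.name).append(':').append(term).append('^').append(f.termBoost);
            }
            sb.append(") ");
        }
        // Optional part: a sloppy phrase per field rewards proximity.
        String phrase = String.join(" ", terms);
        for (int i = 0; i < fields.size(); i++) {
            FieldSpec f = fields.get(i);
            if (i > 0) sb.append(' ');
            sb.append(f.name).append(":\"").append(phrase).append("\"~")
              .append(f.slop).append('^').append(f.phraseBoost);
        }
        return sb.toString();
    }
}
```

With terms t1, t2 and two fields, the result has the same shape as the corrected line in the message, with concrete numbers in place of b1..b4 and s1, s2.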

Re: URL to compare 2 Similarity's ready-- Re: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-31 Thread Doug Cutting
Chuck Williams wrote: That expansion is scalable, but it only accounts for proximity of all query terms together. E.g., it does not favor a match where t1 and t2 are close together while t3 is distant over a match where all 3 terms are distant. Worse, it would not favor a match with t1 and t2 in

Re: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-28 Thread Doug Cutting
Christoph Goller wrote: The similarity specified for the search has to be modified so that both idf(...) AND queryNorm(...) always return 1 and as you say everything except for tf(term,doc)*docNorm(doc) could be precompiled into the boosts of the rewritten query. coord/tf/sloppyFreq computation

Re: [PROPOSAL] Lucene to search.apache.org

2005-01-17 Thread Doug Cutting
Maybe we should just call it lucene.apache.org, and move the current Lucene project to lucene.apache.org/java? The other projects we imagine adding (Nutch, DotLucene, CLucene, etc.) are all Lucene-related, no? Lucene has a pretty good brand name... Doug Otis Gospodnetic wrote: ir.apache.org

Re: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-14 Thread Doug Cutting
Wolf Siberski wrote: Doug Cutting wrote: So, when a query is executed on a MultiSearcher of RemoteSearchables, the following remote calls are made: 1. RemoteSearchable.rewrite(Query) is called After that step, are wildcards replaced by term lists? Yes. I haven't taken a look at the rewrite

Re: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-14 Thread Doug Cutting
Chuck Williams wrote: Doug Cutting wrote: It would indeed be nice to be able to short-circuit rewriting for queries where it is a no-op. Do you have a proposal for how this could be done? First, this gets into the other part of Bug 31841. I don't believe MultiSearcher.rewrite() is ever

Re: JDK code in the codebase

2005-01-14 Thread Doug Cutting
Erik Hatcher wrote: The questions still remain, though, and lawyers do want to know the answers: - How did JDK code get into Lucene's codebase to begin with? I put it there in a moment of ignorance way back as a hack in order to make things run in an older version of the JVM.

Re: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-12 Thread Doug Cutting
Chuck Williams wrote: I was thinking of the aggressive version with an index-time solution, although I don't know the Lucene architecture for distributed indexing and searching well enough to formulate the idea precisely. Conceptually, I'd like each server that owns a slice of the index in a

Re: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-12 Thread Doug Cutting
Chuck Williams wrote: There needs to be a way to create the aggregate docFreq table and keep it current under incremental changes to the indices on the various remote nodes. I think you're getting ahead of yourself. Searchers are based on IndexReaders, and hence docFreqs don't change until a new

Re: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-12 Thread Doug Cutting
Wolf Siberski wrote: Chuck Williams wrote: This is a nice solution! By having MultiSearcher create the Weight, it can pass itself in as the searcher, thereby allowing the correct docFreq() method to be called. This is similar to what I tried to do with topmostSearcher, but a much better way to

Re: what if the IndexReader crashes, after delete, before close.

2005-01-11 Thread Doug Cutting
Sigh. This stuff would get a lot simpler if we were able to use Java 1.4's FileLock. Then locks would be automatically cleared by the OS if the JVM crashes. Should we upgrade the JVM requirements to 1.4 for Lucene's 1.9/2.0 releases and update the locking code? Doug Luke Shannon wrote: Here
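
For readers unfamiliar with the Java 1.4 API mentioned here, the following is a minimal sketch of an OS-level lock built on java.nio.channels.FileLock. The class name NativeLockSketch and its methods are hypothetical and do not describe Lucene's locking code; the point is only that the operating system releases such a lock when the JVM exits or crashes, so no stale lock file needs manual cleanup.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Hypothetical wrapper around an OS-level file lock.
final class NativeLockSketch {
    private final RandomAccessFile file;
    private final FileLock lock;

    private NativeLockSketch(RandomAccessFile file, FileLock lock) {
        this.file = file;
        this.lock = lock;
    }

    /** Returns a held lock, or null if the lock is already taken. */
    static NativeLockSketch tryObtain(File lockFile) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
        FileLock lock = null;
        try {
            lock = raf.getChannel().tryLock(); // null if another process holds it
        } catch (OverlappingFileLockException e) {
            // Already locked from within this JVM; treat as not obtained.
        } finally {
            if (lock == null) raf.close();
        }
        return lock == null ? null : new NativeLockSketch(raf, lock);
    }

    boolean isHeld() {
        return lock.isValid();
    }

    void release() throws IOException {
        lock.release();
        file.close();
    }

    /** Small self-check: obtain and release a lock on a temporary file. */
    static boolean demo() {
        try {
            File f = File.createTempFile("write", ".lock");
            f.deleteOnExit();
            NativeLockSketch l = tryObtain(f);
            if (l == null || !l.isHeld()) return false;
            l.release();
            return !l.isHeld();
        } catch (IOException e) {
            return false;
        }
    }
}
```

File locking behaviour depends on the platform and file system, so a real implementation would need to be tested on the targets it supports.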

Re: what if the IndexReader crashes, after delete, before close.

2005-01-11 Thread Doug Cutting
Terry Steichen wrote: Would it be possible to optimize the operation to use 1.4 runtime features but retain the option, if desired to run in a legacy (1.3) environment, perhaps in a degraded mode? Lucene 1.4.3 is a degraded mode, no? There are still back-compatibility issues. To be safe,

Re: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-11 Thread Doug Cutting
Chuck Williams wrote: As Wolf does, I hope a committer with deep knowledge of Lucene's design in this area will weigh in on the issue and help to resolve it. The root of the bug is in MultiSearcher.search(). This should construct a Weight, weight the query, then score the now-weighted query.
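
Why it matters which searcher supplies docFreq() can be illustrated without any Lucene classes. The sketch below sums per-index document frequencies into one global table and feeds them to an idf function. Everything here is an assumption for illustration: the class name GlobalDocFreq is hypothetical, and the idf formula 1 + ln(numDocs / (docFreq + 1)) is assumed rather than quoted from the thread.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of aggregating docFreq across sub-indexes.
final class GlobalDocFreq {

    /** Sums the per-index document frequencies of each term. */
    static Map<String, Integer> aggregate(List<Map<String, Integer>> perIndex) {
        Map<String, Integer> global = new HashMap<>();
        for (Map<String, Integer> index : perIndex) {
            for (Map.Entry<String, Integer> e : index.entrySet()) {
                global.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return global;
    }

    /** Assumed idf form: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }
}
```

If each sub-index weights the query with its own local docFreq, the same term gets a different idf on each node and the merged scores are not comparable; weighting once against the aggregated values avoids that.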

Re: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-11 Thread Doug Cutting
Chuck Williams wrote: This is a nice solution! By having MultiSearcher create the Weight, it can pass itself in as the searcher, thereby allowing the correct docFreq() method to be called. Glad to hear it at least makes sense... Now I hope it works! I'm still left wondering if having

Re: auto-filters?

2005-01-03 Thread Doug Cutting
markharw00d wrote: If we intend to make more use of filters this may be an appropriate time to raise a general question I have on their use. Is there a danger in tying them to a specific implementation (java.util.BitSet)? I do not object in principle to replacing BitSet with an interface,

Re: CFS file and file formats

2005-01-03 Thread Doug Cutting
Bernhard Messer wrote: Why not implement a small utility class, e.g. CompoundFileUtil.java, within the org.apache.lucene.index package? This class could be public and implement the necessary functionality. This is what I would prefer, because we don't have to change the visibility of

auto-filters?

2005-01-02 Thread Doug Cutting
Filters are more efficient than query terms for many things. For example, a RangeFilter is usually more efficient than a RangeQuery and has no risk of triggering BooleanQuery.TooManyClauses. And Filter caching (e.g., with CachingWrapperFilter) can make otherwise expensive clauses almost free,

Re: DefaultSimilarity 2.0?

2004-12-20 Thread Doug Cutting
Chuck Williams wrote: Finally, I'd suggest picking content that has multiple fields and allow the individual implementations to decide how to search these fields -- just title and body would be enough. I would like to use my MaxDisjunctionQuery and see how it compares to other approaches (e.g.,

Re: Migration to SVN?

2004-12-20 Thread Doug Cutting
Garrett Rooney wrote: The least effort way of doing that would be to include both the core and sandbox under the same trunk, but again, that implies that you ALWAYS tag and branch them together, and sometimes you may not want to do that. I think we should always branch these together. To my

DefaultSimilarity 2.0?

2004-12-17 Thread Doug Cutting
Chuck Williams wrote: Another issue will likely be the tf() and idf() computations. I have a similar desired relevance ranking and was not getting what I wanted due to the idf() term dominating the score. [ ... ] Chuck has made a series of criticisms of the DefaultSimilarity implementation.

Re: Explanations and overridden similarity

2004-12-16 Thread Doug Cutting
Dan Climan wrote: Shouldn't the call to Similarity.decodeNorm be replaced with a call to Similarity.getDefault().decodeNorm decodeNorm is a static method. Doug

Re: potential new Lucene logo

2004-12-13 Thread Doug Cutting
Murray Altheim wrote: I thought I'd have a go at the Lucene logo, not to change it markedly but clean it up so that it is based on an existing font. This potential Lucene logo is based on an ITC font called Magneto Bold Extended, which you can see here: http://www.identifont.com/show?72W I

Re: setLowercaseWildcardTerms and FuzzyQueries

2004-12-13 Thread Doug Cutting
Daniel Naber wrote: I'm aware that the Wildcard name won't fit well anymore, suggestions for a better name are welcome. Expanded? Doug

Re: Boolean Scorer

2004-12-10 Thread Doug Cutting
Christoph Goller wrote: I think we should change BooleanScorer. An easy way would be to sort the bucket list before it is used. Do you think that would affect performance dramatically? I think it would make it slower. Otherwise we should reimplement BooleanScorer. I haven't looked into the

Re: Release 1.4.3

2004-12-06 Thread Doug Cutting
Christoph Goller wrote: Doug, could you please move api/ to api.old/ and api.new/ to api/ Done. Doug

Re: Release 1.4.3

2004-11-26 Thread Doug Cutting
Christoph Goller wrote: I think I should finally make Release 1.4.3. Great! I presume the default.properties no longer exists. I just fill in 1.4.3 as version in the build.xml before building it. Is this ok? I build releases with something like: ant -Dversion=1.4.3 clean dist So that it

Re: GIS

2004-11-16 Thread Doug Cutting
Guillermo Payet wrote: The fact that Lucene stores and indexes (or so it seems) all terms as Strings and that there is no NumericTerm makes me think that I might be missing something and that this might be a much bigger deal than I think? You could write a HitCollector that uses

Re: FuzzyQuery prefix length

2004-10-26 Thread Doug Cutting
Erik Hatcher wrote: On Oct 20, 2004, at 12:14 PM, Doug Cutting wrote: The advantages of a zero-character prefix default are that it's back-compatible and that it will find more matches, when spelling differences are in the first characters. I prefer this default. Anyone using QueryParser needs

Re: Normalized Scoring -- was RE: idf and explain(), was Re: Search and Scoring

2004-10-21 Thread Doug Cutting
Chuck Williams wrote: However, I'm not sure this analysis is completely correct due to MultiSearcher.docFreq() which appears to be trying to redefine the tf's to be the global value across all indices. It wasn't clear to me how this code is ever reached, e.g. from TermQuery -- SegmentTermDocs.

Re: Retrieving Document Boosts

2004-10-20 Thread Doug Cutting
Dan Climan wrote: TermEnum terms = ir.terms(); int numTerms = 0; while (terms.next()) { Term t = terms.term(); if (t.field().equals(FullText)) numTerms++; }

Re: lucene and large (2GB+) indexes using RAMDirectory

2004-10-18 Thread Doug Cutting
Jonathan Hager wrote: Nate Denning encountered the following error when trying to load a large (greater than 2147483647 bytes) index into a RAMDirectory. The server has 12GB of memory, so loading it into memory should not be a problem. Have you instead tried copying the index to a ramfs ('mount

Re: API cleanup for Field and future cleanup for IndexReader

2004-10-18 Thread Doug Cutting
Bernhard Messer wrote: Christoph Goller wrote: Bernhard Messer wrote: Currently there are 3 different methods available to get the field names from an index. a) getFieldNames(); b) getFieldNames(boolean indexed); c) getIndexedFieldNames(boolean storedTermVector); my proposal is to deprecate a),

Re: idf and explain(), was Re: Search and Scoring

2004-10-18 Thread Doug Cutting
Chuck Williams wrote: That's a good point on how the standard vector space inner product similarity measure does imply that the idf is squared relative to the document tf. Even having been aware of this formula for a long time, this particular implication never occurred to me. Do you know if

Re: Propose Bernhard as committer

2004-10-18 Thread Doug Cutting
+1 Christoph Goller wrote: I would like to propose Bernhard as Lucene committer.

Re: FuzzyQuery prefix length

2004-10-18 Thread Doug Cutting
Daniel Naber wrote: On Tuesday 12 October 2004 17:22, Doug Cutting wrote: Which is worse: a person who searches for Photokopie~ in a 1000 document collection does not find documents containing Fotokopie; or a person who searches for Photokopie~ in a 1M document collection doesn't find anything
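
The trade-off in the Photokopie/Fotokopie example can be made concrete with a small sketch. It filters a vocabulary by a required common prefix and then by Levenshtein edit distance. This is a simplification under stated assumptions: the class FuzzyPrefixSketch is hypothetical, and it uses a maximum number of edits instead of whatever similarity threshold FuzzyQuery applies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of how a required prefix narrows fuzzy matching.
final class FuzzyPrefixSketch {

    /** Classic two-row Levenshtein distance. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /** Terms that share the first prefixLength characters and are within maxEdits. */
    static List<String> match(String query, int prefixLength, int maxEdits, List<String> vocabulary) {
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        List<String> hits = new ArrayList<>();
        for (String term : vocabulary) {
            // Only terms passing the cheap prefix test pay for the edit-distance computation.
            if (term.startsWith(prefix) && editDistance(query, term) <= maxEdits) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

With a prefix length of 0, a query for Photokopie can reach Fotokopie (two edits away) but every term in the vocabulary must be compared. With a prefix length of 1, only terms starting with P are compared, which is faster but misses Fotokopie.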

Re: FuzzyQuery prefix length

2004-10-18 Thread Doug Cutting
Daniel Naber wrote: Searching for Photokopie~ on a 230,000 document corpus takes 2.3 seconds here (AMD Athlon 2600+; other fuzzy terms get similar performance). As the number of terms doesn't increase so fast with more documents, it will not take 10 seconds for 1 million documents. So fuzzy

Re: What's the purpose of hashing docid in BooleanScorer

2004-10-18 Thread Doug Cutting
Christoph Goller wrote: With the current scorer API one could get rid of the bucket table and advance all subscores only by one document each time. I am not sure whether the bucket table implementation is really much more efficient. I only see the advantage of inlining some of the scorer.next and

Re: What's the purpose of hashing docid in BooleanScorer; DisjunctionScorer

2004-10-18 Thread Doug Cutting
Paul Elschot wrote: I have a DisjunctionScorer based on a PriorityQueue lying around, but I can't benchmark it myself at the moment. In case there is interest, I'll gladly adapt it to org.apache.lucene.search and add it in bugzilla. This should look a lot like SpanOrQuery.getSpans(). On a related

Re: Contribution: better multi-field searching

2004-10-13 Thread Doug Cutting
Paul Elschot wrote: Did you see my IDF question at the bottom of the original note? I'm really curious why the square of IDF is used for Term and Phrase queries, rather than just IDF. It seems like it might be a bug? I missed that. It has been discussed recently, but I don't remember the

Re: Search and Scoring

2004-10-13 Thread Doug Cutting
Chuck Williams wrote: I think there are at least two bugs here: 1. idf should not be squared. I discussed this in a separate message. It's not a bug. 2. explain() should explain the actual reported score(). This is mostly a documentation bug in Hits. The normalization of scores to 1.0 is

Re: Contribution: better multi-field searching

2004-10-13 Thread Doug Cutting
Chuck Williams wrote: The issue is this. Imagine you have two fields, title and document, both of which you want to search with simple queries like: albino elephant. There are two general approaches, either a) create a combined field that concatenates the two individual fields, or b) expand the

Re: IndexInput GCJ

2004-10-13 Thread Doug Cutting
Andi Vajda wrote: This code is generated by JavaCC. I think the best way to fix this would be to fixup the code automatically whenever it is regenerated. So, instead of patching QueryParser.java, patch build.xml. In the javacc-QueryParser task, add a replace task which replaces 'jj_la1_0()'

Re: Contribution: better multi-field searching

2004-10-13 Thread Doug Cutting
Chuck Williams wrote: That approach does not work. I could not find an approach that would work with the built-in classes, although of course there might be one. The problem has two components: coord and the fact that BooleanQuery's sum their clause scores to compute the final score. The latter

Re: documentation in fileformats.html

2004-10-13 Thread Doug Cutting
Daniel Naber wrote: The web page is updated now, could you please re-check if it's correct? I added that information so that the Lucene = 1.4 format is still there. We should note that when compression is enabled, gzip is used. Also, byte[] is not a type defined in the file. In the formalism

Re: FuzzyQuery prefix length

2004-10-12 Thread Doug Cutting
Daniel Naber wrote: -It is the only change so far that we cannot express in the API, i.e. we cannot just deprecate a method to make Lucene's users aware of this. So we can only list it in CHANGES.txt, where some people will surely miss it. We could define a new query parser class with the new

Re: QueryParser and backwards-compatibility

2004-10-11 Thread Doug Cutting
Christoph Goller wrote: Since 1.4.2 is already out, we would have to make a version 1.4.3. OK, one more vote needed :-) I'm okay with a 1.4.3 release for this. Doug

Re: PhrasePrefixQuery - MultiPhraseQuery

2004-10-11 Thread Doug Cutting
Daniel Naber wrote: I copied PhrasePrefixQuery to MultiPhraseQuery, deprecating PhrasePrefixQuery. The wiki also suggests to make MultipleTermPositions a private nested class. However, it is public currently so I wonder whether we can just remove/deprecate it without offering an alternative.

Re: FuzzyQuery prefix length

2004-10-11 Thread Doug Cutting
Daniel Naber wrote: I agree that the default should stay 0, even for Lucene 2.0. It should certainly stay zero for 1.4.x releases. However 2.0 is our opportunity to make incompatible changes. What is the best default for this, that will work well for the most applications? Does anyone have

Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/queryParser QueryParser.java QueryParser.jj

2004-10-11 Thread Doug Cutting
[EMAIL PROTECTED] wrote: goller 2004/10/11 06:36:14 Modified:src/java/org/apache/lucene/queryParser Tag: lucene_1_4_2_dev QueryParser.java QueryParser.jj [ ... ] + * @deprecated use [EMAIL PROTECTED] #getFieldQuery(String, String)} Should these be deprecated

Re: sandbox - core ?

2004-10-08 Thread Doug Cutting
Erik Hatcher wrote: It would be nice if the Sandbox components were versioned and released along with the core - perhaps this would be a sufficient enough solution? But, alas, I have no free time currently to devote to this effort. That's precisely the reason to add these to the main CVS

Re: sandbox - core ?

2004-10-08 Thread Doug Cutting
Otis Gospodnetic wrote: I like this idea. I don't care so much about 1 or more CVS repositories, as much as separate Jars, so if we can make analyzers-1.4.2.jar and highlighter-1.4.2.jar along lucene-1.4.2.jar, that would be ideal, in my opinion. A minor point: we should prefix all the jar file

Re: IndexInput GCJ

2004-10-07 Thread Doug Cutting
Andi Vajda wrote: Do you intend to ultimately support Java Lucene with GCJ ? As far as possible... I'm down to 3 patches: Can you please file a Lucene bug report and attach these patches? I'm not guaranteeing that they'll all be committed right away, but rather that that's a better place to

Re: Lucene JAR for Maven Repo

2004-10-07 Thread Doug Cutting
I just copied the 1.4.2 jar there. Doug Otis Gospodnetic wrote: Here is the email I mentioned earlier on lucene-dev. --- Brian McCallister [EMAIL PROTECTED] wrote: To: [EMAIL PROTECTED] From: Brian McCallister [EMAIL PROTECTED] Subject: Maven Repo Date: Thu, 26 Aug 2004 19:59:50 -0400 Hi all,

Re: Lucene 1.4.2?

2004-10-02 Thread Doug Cutting
Daniel Naber wrote: On Friday 01 October 2004 23:57, Doug Cutting wrote: It is not mirrored yet. Erik's the only one who has ever done that. Erik, do you have time to mirror 1.4.2? Thanks. BTW, the release on the official download pages is still 1.4-final: http://jakarta.apache.org/site

Re: Lucene 1.4.2?

2004-10-01 Thread Doug Cutting
Christoph Goller wrote: I would never have guessed that calling the constructor there could make such a difference. The improvement is greatest for OR queries that contain a common term, i.e., which match a large portion of the collection. However for, e.g., most phrase searches and AND

Re: Lucene 1.4.2?

2004-10-01 Thread Doug Cutting
Christoph Goller wrote: Items 4 and 5 don't seem that important to me. As far as I am concerned we can leave them out. When did 4 happen? Was it a rare or common problem? I agree that we don't need to put 5 in 1.4.2. So the only thing missing is your optimization. Then 1.4.2 should be ready. I

Re: Using MMapDirectory fails TestCompoundFile; MMapDirectory for huge indexes

2004-10-01 Thread Doug Cutting
Paul Elschot wrote: I'm working on a memory mapped directory that uses multiple buffers for large files. Great! There will be a small performance hit, as each call to readByte() will need to first check whether it's overflowed the current buffer, right? While trying some test runs I found that
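
The per-byte check Doug refers to can be sketched as follows. A file larger than one buffer is split across several ByteBuffers, and readByte() must first test whether the current buffer is exhausted before reading. The class name MultiBufferInput is hypothetical, and the sketch works on any ByteBuffer, not only memory-mapped ones; it is not Paul's implementation.

```java
import java.nio.ByteBuffer;

// Hypothetical reader over a sequence of buffers that together hold one file.
final class MultiBufferInput {
    private final ByteBuffer[] buffers;
    private int current = 0;

    MultiBufferInput(ByteBuffer[] buffers) {
        // Duplicate so this reader keeps its own positions.
        this.buffers = new ByteBuffer[buffers.length];
        for (int i = 0; i < buffers.length; i++) {
            this.buffers[i] = buffers[i].duplicate();
        }
    }

    byte readByte() {
        // The extra cost per call: check for overflow of the current buffer.
        while (!buffers[current].hasRemaining()) {
            if (current == buffers.length - 1) {
                throw new IllegalStateException("read past end of input");
            }
            current++;
        }
        return buffers[current].get();
    }
}
```

In the common case the loop condition is false and the cost is one hasRemaining() call per byte, which matches the "small performance hit" described above.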

Re: Lucene 1.4.2?

2004-10-01 Thread Doug Cutting
The new release is up at http://jakarta.apache.org/lucene/. It is not mirrored yet. Erik's the only one who has ever done that. Erik, do you have time to mirror 1.4.2? Thanks. Doug

Re: DbDirectory and compound files

2004-09-30 Thread Doug Cutting
Andi Vajda wrote: You ask if this makes sense. No, not really. I don't know the details of the purpose of the compound file implementation so this may be my problem. The purpose of the compound file implementation is to minimize the number of open files that an IndexReader must keep open.

Re: Lucene 1.4.2?

2004-09-30 Thread Doug Cutting
Christoph Goller wrote: I'd like the changes on FuzzyQuery, PhraseQuery, and PhrasePrefixQuery included in the branch. Any objections? I'm okay with these, but the primary purpose of 1.4.2 should be to stabilize things, not to add new features. So let's be very selective about what we add, and

Re: DbDirectory and compound files

2004-09-29 Thread Doug Cutting
Andi Vajda wrote: So, my question: why is the compound file storage implemented in a way so orthogonal to Directory instead of just being another Directory implementation called FSCompoundFileDirectory ? To combine the files of a segment we need to know when the segment was complete. So a

Re: Lucene 1.4.2?

2004-09-29 Thread Doug Cutting
Daniel Naber wrote: On Monday 20 September 2004 18:49, Doug Cutting wrote: To be clear, you are proposing that we branch from the 1.4.1 tag in CVS and re-apply the patches below? Yes, exactly. Now that we have a patch for the memory leak problem, should we start a 1.4.2 branch? Doug

Re: Lucene 1.4.2?

2004-09-29 Thread Doug Cutting
Daniel Naber wrote: I can try to do some of the work, but I'd need detailed instructions for branching and tagging. It's probably easier/better if you do those parts. I've never branched with CVS before either... so here goes! I've added a branch called lucene_1_4_2_dev. To get a copy, use: cvs

Re: IndexInput GCJ

2004-09-28 Thread Doug Cutting
Doug Cutting wrote: Still to do: 1. Replace OutputStream with IndexOutput and BufferedIndexOutput. This is not critical and mostly for consistency, as mmap makes more sense for read-only data. 2. Update RAMDirectory and FSDirectory to no longer use deprecated classes. This is done last

Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/store MMapDirectory.java

2004-09-28 Thread Doug Cutting
[EMAIL PROTECTED] wrote: Added: src/java/org/apache/lucene/store MMapDirectory.java Log: Add an nio mmap based Directory implementation. For my simple benchmarks this is somewhat slower than the classic FSDirectory, but I thought it was still worth having. It should use less memory

Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/store MMapDirectory.java

2004-09-28 Thread Doug Cutting
Bruce Ritchie wrote: [EMAIL PROTECTED] wrote: One downside is that it cannot handle indexes with files larger than 2^31 bytes. Can you expand slightly on what causes this limitation and whether it still exists on 64 bit hardware? This is a limit of the nio ByteBuffer API, which uses int instead

Re: cvs commit: jakarta-lucene build.xml

2004-09-21 Thread Doug Cutting
Daniel Naber wrote: I'm using gcc/gcj 3.3.3, do I maybe need a more recent version? I'm currently using 3.4.1, but I think 3.4.0 will work as well. I had troubles with 3.3. I've worked more on this, and now have a version (not yet committed) which appears a bit faster than a JVM. More soon.
