I have indexed a set of small documents. When I query the index with a BooleanQuery containing the terms {why, is, the, sky, blue}, with every clause set to MUST, I get no results. However, when I use only {why, sky, blue}, I do get results, several of them, reading "Why is the sky blue?".
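To illustrate the symptom, here is a minimal sketch in plain Java (deliberately not the Lucene API) of one possible cause. It assumes the documents were indexed with an analyzer that drops stop words such as "is" and "the", while the query terms are looked up as-is. The class `MustClauseSketch`, its stop-word list, and the sample documents are all hypothetical; whether this applies depends on the analyzer actually used.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MustClauseSketch {
    // Hypothetical stop-word list; a real analyzer's list may differ.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("is", "the", "a", "an", "of"));

    // Toy inverted index: term -> ids of the documents containing it.
    static final Map<String, Set<Integer>> index = new HashMap<>();

    // Index-time analysis: lowercase, split on non-word characters, drop stop words.
    static void addDocument(int id, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            index.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
        }
    }

    // Every term is required (the analogue of MUST): intersect the postings.
    // The terms are looked up unanalyzed, so stop words are NOT removed here.
    static Set<Integer> searchAllMust(String... terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> postings =
                    index.getOrDefault(term, Collections.<Integer>emptySet());
            if (result == null) {
                result = new TreeSet<>(postings);
            } else {
                result.retainAll(postings);
            }
        }
        return result == null ? new TreeSet<Integer>() : result;
    }

    public static void main(String[] args) {
        addDocument(1, "Why is the sky blue?");
        addDocument(2, "The sky is grey today");

        // "is" and "the" were never indexed, so the conjunction is empty.
        System.out.println(searchAllMust("why", "is", "the", "sky", "blue")); // prints []
        // Without the stop words, document 1 matches.
        System.out.println(searchAllMust("why", "sky", "blue"));              // prints [1]
    }
}
```

If this is the cause, running the query text through the same analyzer that was used at index time (or leaving stop words out of the required clauses) should bring the results back.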
What is going wrong? Please help.

-----Original Message-----
From: Stefan Groschupf [mailto:[EMAIL PROTECTED]
Sent: Monday, November 06, 2006 5:18 AM
To: general@lucene.apache.org
Subject: Re: [PROPOSAL] index server project

Hi,
do people think we are already at a stage where we can set up some basic infrastructure like a mailing list and wiki and move the discussion to the new mailing list? Maybe set up an incubator project?
I would be happy to help with such basic tasks.
Stefan

On 31.10.2006 at 22:03, Yonik Seeley wrote:

> On 10/30/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
>> Yonik Seeley wrote:
>> > On 10/18/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
>> >> We assume that, within an index, a file with a given name is written
>> >> only once.
>> >
>> > Is this necessary, and will we need the lockless patch (that avoids
>> > renaming or rewriting *any* files), or is Lucene's current index
>> > behavior sufficient?
>>
>> It's not strictly required, but it would make index synchronization a
>> lot simpler. Yes, I was assuming the lockless patch would be committed
>> to Lucene before this project gets very far. Something more than that
>> would be required in order to keep old versions, but this could be as
>> simple as a Directory subclass that refuses to remove files for a time.
>
> Or a snapshot (hard links) mechanism.
> Lucene would also need a way to open a specific index version (rather
> than just the latest), but I guess that could also be hacked into
> Directory by hiding later "segments" files (assumes lockless is
> committed).
>
>> > It's unfortunate the master needs to be involved on every document add.
>>
>> That should not normally be the case.
>
> Ahh... I had assumed that "id" in the following method was document id:
>   IndexLocation getUpdateableIndex(String id);
>
> I see now it's index id.
>
> But what is index id exactly? Looking at the example API you laid
> down, it must be a single physical index (as opposed to a logical
> index). In which case, is it entirely up to the client to manage
> multi-shard indices? For example, if we had a "photo" index broken
> up into 3 shards, each shard would have a separate index id and it
> would be up to the client to know this, and to query across the
> different "photo0", "photo1", "photo2" indices. The master would
> have no clue those indices were related. Hmmm, that doesn't work
> very well for deletes though.
>
> It seems like there should be the concept of a logical index, that is
> composed of multiple shards, and each shard has multiple copies.
>
> Or were you thinking that a cluster would only contain a single
> logical index, and hence all different index ids are simply different
> shards of that single logical index? That would seem to be consistent
> with ClientToMasterProtocol.getSearchableIndexes() lacking an id
> argument.
>
>> I was not imagining a real-time system, where the next query after a
>> document is added would always include that document. Is that a
>> requirement? That's harder.
>
> Not real-time, but it would be nice if we kept it close to what Lucene
> can currently provide.
> Most people seem fine with a latency of minutes.
>
>> At this point I'm mostly trying to see if this functionality would meet
>> the needs of Solr, Nutch and others.
>
> It depends on the project scope and how extensible things are.
> It seems like the master would be a WAR, capable of running stand-alone.
> What about index servers (slaves)? Would this project include just
> the interfaces to be implemented by Solr/Nutch nodes, some common
> implementation code behind the interfaces in the form of a library, or
> also complete standalone WARs?
>
> I'd need to be able to extend the ClientToSlave protocol to add
> additional methods for Solr (for passing in extra parameters and
> returning various extra data such as facets, highlighting, etc).
>
>> Must we include a notion of document identity and/or document version in
>> the mechanism? Would that facilitate updates and coherency?
>
> It doesn't need to be in the interfaces I don't think, so it depends
> on the scope of the index server implementations.
>
> -Yonik

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
search tech for web 2.1
Menlo Park, California
http://www.101tec.com