The solution is simple, but you need to think of it conceptually in a different way. Instead of "all documents with the same DocID are the same document" think "fetch all the document where DocId is XYZ."
Assuming the contents are in a field called contents you do +(DocID:XYZ) (contents:foo) (contents:bar) For that matter, you can use a standard analyzer on the query and use a boolean to tie it to the specific document set. This is how we do searching on a specific channel at fastbuzz.com. Dror On Wed, Jan 07, 2004 at 05:21:43PM +0100, Thomas Scheffler wrote: > > Jamie Stallwood sagte: > > +(DocID:XYZ DocID:ABC) +(foo bar) > > > > will find a document that (MUST have (xyz OR abc)) AND (MUST have (foo OR > > bar)). > > This is just the solution for the example in real world I really don't > have noc documents containing "foo" or "bar". What I meant was: Make > Lucene think, that all Documents with the same DocID are ONE Document. > Imagine you have a big book, say 1000 pages. Instead of putting the whole > book in the index, you split it up in single pages and index them. Now > it's faster if a page changes or is deleted to update your index instead > of doing it over and over again for all 1000 pages. So you problem starts > when you're searching on the book. You search for (foo bar), foo is on > site 345 while bar ist on 435. You want to get a hit for the book. So I > need a solution matching this more generic example. > > > > > -----Original Message----- > > From: Thomas Scheffler [mailto:[EMAIL PROTECTED] > > Sent: 07 January 2004 11:23 > > To: [EMAIL PROTECTED] > > Subject: merged search of document > > > > Hi, > > > > I need a tip for implementation. I have several documents all of them with > > a field named DocID. DocID identifies not a single Lucene Document but a > > collection of them. When I wan't to start a seach it should handle the > > search in that way, as these lucene documents where one. > > > > example: > > > > Document 1: DocID:XYZ > > > > containing: foo > > > > Document 2: DocID:XYZ > > > > containing: bar > > > > Document 3: DocID:ABC > > > > containing: foo bar > > > > Document 4: GHJ > > > > containing: foo > > > > As you already guesses, when I'm searching for "+foo +bar" I wan't the > > hits to contain Document 1, Document 2 and Document 3, not Document 4. Is > > that clear what I want? How do I implement such a monster? Is that > > possible with lucene? The content is not stored within lucene it's just > > tokenized and indexed. > > > > Any help? > > > > Thanks in advance! > > > > Thomas Scheffler > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > -- > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.fastbuzz.com http://www.zapatec.com --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
