Am Mit, den 07.01.2004 schrieb Dror Matalon um 19:00: > The solution is simple, but you need to think of it conceptually in a > different way. Instead of "all documents with the same DocID are the same > document" think "fetch all the document where DocId is XYZ." > > Assuming the contents are in a field called contents > you do > +(DocID:XYZ) (contents:foo) (contents:bar)
I allready was on that way but think of a search like (foo -bar). With your solution it will result in a hit because on page 345 (to keep my example) is the word "foo" and no "bar". Of cause I want with my model, that the book don't get a hit for that query. You see how hard it is to handle, isn't it? > > For that matter, you can use a standard analyzer on the query and use a > boolean to tie it to the specific document set. > > This is how we do searching on a specific channel at fastbuzz.com. > > Dror > > > On Wed, Jan 07, 2004 at 05:21:43PM +0100, Thomas Scheffler wrote: > > > > Jamie Stallwood sagte: > > > +(DocID:XYZ DocID:ABC) +(foo bar) > > > > > > will find a document that (MUST have (xyz OR abc)) AND (MUST have (foo OR > > > bar)). > > > > This is just the solution for the example in real world I really don't > > have noc documents containing "foo" or "bar". What I meant was: Make > > Lucene think, that all Documents with the same DocID are ONE Document. > > Imagine you have a big book, say 1000 pages. Instead of putting the whole > > book in the index, you split it up in single pages and index them. Now > > it's faster if a page changes or is deleted to update your index instead > > of doing it over and over again for all 1000 pages. So you problem starts > > when you're searching on the book. You search for (foo bar), foo is on > > site 345 while bar ist on 435. You want to get a hit for the book. So I > > need a solution matching this more generic example. > > > > > > > > -----Original Message----- > > > From: Thomas Scheffler [mailto:[EMAIL PROTECTED] > > > Sent: 07 January 2004 11:23 > > > To: [EMAIL PROTECTED] > > > Subject: merged search of document > > > > > > Hi, > > > > > > I need a tip for implementation. I have several documents all of them with > > > a field named DocID. DocID identifies not a single Lucene Document but a > > > collection of them. When I wan't to start a seach it should handle the > > > search in that way, as these lucene documents where one. > > > > > > example: > > > > > > Document 1: DocID:XYZ > > > > > > containing: foo > > > > > > Document 2: DocID:XYZ > > > > > > containing: bar > > > > > > Document 3: DocID:ABC > > > > > > containing: foo bar > > > > > > Document 4: GHJ > > > > > > containing: foo > > > > > > As you already guesses, when I'm searching for "+foo +bar" I wan't the > > > hits to contain Document 1, Document 2 and Document 3, not Document 4. Is > > > that clear what I want? How do I implement such a monster? Is that > > > possible with lucene? The content is not stored within lucene it's just > > > tokenized and indexed. > > > > > > Any help? > > > > > > Thanks in advance! > > > > > > Thomas Scheffler > > > > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > > > > > -- > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > -- Fachbegriffe der Informatik - Einfach erklÃrt ============================================= NÂ 37 -- Fehlertolerant : Das Programm erlaubt keine Benutzereingaben.
signature.asc
Description: Dies ist ein digital signierter Nachrichtenteil
