Re: merged search of document

Thomas Scheffler Wed, 07 Jan 2004 12:01:00 -0800

On Wed, 07.01.2004 at 19:00, Dror Matalon wrote:
> The solution is simple, but you need to think of it conceptually in a
> different way. Instead of "all documents with the same DocID are the same
> document" think "fetch all the documents where DocID is XYZ."
> 
> Assuming the contents are in a field called contents,
> you do:
> +(DocID:XYZ) (contents:foo) (contents:bar)


I was already heading that way, but think of a search like (foo -bar). With
your solution it will produce a hit, because page 345 (to keep my
example) contains the word "foo" and no "bar". Of course, with my model I
want the book not to get a hit for that query, since "bar" appears on
page 435. You see how hard it is to handle, don't you?
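To make the difference concrete, here is a minimal sketch in plain Java
(no Lucene classes at all) that evaluates a query once per page and once
per merged DocID. The class MergedSearch, the Page type and all method
names are made up for illustration, and a query is simplified to a set of
required terms plus a set of prohibited terms. It only shows the
semantics I am after, not how one would do this inside Lucene.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java sketch; every name here is hypothetical and not part of Lucene.
final class MergedSearch {

    // One indexed unit (e.g. a page): the DocID it belongs to and its tokens.
    static final class Page {
        final String docId;
        final Set<String> terms;

        Page(String docId, String... terms) {
            this.docId = docId;
            this.terms = Set.of(terms);
        }
    }

    // The four documents from the original question.
    static List<Page> samplePages() {
        return List.of(
            new Page("XYZ", "foo"),
            new Page("XYZ", "bar"),
            new Page("ABC", "foo", "bar"),
            new Page("GHJ", "foo"));
    }

    // A term set matches if it has every required term and no prohibited one.
    static boolean matches(Set<String> terms, Set<String> required, Set<String> prohibited) {
        return terms.containsAll(required) && Collections.disjoint(terms, prohibited);
    }

    // Per-page evaluation: a DocID is a hit as soon as one of its pages matches.
    static Set<String> searchPerPage(List<Page> pages, Set<String> required, Set<String> prohibited) {
        Set<String> hits = new TreeSet<>();
        for (Page p : pages) {
            if (matches(p.terms, required, prohibited)) {
                hits.add(p.docId);
            }
        }
        return hits;
    }

    // Merged evaluation: union the terms of all pages per DocID, then match once.
    static Set<String> searchMerged(List<Page> pages, Set<String> required, Set<String> prohibited) {
        Map<String, Set<String>> merged = new TreeMap<>();
        for (Page p : pages) {
            merged.computeIfAbsent(p.docId, k -> new HashSet<>()).addAll(p.terms);
        }
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : merged.entrySet()) {
            if (matches(e.getValue(), required, prohibited)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Page> pages = samplePages();
        // (foo -bar) per page: XYZ is wrongly a hit via its "foo" page.
        System.out.println(searchPerPage(pages, Set.of("foo"), Set.of("bar"))); // [GHJ, XYZ]
        // (foo -bar) on merged documents: only GHJ never contains "bar".
        System.out.println(searchMerged(pages, Set.of("foo"), Set.of("bar")));  // [GHJ]
        // (+foo +bar) on merged documents: XYZ and ABC, but not GHJ.
        System.out.println(searchMerged(pages, Set.of("foo", "bar"), Set.of())); // [ABC, XYZ]
    }
}
```

The per-page variant is what a DocID filter plus the normal query gives
me; the merged variant is the behaviour I want.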

> 
> For that matter, you can use a standard analyzer on the query and use a
> boolean to tie it to the specific document set.
> 
> This is how we do searching on a specific channel at fastbuzz.com.
> 
> Dror
> 
> 
> On Wed, Jan 07, 2004 at 05:21:43PM +0100, Thomas Scheffler wrote:
> > 
> > Jamie Stallwood said:
> > > +(DocID:XYZ DocID:ABC) +(foo bar)
> > >
> > > will find a document that (MUST have (xyz OR abc)) AND (MUST have (foo OR
> > > bar)).
> > 
> > This only solves the example; in the real world I don't really
> > have documents containing "foo" or "bar". What I meant was: make
> > Lucene think that all Documents with the same DocID are ONE Document.
> > Imagine you have a big book, say 1000 pages. Instead of putting the whole
> > book in the index, you split it up into single pages and index them. Now,
> > if a page changes or is deleted, it's faster to update your index than
> > to redo it over and over again for all 1000 pages. So your problem starts
> > when you're searching the book. You search for (foo bar); foo is on
> > page 345 while bar is on 435. You want to get a hit for the book. So I
> > need a solution matching this more generic example.
> > 
> > >
> > > -----Original Message-----
> > > From: Thomas Scheffler [mailto:[EMAIL PROTECTED]
> > > Sent: 07 January 2004 11:23
> > > To: [EMAIL PROTECTED]
> > > Subject: merged search of document
> > >
> > > Hi,
> > >
> > > I need a tip for an implementation. I have several documents, all of them with
> > > a field named DocID. DocID identifies not a single Lucene Document but a
> > > collection of them. When I want to start a search, it should handle the
> > > search as if these Lucene documents were one.
> > >
> > > example:
> > >
> > > Document 1: DocID:XYZ
> > >
> > > containing: foo
> > >
> > > Document 2: DocID:XYZ
> > >
> > > containing: bar
> > >
> > > Document 3: DocID:ABC
> > >
> > > containing: foo bar
> > >
> > > Document 4: DocID:GHJ
> > >
> > > containing: foo
> > >
> > > As you have already guessed, when I'm searching for "+foo +bar" I want the
> > > hits to contain Document 1, Document 2 and Document 3, not Document 4. Is
> > > it clear what I want? How do I implement such a monster? Is that
> > > possible with Lucene? The content is not stored within Lucene; it's just
> > > tokenized and indexed.
> > >
> > > Any help?
> > >
> > > Thanks in advance!
> > >
> > > Thomas Scheffler
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
> > >
> > >
> > >
> > >
> > >
> > 
> > 
> > -- 
> > 
> > 
> > 
--
Computer science terms - simply explained
=========================================
No. 37 -- Fault-tolerant:

The program does not allow any user input.
