A small correction: impact does not have to be per occurrence of a term. More likely it is per aggregation of all occurrences of a term in a document (that is, per term-document pair). You simply aggregate the significance of the occurrences in the different regions of a document at index time and store the aggregated significance as the "impact". Then you can do away with fields in a vector-space model of retrieval.
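To make the aggregation concrete, here is a minimal sketch. It is only an illustration under assumptions of my own: the function name aggregate_impacts, the REGION_WEIGHTS table, and the choice of a plain sum are all hypothetical, and the weights simply reuse the 1.0 base / 0.5 title bonus from the example quoted in this thread. A real index would probably also fold in length normalization and quantize the result.

```python
from collections import defaultdict

# Hypothetical per-occurrence weights: a base of 1.0, plus an extra 0.5
# for title occurrences (the numbers from the example in this thread).
REGION_WEIGHTS = {"title": 1.5, "body": 1.0}


def aggregate_impacts(doc_id, regions):
    """Return {(term, doc_id): impact}, summing the region weight of every occurrence.

    Summation is an assumption; any other aggregation could be used at index time.
    """
    impacts = defaultdict(float)
    for region, text in regions.items():
        weight = REGION_WEIGHTS.get(region, 1.0)
        for term in text.lower().split():
            impacts[(term, doc_id)] += weight
    return dict(impacts)


impacts = aggregate_impacts(
    7,
    {"title": "Lucene index", "body": "the index format of lucene index"},
)
# "index" occurs once in the title and twice in the body: 1.5 + 1.0 + 1.0
print(impacts[("index", 7)])
```

After this step the region is no longer visible in the index: there is one number per term-document pair, which is exactly why a field-restricted query can no longer be answered from it.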
But fields usually carry semantics of their own in a Boolean model and in semi-structured information retrieval, and you cannot get rid of those.

Michael

--- Ming Lei <[EMAIL PROTECTED]> wrote:

> Just my two cents,
> I think what he meant by "single field" is the following:
>
> If the concept of "field" was introduced to differentiate the
> significance of term occurrences in different regions of a document
> (e.g., an occurrence in the title is more important than one in the
> body), that significance can alternatively be represented by (or at
> least encoded in) "impact", which is per occurrence of a term.
> For example, if the base "impact" value for a term occurrence is 1.0,
> you can assign an additional 0.5 to an occurrence in the title. You
> would then have an impact of 1.5 for the title occurrence of that term
> and 1.0 for the body occurrence.
>
> Does this make sense to you?
>
> I feel that people should take a look at the theoretical retrieval
> model first. It is not clear to me that Lucene is fully a vector-space
> model. It seems that many assumptions have been made about the context
> of the discussion.
>
> Michael
>
> --- jian chen <[EMAIL PROTECTED]> wrote:
>
> > Hi, Jeff,
> >
> > I like the idea of impact-based scoring. However, could you
> > elaborate more on why we only need to use a single field at search
> > time?
> >
> > In Lucene, the indexed terms are field specific, and two terms, even
> > if they are the same, are still different terms if they are of
> > different fields.
> >
> > So I think the multiple-field scenario is still needed, right? What
> > if the user wants to search on both subject and content for emails,
> > for example, and sometimes wants to search only on subject? How
> > would this type of task be handled without multiple fields?
> >
> > I got lost on this; could anyone educate me?
> >
> > Thanks,
> >
> > Jian
> >
> > On 1/9/07, Dalton, Jeffery <[EMAIL PROTECTED]> wrote:
> > >
> > > I'm not sure we fully understand one another, but I'll try to
> > > explain what I am thinking.
> > >
> > > Yes, it has use after sorting. It is used at query time for
> > > document scoring in place of the TF and length norm components
> > > (new scorers would need to be created).
> > >
> > > Using an impact-based index moves most of the scoring from query
> > > time to index time (it trades query-time flexibility for greatly
> > > improved search performance). Because the field boosts, length
> > > norm, position boosts, etc. are incorporated into a single
> > > document-term score, you can use a single field at search time. It
> > > allows one posting list per query term instead of the current one
> > > posting list per field per query term (MultiFieldQueryParser
> > > wouldn't be necessary in most cases). In addition to having fewer
> > > posting lists to examine, you often don't need to read to the end
> > > of long posting lists when processing with a score-at-a-time
> > > approach (see Anh/Moffat's "Pruned Query Evaluation Using
> > > Pre-Computed Impacts", SIGIR 2006, for details on one potential
> > > algorithm).
> > >
> > > I'm not quite sure what you mean when you mention leaving them out
> > > and re-calculating them at merge time.
> > >
> > > - Jeff
> > >
> > > > -----Original Message-----
> > > > From: Marvin Humphrey [mailto:[EMAIL PROTECTED]
> > > > Sent: Tuesday, January 09, 2007 2:58 PM
> > > > To: java-dev@lucene.apache.org
> > > > Subject: Re: Beyond Lucene 2.0 Index Design
> > > >
> > > > On Jan 9, 2007, at 6:25 AM, Dalton, Jeffery wrote:
> > > >
> > > > > e. <impact, num_docs, (doc1,...docN)>
> > > > > f. <impact, num_docs, ([doc1, freq, <positions>],...[docN, freq, <positions>])>
> > > >
> > > > Does the impact have any use after it's used to sort the
> > > > postings? Can we leave it out of the index format and
> > > > recalculate at merge-time?
> > > >
> > > > Marvin Humphrey
> > > > Rectangular Research
> > > > http://www.rectangular.com/
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > For additional commands, e-mail: [EMAIL PROTECTED]
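
On posting format (e) from the thread, and Jeff's point that impact-ordered lists often need not be read to the end: the following is a rough sketch of score-at-a-time processing, not the algorithm from the Anh/Moffat paper. The INDEX contents, the function name score_at_a_time, and the max_blocks cut-off are all assumptions made up for illustration, and a fixed block budget is a much cruder stopping rule than real pruning.

```python
from collections import defaultdict

# Hypothetical in-memory index shaped like format (e):
# term -> list of (impact, [doc ids]) blocks, sorted by decreasing impact.
INDEX = {
    "lucene": [(3, [4]), (2, [1, 7]), (1, [2, 9])],
    "index": [(2, [1, 4]), (1, [3, 7])],
}


def score_at_a_time(query_terms, index, max_blocks=None):
    """Accumulate impacts block by block, highest impact first.

    max_blocks caps how many blocks are read (an assumed, simplistic
    early-termination rule); None means every block is read.
    """
    blocks = [block for term in query_terms for block in index.get(term, [])]
    blocks.sort(key=lambda block: block[0], reverse=True)
    if max_blocks is not None:
        blocks = blocks[:max_blocks]
    scores = defaultdict(int)
    for impact, doc_ids in blocks:
        for doc_id in doc_ids:
            scores[doc_id] += impact
    # Highest score first; ties broken by doc id for a stable result.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


full = score_at_a_time(["lucene", "index"], INDEX)
pruned = score_at_a_time(["lucene", "index"], INDEX, max_blocks=3)
print(full[:2], pruned[:2])
```

In this toy index, reading only the three highest-impact blocks already yields the same top two documents as reading all five, which is the effect Jeff describes. Note that there is one list per query term and no field anywhere in the structure.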