Yeah.... except with Percolator (which uses MemoryIndex under the hood, I believe), there is no relevancy score, so you just get match/no match, and if you need to figure out top N best matching ads against queries derived from page content, you need that relevancy score to get not all matching docs, but just those top N.
No? Otis -- http://sematext.com/ On Mon, Feb 11, 2013 at 11:22 AM, J. Delgado <[email protected]>wrote: > I guess ElasticSearch went ahead of SOLR with the percolate API, which is > exactly what is needed for two-way constraint+doc matching problem present > in Advertising systems and other use cases: > > http://www.elasticsearch.org/guide/reference/api/percolate.html > > Cheers, > > Joaquin Delgado, PhD. > http://www.linkedin.com/pub/profile/0/04b/277 > > On Mon, Mar 26, 2012 at 10:17 AM, Walter Underwood > <[email protected]>wrote: > >> Efficient rule matching goes further back, at least to "alerting" in >> Verity K2. >> >> wunder >> Search Guy, Chegg >> >> On Mar 26, 2012, at 10:15 AM, J. Delgado wrote: >> >> BTW, the idea of indexing Boolean Expressions inside a text indexing >> engine is not new. For example Oracle Text provides the CTXRULE index and >> the MATCHES operator within their indexing stack, which is primarily used >> for Rule-based text classification. >> >> See: >> >> http://docs.oracle.com/cd/B28359_01/text.111/b28303/query.htm#autoId8 >> >> http://docs.oracle.com/cd/B28359_01/text.111/b28303/classify.htm#g1011013 >> >> -- J >> >> On Mon, Mar 26, 2012 at 10:07 AM, J. Delgado >> <[email protected]>wrote: >> >>> In full dislosure, there is a patent application that Yahoo! has filed >>> for the use of inverted indexes for using complex predicates for matching >>> contracts and opportunities in advertising: >>> >>> http://www.google.com/patents/US20110016109?printsec=abstract#v=onepage&q&f=false >>> >>> However I believe there are many more applications that can benefit from >>> similar matching techniques (i.e. recommender systems, >>> e-commerce, recruiting,etc) to make it worthwhile implementing the ideas >>> exposed in the original VLDB'09 paper (which is public) in Lucene. >>> >>> As a Yahoo! employee, I might not be able to directly contribute to this >>> project but will be happy to point to any publicly available pointer that >>> can help. >>> >>> Cheers, >>> >>> -- Joaquin >>> >>> >>> On Sun, Mar 25, 2012 at 11:44 PM, Mikhail Khludnev < >>> [email protected]> wrote: >>> >>>> Hello Joaquin, >>>> >>>> I looked through the paper several times, and see no problem to >>>> implement it in Lucene (the trivial case at least): >>>> >>>> Let's index conjunctive condition as >>>> {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3} >>>> >>>> then, form query from the incoming fact (event): >>>> fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD >>>> >>>> to enforce overlap between condition and event, wrap the query above >>>> into own query whose scorer will check that numClauses for the matched doc >>>> is equal to number of matched clauses. >>>> To get "numClauses for the matched doc" you can use FieldCache that's >>>> damn fast; and "number of matched clauses" can be obtained from >>>> DisjunctionSumScorer.nrMatchers() >>>> >>>> Negative clauses, and multivalue can be covered also, I believe. >>>> >>>> WDYT? >>>> >>>> >>>> On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado >>>> <[email protected]>wrote: >>>> >>>>> I looked at LUCENE-2987 and its work on the query side (changes to the >>>>> accepted syntax to accept lower case 'or' and 'and'), which isn't really >>>>> related to my proposal. >>>>> >>>>> What I'm proposing is to be able to index complex boolean expressions >>>>> using Lucene. This can be viewed as the opposite of the regular search >>>>> task. The objective here is find a set of relevant queries given a >>>>> document >>>>> (assignment of values to fields). >>>>> >>>>> This by itself may not sound that interesting but its a key piece >>>>> to efficiently implementing any MATCHING system which is effectively a >>>>> two-way search where constraints are defined both-ways. An example of this >>>>> would be: >>>>> >>>>> 1) Job matching: Potential employers define their "job posting" as a >>>>> documents along with complex boolean expressions used to narrow potential >>>>> candidates. Job searchers upload their "profile" and may formulate complex >>>>> queries when executing a search. Once a is search initiated from any of >>>>> the >>>>> sides constraints need to satisfied both ways. >>>>> 2) Advertising: Publishers define constraints on the type of >>>>> advertisers/ads they are willing to show in their sites. On the other >>>>> hand, >>>>> advertisers define constraints (typically at the campaign level) on >>>>> publisher sites they want their ads to show at as well as on the user >>>>> audiences they are targeting to. While some attribute values are known at >>>>> definition time, others are only instantiated once the user visits a given >>>>> page which triggers a matching request that must be satisfied in >>>>> few milliseconds to select "valid" ads and then scored based on >>>>> "relevance". >>>>> >>>>> So in a matching system a MATCH QUERY is considered to to be a tuple >>>>> that consists of a value assignment to attributes/fields (doc) + a boolean >>>>> expression (query) that goes against a double index also built on tuples >>>>> that simultaneously boolean expressions and associated documents. >>>>> >>>>> To do this efficiently we need to be able to build indexes on Boolean >>>>> expressions (Lucene queries) and retrieve the set of matching expressions >>>>> given a doc (typically few attributes with values assigned), which is the >>>>> core of what is described in this paper: "Indexing Boolean Expressions" >>>>> (See http://www.vldb.org/pvldb/2/vldb09-83.pdf) >>>>> >>>>> -- J >>>>> >>>>> >>>>> So to effectively resolve the problem of realtime matching one can >>>>> >>>>> On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera >>>>> <[email protected]>wrote: >>>>> >>>>>> On 02/21/2012 12:15 PM, Aayush Kothari wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> So if Aayush Kothari is interested in working on this as a >>>>>>> Student, all we need is a formal mentor (I can be the informal one). >>>>>>> >>>>>>> Anyone up for the task? >>>>>>> >>>>>>> >>>>>>> Completely interested in working for and learning about the >>>>>> aforementioned subject/project. +1. >>>>>> >>>>>> This may be related to the work I'm doing with LUCENE-2987 >>>>>> Basically changing the grammar to accepts conjunctions AND and OR in >>>>>> the query text. >>>>>> I would be interested in working with you on some of the details. >>>>>> >>>>>> However, I too am not a formal committer. >>>>>> >>>>>> -- >>>>>> Joe Cabreraeminorlabs.com >>>>>> >>>>>> >>>>> >>>> >>>> >>>> -- >>>> Sincerely yours >>>> Mikhail Khludnev >>>> Lucid Certified >>>> Apache Lucene/Solr Developer >>>> Grid Dynamics >>>> >>>> >> >> >> >
