I guess ElasticSearch went ahead of SOLR with the percolate API, which is exactly what is needed for the two-way constraint+doc matching problem present in advertising systems and other use cases:
http://www.elasticsearch.org/guide/reference/api/percolate.html

Cheers,

Joaquin Delgado, PhD.
http://www.linkedin.com/pub/profile/0/04b/277

On Mon, Mar 26, 2012 at 10:17 AM, Walter Underwood <[email protected]> wrote:

> Efficient rule matching goes further back, at least to "alerting" in
> Verity K2.
>
> wunder
> Search Guy, Chegg
>
> On Mar 26, 2012, at 10:15 AM, J. Delgado wrote:
>
> BTW, the idea of indexing Boolean Expressions inside a text indexing
> engine is not new. For example Oracle Text provides the CTXRULE index and
> the MATCHES operator within their indexing stack, which is primarily used
> for Rule-based text classification.
>
> See:
>
> http://docs.oracle.com/cd/B28359_01/text.111/b28303/query.htm#autoId8
>
> http://docs.oracle.com/cd/B28359_01/text.111/b28303/classify.htm#g1011013
>
> -- J
>
> On Mon, Mar 26, 2012 at 10:07 AM, J. Delgado <[email protected]> wrote:
>
>> In full disclosure, there is a patent application that Yahoo! has filed
>> for the use of inverted indexes for using complex predicates for matching
>> contracts and opportunities in advertising:
>>
>> http://www.google.com/patents/US20110016109?printsec=abstract#v=onepage&q&f=false
>>
>> However I believe there are many more applications that can benefit from
>> similar matching techniques (e.g. recommender systems,
>> e-commerce, recruiting, etc.) to make it worthwhile implementing the ideas
>> exposed in the original VLDB'09 paper (which is public) in Lucene.
>>
>> As a Yahoo! employee, I might not be able to directly contribute to this
>> project but will be happy to point to any publicly available pointer that
>> can help.
>>
>> Cheers,
>>
>> -- Joaquin
>>
>>
>> On Sun, Mar 25, 2012 at 11:44 PM, Mikhail Khludnev <
>> [email protected]> wrote:
>>
>>> Hello Joaquin,
>>>
>>> I looked through the paper several times, and see no problem
>>> implementing it in Lucene (the trivial case at least):
>>>
>>> Let's index a conjunctive condition as
>>> {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3}
>>>
>>> then, form a query from the incoming fact (event):
>>> fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD
>>>
>>> to enforce overlap between condition and event, wrap the query above
>>> into its own query whose scorer will check that numClauses for the matched doc
>>> is equal to the number of matched clauses.
>>> To get "numClauses for the matched doc" you can use FieldCache that's
>>> damn fast; and "number of matched clauses" can be obtained from
>>> DisjunctionSumScorer.nrMatchers()
>>>
>>> Negative clauses, and multivalue can be covered also, I believe.
>>>
>>> WDYT?
>>>
>>>
>>> On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado
>>> <[email protected]> wrote:
>>>
>>>> I looked at LUCENE-2987 and its work on the query side (changes to the
>>>> accepted syntax to accept lower case 'or' and 'and'), which isn't really
>>>> related to my proposal.
>>>>
>>>> What I'm proposing is to be able to index complex boolean expressions
>>>> using Lucene. This can be viewed as the opposite of the regular search
>>>> task. The objective here is to find a set of relevant queries given a document
>>>> (assignment of values to fields).
>>>>
>>>> This by itself may not sound that interesting but it's a key piece
>>>> to efficiently implementing any MATCHING system which is effectively a
>>>> two-way search where constraints are defined both-ways. An example of this
>>>> would be:
>>>>
>>>> 1) Job matching: Potential employers define their "job posting" as
>>>> documents along with complex boolean expressions used to narrow potential
>>>> candidates.
>>>> Job searchers upload their "profile" and may formulate complex
>>>> queries when executing a search. Once a search is initiated from any of the
>>>> sides constraints need to be satisfied both ways.
>>>> 2) Advertising: Publishers define constraints on the type of
>>>> advertisers/ads they are willing to show in their sites. On the other hand,
>>>> advertisers define constraints (typically at the campaign level) on
>>>> publisher sites they want their ads to show at as well as on the user
>>>> audiences they are targeting. While some attribute values are known at
>>>> definition time, others are only instantiated once the user visits a given
>>>> page which triggers a matching request that must be satisfied in
>>>> a few milliseconds to select "valid" ads and then scored based on
>>>> "relevance".
>>>>
>>>> So in a matching system a MATCH QUERY is considered to be a tuple
>>>> that consists of a value assignment to attributes/fields (doc) + a boolean
>>>> expression (query) that goes against a double index also built on tuples
>>>> that simultaneously contain boolean expressions and associated documents.
>>>>
>>>> To do this efficiently we need to be able to build indexes on Boolean
>>>> expressions (Lucene queries) and retrieve the set of matching expressions
>>>> given a doc (typically few attributes with values assigned), which is the
>>>> core of what is described in this paper: "Indexing Boolean Expressions"
>>>> (See http://www.vldb.org/pvldb/2/vldb09-83.pdf)
>>>>
>>>> -- J
>>>>
>>>>
>>>> So to effectively resolve the problem of realtime matching one can
>>>>
>>>> On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera <[email protected]> wrote:
>>>>
>>>>> On 02/21/2012 12:15 PM, Aayush Kothari wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> So if Aayush Kothari is interested in working on this as a Student,
>>>>>> all we need is a formal mentor (I can be the informal one).
>>>>>>
>>>>>> Anyone up for the task?
>>>>>>
>>>>>>
>>>>>> Completely interested in working for and learning about the
>>>>> aforementioned subject/project. +1.
>>>>>
>>>>> This may be related to the work I'm doing with LUCENE-2987
>>>>> Basically changing the grammar to accept conjunctions AND and OR in
>>>>> the query text.
>>>>> I would be interested in working with you on some of the details.
>>>>>
>>>>> However, I too am not a formal committer.
>>>>>
>>>>> --
>>>>> Joe Cabrera
>>>>> eminorlabs.com
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Sincerely yours
>>> Mikhail Khludnev
>>> Lucid Certified
>>> Apache Lucene/Solr Developer
>>> Grid Dynamics
>>>
>>>
>
>
>
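
For anyone following the thread, here is a minimal sketch of the counting scheme Mikhail describes above (index each conjunction together with its numClauses, OR together the event's field:value pairs, keep only the conditions whose matched-clause count equals numClauses). It is plain Python, not Lucene: the ConjunctionIndex class and its add/match methods are hypothetical names made up for illustration, a dictionary stands in for the inverted index, and a per-condition counter stands in for what the scorer would get from the number of matched clauses. It only covers the trivial case (equality clauses, no negation, single-valued fields).

```python
from collections import defaultdict


class ConjunctionIndex:
    """Toy inverted index over conjunctions of field:value clauses.

    Hypothetical illustration only; not a Lucene API.
    """

    def __init__(self):
        # (field, value) -> ids of the conditions that contain this clause
        self.postings = defaultdict(set)
        # condition id -> number of clauses (the "numClauses" field)
        self.num_clauses = {}

    def add(self, cond_id, clauses):
        """Index one conjunctive condition given as a {field: value} dict."""
        clauses = dict(clauses)
        self.num_clauses[cond_id] = len(clauses)
        for field, value in clauses.items():
            self.postings[(field, value)].add(cond_id)

    def match(self, event):
        """Return ids of conditions whose clauses are all satisfied by event."""
        hits = defaultdict(int)
        # Equivalent of "fieldA:valA OR fieldB:valB OR ...": visit the
        # posting list of every field:value pair present in the event.
        for field, value in event.items():
            for cond_id in self.postings.get((field, value), ()):
                hits[cond_id] += 1
        # Keep a condition only if every one of its clauses was matched.
        return sorted(c for c, n in hits.items() if n == self.num_clauses[c])


index = ConjunctionIndex()
index.add("c1", {"fieldA": "valA", "fieldB": "valB", "fieldC": "valC"})
index.add("c2", {"fieldA": "valA", "fieldE": "valE"})

event = {"fieldA": "valA", "fieldB": "valB", "fieldC": "valC", "fieldD": "valD"}
matches = index.match(event)
print(matches)
```

With this event, c1 has all three of its clauses matched and is returned, while c2 matches only on fieldA (1 of 2 clauses) and is dropped. The extra fieldD in the event is harmless because no indexed condition mentions it.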
