BTW, the idea of indexing Boolean Expressions inside a text indexing engine is not new. For example Oracle Text provides the CTXRULE index and the MATCHES operator within their indexing stack, which is primarily used for Rule-based text classification.
See: http://docs.oracle.com/cd/B28359_01/text.111/b28303/query.htm#autoId8 http://docs.oracle.com/cd/B28359_01/text.111/b28303/classify.htm#g1011013 -- J On Mon, Mar 26, 2012 at 10:07 AM, J. Delgado <joaquin.delg...@gmail.com>wrote: > In full dislosure, there is a patent application that Yahoo! has filed for > the use of inverted indexes for using complex predicates for matching > contracts and opportunities in advertising: > > http://www.google.com/patents/US20110016109?printsec=abstract#v=onepage&q&f=false > > However I believe there are many more applications that can benefit from > similar matching techniques (i.e. recommender systems, > e-commerce, recruiting,etc) to make it worthwhile implementing the ideas > exposed in the original VLDB'09 paper (which is public) in Lucene. > > As a Yahoo! employee, I might not be able to directly contribute to this > project but will be happy to point to any publicly available pointer that > can help. > > Cheers, > > -- Joaquin > > > On Sun, Mar 25, 2012 at 11:44 PM, Mikhail Khludnev < > mkhlud...@griddynamics.com> wrote: > >> Hello Joaquin, >> >> I looked through the paper several times, and see no problem to implement >> it in Lucene (the trivial case at least): >> >> Let's index conjunctive condition as >> {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3} >> >> then, form query from the incoming fact (event): >> fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD >> >> to enforce overlap between condition and event, wrap the query above into >> own query whose scorer will check that numClauses for the matched doc is >> equal to number of matched clauses. >> To get "numClauses for the matched doc" you can use FieldCache that's >> damn fast; and "number of matched clauses" can be obtained from >> DisjunctionSumScorer.nrMatchers() >> >> Negative clauses, and multivalue can be covered also, I believe. >> >> WDYT? >> >> >> On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado <joaquin.delg...@gmail.com>wrote: >> >>> I looked at LUCENE-2987 and its work on the query side (changes to the >>> accepted syntax to accept lower case 'or' and 'and'), which isn't really >>> related to my proposal. >>> >>> What I'm proposing is to be able to index complex boolean expressions >>> using Lucene. This can be viewed as the opposite of the regular search >>> task. The objective here is find a set of relevant queries given a document >>> (assignment of values to fields). >>> >>> This by itself may not sound that interesting but its a key piece >>> to efficiently implementing any MATCHING system which is effectively a >>> two-way search where constraints are defined both-ways. An example of this >>> would be: >>> >>> 1) Job matching: Potential employers define their "job posting" as a >>> documents along with complex boolean expressions used to narrow potential >>> candidates. Job searchers upload their "profile" and may formulate complex >>> queries when executing a search. Once a is search initiated from any of the >>> sides constraints need to satisfied both ways. >>> 2) Advertising: Publishers define constraints on the type of >>> advertisers/ads they are willing to show in their sites. On the other hand, >>> advertisers define constraints (typically at the campaign level) on >>> publisher sites they want their ads to show at as well as on the user >>> audiences they are targeting to. While some attribute values are known at >>> definition time, others are only instantiated once the user visits a given >>> page which triggers a matching request that must be satisfied in >>> few milliseconds to select "valid" ads and then scored based on "relevance". >>> >>> So in a matching system a MATCH QUERY is considered to to be a tuple >>> that consists of a value assignment to attributes/fields (doc) + a boolean >>> expression (query) that goes against a double index also built on tuples >>> that simultaneously boolean expressions and associated documents. >>> >>> To do this efficiently we need to be able to build indexes on Boolean >>> expressions (Lucene queries) and retrieve the set of matching expressions >>> given a doc (typically few attributes with values assigned), which is the >>> core of what is described in this paper: "Indexing Boolean Expressions" >>> (See http://www.vldb.org/pvldb/2/vldb09-83.pdf) >>> >>> -- J >>> >>> >>> So to effectively resolve the problem of realtime matching one can >>> >>> On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera <calcmaste...@gmail.com>wrote: >>> >>>> On 02/21/2012 12:15 PM, Aayush Kothari wrote: >>>> >>>> >>>> >>>> >>>>> So if Aayush Kothari is interested in working on this as a Student, >>>>> all we need is a formal mentor (I can be the informal one). >>>>> >>>>> Anyone up for the task? >>>>> >>>>> >>>>> Completely interested in working for and learning about the >>>> aforementioned subject/project. +1. >>>> >>>> This may be related to the work I'm doing with LUCENE-2987 >>>> Basically changing the grammar to accepts conjunctions AND and OR in >>>> the query text. >>>> I would be interested in working with you on some of the details. >>>> >>>> However, I too am not a formal committer. >>>> >>>> -- >>>> Joe Cabreraeminorlabs.com >>>> >>>> >>> >> >> >> -- >> Sincerely yours >> Mikhail Khludnev >> Lucid Certified >> Apache Lucene/Solr Developer >> Grid Dynamics >> >> <http://www.griddynamics.com> >> <mkhlud...@griddynamics.com> >> >> >