Tom Mortimer updated SOLR-7341:
    Attachment: SOLR-7341.patch-7.2.1

> xjoin - join data from external sources
> ---------------------------------------
>                 Key: SOLR-7341
>                 URL: https://issues.apache.org/jira/browse/SOLR-7341
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Tom Winch
>            Priority: Minor
>             Fix For: 4.10.3, 5.3.2, 6.0
>         Attachments: SOLR-7341.patch-4.10.3, SOLR-7341.patch-4_10, 
> SOLR-7341.patch-5.3.2, SOLR-7341.patch-5_3, SOLR-7341.patch-7.2.1, 
> SOLR-7341.patch-master, SOLR-7341.patch-trunk, SOLR-7341.patch-trunk
> h2. XJoin
> The "xjoin" SOLR contrib allows external results to be joined with SOLR 
> results in a query and the SOLR result set to be filtered by the results of 
> an external query. Values from the external results are made available in the 
> SOLR results and may also be used to boost the scores of corresponding 
> documents during the search. The contrib consists of the Java classes 
> XJoinSearchComponent, XJoinValueSourceParser and XJoinQParserPlugin (and 
> associated classes), which must be configured in solrconfig.xml, and the 
> interfaces XJoinResultsFactory and XJoinResults, which are implemented by the 
> user to provide the link between SOLR and the external results source (but 
> see below for details of how to use the in-built SimpleXJoinResultsFactory 
> implementation). External results and SOLR documents are matched via a single 
> configurable attribute (the "join field").
> To include the XJoin contrib classes, add the following config to 
> solrconfig.xml:
> {code:xml}
> <config>
>   ..
>    <!-- XJoin contrib -->
>   <lib dir="${solr.install.dir:../../../..}/contrib/xjoin/lib" 
> regex=".*\.jar" />
>   <lib dir="${solr.install.dir:../../../..}/dist/" 
> regex="solr-xjoin-\d.*\.jar" />
>   ..
> </config>
> {code}
> Note that any JARs containing implementations of the XJoinResultsFactory must 
> also be included.
> h2. Java classes and interfaces
> h3. XJoinResultsFactory
> The user implementation of this interface is responsible for connecting to an 
> external source to perform a query (or otherwise collect results). Parameters 
> with prefix "<component name>.external." are passed from the SOLR query URL 
> to parameterise the search. The interface has the following methods:
> * void init(NamedList args) - this is called during SOLR initialisation, and 
> passed parameters from the search component configuration (see below)
> * XJoinResults getResults(SolrParams params) - this is called during a SOLR 
> search to generate external results, and is passed parameters from the SOLR 
> query URL (as above)
> For example, the implementation might perform queries of an external source 
> based on the 'q' SOLR query URL parameter (in full, <component 
> name>.external.q).
> h3. XJoinResults
> A user implementation of this interface is returned by the getResults() 
> method of the XJoinResultsFactory implementation. It has methods:
> * Object getResult(String joinId) - this should return a particular result 
> given the value of the join attribute
> * Iterable<String> getJoinIds() - this should return an ordered (ascending) 
> list of the join attribute values for all results of the external search
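> As a rough illustration of the contract described above, here is a minimal 
> in-memory sketch. The XJoinResults interface is re-declared as a stand-in so 
> that the example compiles without the contrib on the classpath; 
> MapBackedResults, add() and getTestCount() are hypothetical names, not part 
> of the contrib.

```java
import java.util.TreeMap;

/** Stand-in for the contrib interface, declared here only so the sketch compiles alone. */
interface XJoinResults {
    Object getResult(String joinId);
    Iterable<String> getJoinIds();
}

/** Hypothetical in-memory implementation keyed by join id. */
final class MapBackedResults implements XJoinResults {
    // A TreeMap keeps its keys sorted, so join ids come back in ascending order.
    private final TreeMap<String, Object> byJoinId = new TreeMap<>();

    /** Registers one external result under its join id (hypothetical helper). */
    void add(String joinId, Object result) {
        byJoinId.put(joinId, result);
    }

    @Override
    public Object getResult(String joinId) {
        return byJoinId.get(joinId); // null when there is no such external result
    }

    @Override
    public Iterable<String> getJoinIds() {
        return byJoinId.keySet();
    }

    /** A 'global' value; under the naming rule this would map to "test_count". */
    public int getTestCount() {
        return byJoinId.size();
    }
}
```

> A factory's getResults() could populate such an object from the external 
> source and return it.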
> h3. XJoinSearchComponent
> This is the central Java class of the contrib. It is a SOLR search component, 
> configured in solrconfig.xml and included in one or more SOLR request 
> handlers. There is one XJoin search component per external source, and each 
> has two main responsibilities:
> * Before the SOLR search, it connects to the external source and retrieves 
> results, storing them in the SOLR request context
> * After the SOLR search, it matches SOLR documents in the results set and 
> external results via the join field, adding attributes from the external 
> results to documents in the SOLR results set
> It takes the following initialisation parameters:
> * factoryClass - this specifies the user-supplied class implementing 
> XJoinResultsFactory, used to generate external results
> * joinField - this specifies the attribute on which to join between SOLR 
> documents and external results
> * external - this parameter set is passed to configure the 
> XJoinResultsFactory implementation
> For example, in solrconfig.xml:
> {code:xml}
> <searchComponent name="xjoin_test" 
> class="org.apache.solr.search.xjoin.XJoinSearchComponent">
>   <str name="factoryClass">test.TestXJoinResultsFactory</str>
>   <str name="joinField">id</str>
>   <lst name="external">
>     <str name="values">1,2,3</str>
>   </lst>
> </searchComponent>
> {code}
> Here, the search component instantiates a new TestXJoinResultsFactory during 
> initialisation, and passes it the "values" parameter (1, 2, 3) to configure 
> it. To properly use the XJoinSearchComponent in a request handler, it must be 
> included at the start and end of the component list, and may be configured 
> with the following query parameters:
> * results - a comma-separated list of attributes from the XJoinResults 
> implementation (created by the factory at search time) to be included in the 
> SOLR results
> * fl - a comma-separated list of attributes from results objects (contained 
> in an XJoinResults implementation) to be included in the SOLR results
> For example:
> {code:xml}
> <requestHandler name="/xjoin" class="solr.SearchHandler" startup="lazy">
>   <lst name="defaults">
>     ..
>     <bool name="xjoin_test">true</bool>
>     <str name="xjoin_test.listParameter">xx</str>
>     <str name="xjoin_test.results">test_count</str>
>     <str name="xjoin_test.fl">id,value</str>
>   </lst>
>   <arr name="first-components">
>     <str>xjoin_test</str>
>   </arr>
>   <arr name="last-components">
>     <str>xjoin_test</str>
>   </arr>
> </requestHandler>
> {code}
> Note that, to include the list of join ids returned by the external source in 
> the SOLR results (likely for debug purposes), the value 'join_ids' may be 
> specified in the "results" parameter.
> h3. XJoinQParserPlugin
> This query parser plugin constructs a query from the results of the external 
> searches, and is based on the TermsQParserPlugin. It takes the following 
> local parameters:
> * method - as with the TermsQParserPlugin, this specifies how to build the 
> Lucene query based on the join ids contained in external results; one of 
> termsFilter, booleanQuery, automaton, or docValuesTermsFilter (defaults 
> to termsFilter)
> * v (or as usual with query parsers, specified via the query) - a Boolean 
> combination of XJoin search component names. Supported operators are OR, AND, 
> XOR, and AND NOT
> The query is a Boolean expression whose terms are XJoin search component 
> names. The resulting set of join ids (obtained from the respective XJoin 
> search components) is formed into a Lucene query. Note that the join field 
> of all the referenced XJoin search components must be identical. Of course, 
> the expression can be a single XJoin search component name in the simplest 
> situation. For example:
> {code}
> q={!xjoin}xjoin_test
> q={!xjoin v=xjoin_test}
> fq={!xjoin method=automaton}xjoin_test1 AND NOT xjoin_test2
> {code}
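> To make the semantics of the four operators concrete, here is a minimal 
> sketch of how two sets of join ids combine. JoinIdSetSketch and its methods 
> are hypothetical names used for illustration only; the plugin itself may 
> compute the combination differently (for example, by merging ordered 
> iterators).

```java
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper showing the set semantics of OR, AND, XOR and AND NOT. */
final class JoinIdSetSketch {
    private JoinIdSetSketch() {}

    /** a OR b: ids present in either set. */
    static SortedSet<String> or(SortedSet<String> a, SortedSet<String> b) {
        SortedSet<String> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    /** a AND b: ids present in both sets. */
    static SortedSet<String> and(SortedSet<String> a, SortedSet<String> b) {
        SortedSet<String> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    /** a AND NOT b: ids present in a but absent from b. */
    static SortedSet<String> andNot(SortedSet<String> a, SortedSet<String> b) {
        SortedSet<String> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    /** a XOR b: ids present in exactly one of the two sets. */
    static SortedSet<String> xor(SortedSet<String> a, SortedSet<String> b) {
        SortedSet<String> result = or(a, b);
        result.removeAll(and(a, b));
        return result;
    }
}
```

> In each case the inputs are left unmodified and the result stays in ascending 
> order, matching the ordering required of getJoinIds().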
> h3. XJoinValueSourceParser
> This class provides a SOLR function that may be used, for example, in a boost 
> function to weight the result score from external values. The function 
> returns an attribute value from the external result with matching join 
> attribute. There are two ways of using the function. Either the XJoin 
> component name is specified in the configuration parameters and the external 
> result attribute is the argument of the function in the query, or vice versa, 
> the attribute is specified in the configuration parameters and the component 
> name is the function argument.
> The parameters for configuration in solrconfig.xml are:
> * xJoinSearchComponent - the name of an XJoin search component containing 
> external results
> * attribute - the attribute to use from external results
> * defaultValue - if the external result has no such attribute, then this 
> value is returned
> Normally, only one of xJoinSearchComponent and attribute is configured. It is 
> possible to specify both, but at least one must be specified.
> For example:
> {code:xml}
> <valueSourceParser name="test_fn" 
> class="org.apache.solr.search.xjoin.XJoinValueSourceParser">
>   <str name="xJoinSearchComponent">xjoin_test</str>
>   <double name="defaultValue">1.0</double>
> </valueSourceParser>
> {code}
> with corresponding query string parameter (for example) bq=test_fn(value)
> Alternatively:
> {code:xml}
> <valueSourceParser name="test_fn" 
> class="org.apache.solr.search.xjoin.XJoinValueSourceParser">
>   <str name="attribute">value</str>
>   <double name="defaultValue">1.0</double>
> </valueSourceParser>
> {code}
> with corresponding query string parameter (for example) bq=test_fn(xjoin_test)
> h3. Mapping between attributes and Java methods
> Java method names are converted into attribute (field) names by stripping the 
> initial "get" or "is" and converting the remainder from CamelCase to 
> lowercase-with-underscores; attribute names are converted back to method 
> names by the reverse process. For example, getScore() corresponds to "score" 
> and getFooBar() corresponds to "foo_bar".
> The field list parameter of XJoinSearchComponent (fl) can be given as *, in 
> which case all methods beginning 'get' or 'is' are converted into fields in 
> the SOLR result for the document.
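> The naming rule can be sketched as follows. XJoinNameSketch, toAttribute() 
> and toGetter() are hypothetical names chosen for this illustration, and the 
> contrib's own conversion may treat edge cases (such as runs of capitals) 
> differently.

```java
/** Hypothetical helper illustrating the method-name to attribute-name mapping. */
final class XJoinNameSketch {
    private XJoinNameSketch() {}

    /** "getFooBar" -> "foo_bar", "isActive" -> "active"; null if not a get/is method. */
    static String toAttribute(String methodName) {
        String rest;
        if (methodName.startsWith("get") && methodName.length() > 3) {
            rest = methodName.substring(3);
        } else if (methodName.startsWith("is") && methodName.length() > 2) {
            rest = methodName.substring(2);
        } else {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rest.length(); i++) {
            char c = rest.charAt(i);
            if (Character.isUpperCase(c)) {
                // Each capital starts a new word, separated by an underscore.
                if (i > 0) {
                    sb.append('_');
                }
                sb.append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Reverse direction: "foo_bar" -> "getFooBar". */
    static String toGetter(String attribute) {
        StringBuilder sb = new StringBuilder("get");
        for (String part : attribute.split("_")) {
            if (part.isEmpty()) {
                continue;
            }
            sb.append(Character.toUpperCase(part.charAt(0))).append(part.substring(1));
        }
        return sb.toString();
    }
}
```

> The reverse direction shown here always produces a "get" name; a real 
> implementation would also need to consider the "is" form.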
> h2. Putting it together - the SOLR query URL
> Here is an example SOLR query URL to perform an xjoin:
> {noformat}
> http://solrserver:8983/solr/collection1/xjoin?defType=edismax&q=*:*&xjoin_test.external.q=foobar&fl=id,score&fq={!xjoin}xjoin_test&bf=test_fn(value)
> {noformat}
> This might result in the following SOLR response:
> {code:xml}
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">346</int>
>     <lst name="params">
>       ..
>     </lst>
>   </lst>
>   <result name="response" numFound="2" start="0" maxScore="58.60105">
>     <doc>
>       <str name="id">document1</str>
>       <float name="score">58.60105</float>
>     </doc>
>     <doc>
>       <str name="id">document2</str>
>       <float name="score">14.260552</float>
>     </doc>
>   </result>
>   <lst name="xjoin_test">
>     <int name="test_count">145</int>
>     <arr name="external">
>       <lst>
>         <str name="joinId">document1</str>
>         <lst name="doc">
>           <double name="value">7.4</double>
>         </lst>
>       </lst>
>       <lst name="external">
>         <str name="joinId">document2</str>
>         <lst name="doc">
>           <double name="value">2.3</double>
>         </lst>
>       </lst>
>     </arr>
>   </lst>
> </response>
> {code}
> Notes:
> * The actual 'join' is specified by the fq parameter. See XJoinQParserPlugin 
> above.
> * The function test_fn is used in the bf score-boost function. Since the 
> argument is value, that attribute of the external results is used as the 
> score boost.
> h2. Many-to-many joins
> XJoin supports many-to-many joins in the following two ways.
> h3. Joining against a multi-valued field
> The SOLR field used as the join field may be multi-valued. External join 
> values will match every SOLR document with at least one matching value in the 
> join field. As usual, for every SOLR document in the results set, matching 
> external results are appended. In this case, this includes matching external 
> results with join id values for every value from the multi-valued field. 
> Therefore, there may be many more external results included than the number 
> of SOLR results.
> h3. Many external results with the same join id
> The case of many external results having the same join id is supported by 
> returning a Java Iterable from the implementation of 
> XJoinResults.getResult(joinIdStr). In this case, one <lst name="doc"> is 
> added to the corresponding <lst name="external"> per element in the iterable. 
> For the XJoinValueSourceParser, the maximum value is taken from the set of 
> possible values.
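> The following sketch illustrates the "maximum value" rule for a result that 
> may be either a single object or an Iterable of objects. It assumes, for 
> simplicity, that individual results are java.util.Map instances; 
> MaxValueSketch and valueFor() are hypothetical names, and falling back to 
> the default when no element has the attribute is an assumption.

```java
import java.util.Map;

/** Hypothetical helper: picks the function value for one join id. */
final class MaxValueSketch {
    private MaxValueSketch() {}

    static double valueFor(Object result, String attribute, double defaultValue) {
        if (result instanceof Iterable) {
            // Many external results share this join id: take the maximum.
            boolean found = false;
            double max = Double.NEGATIVE_INFINITY;
            for (Object element : (Iterable<?>) result) {
                Double v = attributeOf(element, attribute);
                if (v != null) {
                    found = true;
                    max = Math.max(max, v);
                }
            }
            return found ? max : defaultValue;
        }
        Double v = attributeOf(result, attribute);
        return v != null ? v : defaultValue;
    }

    /** Reads a numeric attribute from a Map-based result, or null if absent. */
    private static Double attributeOf(Object result, String attribute) {
        if (result instanceof Map) {
            Object v = ((Map<?, ?>) result).get(attribute);
            if (v instanceof Number) {
                return ((Number) v).doubleValue();
            }
        }
        return null;
    }
}
```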
> h2. Joining results from multiple external sources
> There are (at least) 3 different ways XJoin can be used in conjunction with 
> other SOLR features to combine results from more than one external source.
> h3. Multiple filter queries
> Multiple filter queries are ANDed together by SOLR, so if this is the desired 
> combination for external result join ids, this is a simple approach. (Note 
> the implications for filter caching.) In this case, the external join fields 
> do not have to be the same.
> For example (assuming two configured XJoin components, xjoin_test and 
> xjoin_other):
> {noformat}
> http://solrserver:8983/solr/collection1/xjoin?q=*:*&xjoin_test.external.q=foobar&xjoin_other.external.q=barfoo&fq={!xjoin}xjoin_test&fq={!xjoin}xjoin_other
> {noformat}
> h3. Nested queries in the standard SOLR Query Parser
> The nested query syntax of the standard SOLR query parser (see 
> https://wiki.apache.org/solr/SolrQuerySyntax) can be used for more 
> complicated combinations, allowing for "should", "must" etc. Lucene queries 
> to be built from external join id sets. The external join fields do not have 
> to be the same.
> For example (again, assuming two configured XJoin components, xjoin_test and 
> xjoin_other):
> {noformat}
> http://solrserver:8983/solr/collection1/xjoin?q=*:*&xjoin_test.external.q=foobar&xjoin_other.external.q=barfoo&fq=_query_:"{!xjoin}xjoin_test" -_query_:"{!xjoin}xjoin_other"
> {noformat}
> h3. Boolean expressions with the XJoin Query Parser
> To combine external join id sets directly using a Boolean expression, one can 
> use the XJoinQParserPlugin as detailed above. This allows arbitrary Boolean 
> expressions using the operators AND, OR, XOR and AND NOT.
> For example (again, assuming two configured XJoin components, xjoin_test and 
> xjoin_other):
> {noformat}
> http://solrserver:8983/solr/collection1/xjoin?q=*:*&xjoin_test.external.q=foobar&xjoin_other.external.q=barfoo&fq={!xjoin}xjoin_test XOR xjoin_other
> {noformat}
> h2. The SimpleXJoinResultsFactory implementation
> The XJoin plugins accept java.util.Map returned from the results factory, 
> both for the XJoinResults implementation and for the individual results 
> objects themselves. This fact is made use of by the in-built 
> SimpleXJoinResultsFactory, which is an implementation of XJoinResultsFactory 
> that connects to a URL to collect results in XML or JSON format, and uses 
> XPaths or JsonPaths to extract field values. This can often be used instead of 
> writing custom Java code.
> The SimpleXJoinResultsFactory takes the following initialisation parameters:
> * type - either XML or JSON
> * rootUrl - the URL to connect to for external results (can be file:// for 
> testing)
> * globalFieldPaths - a list of XPaths or JsonPaths which are used to extract 
> 'global' values (not individual result values)
> * joinIdPath - an XPath or JsonPath that should return an array of join ids 
> extracted from the results
> * joinIdToken - a token used in resultFieldPaths that will be substituted 
> with each join id, usually the default 'JOINID' will suffice
> * resultFieldPaths - a list of XPaths or JsonPaths which are used to extract 
> result values
> Example solrconfig.xml snippet:
> {code:xml}
>   <searchComponent name="xjoin" 
> class="org.apache.solr.search.xjoin.XJoinSearchComponent">
>     <str 
> name="factoryClass">org.apache.solr.search.xjoin.simple.SimpleXJoinResultsFactory</str>
>     <str name="joinField">id</str>
>     <lst name="external">
>       <str name="type">JSON</str>
>       <str name="rootUrl">http://myserver/endpoint</str>
>       <lst name="globalFieldPaths">
>         <str name="count">$.length()</str>
>       </lst>
>       <str name="joinIdPath">$[*].id</str>
>       <lst name="resultFieldPaths">
>         <str name="field">$[?(@.id == 'JOINID')].field</str>
>         <str name="value">$[?(@.id == 'JOINID')].value</str>
>       </lst>
>     </lst>
>   </searchComponent>
> {code}
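> The role of joinIdToken can be illustrated with a small sketch: for each join 
> id, the token in every result field path is replaced with that id before the 
> path is evaluated. JoinIdTokenSketch and resolve() are hypothetical names, 
> and plain textual substitution without escaping is an assumption; the 
> factory may handle quoting differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: substitutes a join id into result field path templates. */
final class JoinIdTokenSketch {
    private JoinIdTokenSketch() {}

    /** Returns field name -> path, with every occurrence of the token replaced. */
    static Map<String, String> resolve(Map<String, String> resultFieldPaths,
                                       String token, String joinId) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : resultFieldPaths.entrySet()) {
            // String.replace substitutes literally (no regular expressions).
            resolved.put(e.getKey(), e.getValue().replace(token, joinId));
        }
        return resolved;
    }
}
```

> With the configuration above and join id document1, the "value" path would 
> become $[?(@.id == 'document1')].value under this assumption.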
> Any external SolrParams are turned into URL query string parameters, so for 
> example, including "xjoin.external.q=foo" in the SOLR URL results in the 
> XJoin component making a request to "http://myserver/endpoint?q=foo".
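> A minimal sketch of this parameter forwarding, assuming the 
> "<component name>.external." prefix is stripped and the remaining name and 
> value are URL-encoded. ExternalUrlSketch and buildUrl() are hypothetical 
> names, and a plain Map stands in for SolrParams.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

/** Hypothetical helper: forwards "<component>.external.*" parameters to a URL. */
final class ExternalUrlSketch {
    private ExternalUrlSketch() {}

    static String buildUrl(String rootUrl, String componentName,
                           Map<String, String> requestParams) {
        String prefix = componentName + ".external.";
        StringBuilder url = new StringBuilder(rootUrl);
        char separator = rootUrl.contains("?") ? '&' : '?';
        for (Map.Entry<String, String> e : requestParams.entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                continue; // not addressed to this component's external source
            }
            String name = e.getKey().substring(prefix.length());
            url.append(separator).append(encode(name)).append('=').append(encode(e.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(ex);
        }
    }
}
```

> Parameters without the prefix (such as the main q) are ignored, so they never 
> reach the external endpoint in this sketch.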

This message was sent by Atlassian JIRA
