[jira] [Created] (SOLR-6568) Join Discovery Contrib

Joel Bernstein (JIRA) Fri, 26 Sep 2014 08:16:18 -0700

Joel Bernstein created SOLR-6568:
------------------------------------

             Summary: Join Discovery Contrib
                 Key: SOLR-6568
                 URL: https://issues.apache.org/jira/browse/SOLR-6568
             Project: Solr
          Issue Type: New Feature
            Reporter: Joel Bernstein
            Priority: Minor
             Fix For: 5.0



This contribution was commissioned by the *NCBI* (National Center for 
Biotechnology Information). 

The Join Discovery Contrib is a set of Solr plugins that support large-scale 
joins and "join facets" between Solr cores. 

There are two different join implementations included in this contribution. 
Both are designed to work with integer join keys. 
It is very common in large bioinformatic and genomic databases to use integer 
primary and foreign keys. Integer keys allow bioinformatic and genomic search 
engines and discovery tools to perform complex operations on large data sets 
very efficiently. The Join Discovery Contrib provides features that will be 
applicable to anyone working with the freely available databases from the 
NCBI, and likely with a large number of other bioinformatic and genomic 
databases. These features are not specific to bioinformatics and genomics, 
though: they can be used with any dataset where integer keys define the 
primary and foreign keys.

What is included in this contrib:

1) A new JoinComponent. This component is used instead of the standard 
QueryComponent. It facilitates very large-scale relational joins between two 
Solr indexes (cores). The join algorithm used in this component is known as a 
*parallel partitioned merge join*. This algorithm partitions the results from 
both sides of the join and then sorts and merges the partitions in parallel. 

Below are some of its features:

* Sub-second performance on very large joins. The parallel join algorithm is 
capable of sub-second performance on joins with tens of millions of records on 
both sides of the join.
* The JoinComponent returns "tuples" with fields from both sides of the join. 
The initial release returns the primary keys from both sides of the join and 
the join key. 
* The tuples also include, and are ranked by, a combined score from both sides 
of the join.
* Special-purpose memory-mapped on-disk indexes to support *:* (match-all) 
joins. This makes it possible to join an entire index with a subset of another 
index with sub-second performance. 
* Support for very fast one-to-one, one-to-many and many-to-many joins. Fast 
many-to-many joins make it possible to join between indexes on multi-valued 
fields. 
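
To illustrate the parallel partitioned merge join described in item 1, here is 
a minimal in-memory sketch. It is not the contrib's code: the class name, the 
method names, the row layout ({primaryKey, joinKey}) and the modulo 
partitioning are all assumptions made for illustration, and scoring is left 
out entirely.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical sketch, not the JoinComponent implementation.
// Input rows are {primaryKey, joinKey}; output tuples are {leftPk, rightPk, joinKey}.
public class PartitionedMergeJoinSketch {

    /** Partition rows by join key so that equal keys always share a partition. */
    static List<List<int[]>> partition(int[][] rows, int numPartitions) {
        List<List<int[]>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (int[] row : rows) {
            parts.get(Math.floorMod(row[1], numPartitions)).add(row);
        }
        return parts;
    }

    /** Sort one pair of partitions by join key, then merge them with two cursors. */
    static List<int[]> sortMerge(List<int[]> left, List<int[]> right) {
        Comparator<int[]> byJoinKey = Comparator.comparingInt(row -> row[1]);
        left.sort(byJoinKey);
        right.sort(byJoinKey);
        List<int[]> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int lk = left.get(i)[1];
            int rk = right.get(j)[1];
            if (lk < rk) {
                i++;
            } else if (lk > rk) {
                j++;
            } else {
                // Locate the run of equal keys on the right, then pair it with
                // every left row carrying the same key (covers many-to-many).
                int jEnd = j;
                while (jEnd < right.size() && right.get(jEnd)[1] == lk) {
                    jEnd++;
                }
                while (i < left.size() && left.get(i)[1] == lk) {
                    for (int k = j; k < jEnd; k++) {
                        out.add(new int[] {left.get(i)[0], right.get(k)[0], lk});
                    }
                    i++;
                }
                j = jEnd;
            }
        }
        return out;
    }

    /** Partition both sides, then sort and merge each partition pair in parallel. */
    public static int[][] join(int[][] left, int[][] right, int numPartitions) {
        List<List<int[]>> lp = partition(left, numPartitions);
        List<List<int[]>> rp = partition(right, numPartitions);
        return IntStream.range(0, numPartitions)
                .parallel()
                .mapToObj(p -> sortMerge(lp.get(p), rp.get(p)))
                .flatMap(List::stream)
                .toArray(int[][]::new);
    }
}
```

Calling PartitionedMergeJoinSketch.join(left, right, 4) returns one tuple per 
matching pair of rows. Because rows with the same join key always land in the 
same partition, each partition pair can be sorted and merged independently, 
which is what allows the work to run in parallel.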

2) A new JoinFacetComponent. This component provides facets for both indexes 
involved in the join. 

3) The BitSetJoinQParserPlugin. A very fast parallel filter join based on 
bitsets that supports arbitrarily deep nesting. It can be used as a filter 
query in combination with the JoinComponent or with the standard query 
component. 
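
As a rough illustration of how a bitset-based filter join over integer keys 
can work, consider the sketch below. It is an assumption-laden example, not 
the BitSetJoinQParserPlugin itself: the class and method names are 
hypothetical, join keys are assumed to be non-negative and single-valued, and 
the sketch is single-threaded, so the parallelism mentioned above is omitted.

```java
import java.util.BitSet;

// Hypothetical sketch, not the BitSetJoinQParserPlugin implementation.
// fromJoinKeys[doc] and toJoinKeys[doc] hold one non-negative join key per document.
public class BitSetFilterJoinSketch {

    /** Mark the join key of every matching "from" document in a bitset. */
    public static BitSet collectKeys(int[] fromJoinKeys, BitSet fromMatches) {
        BitSet keys = new BitSet();
        for (int doc = fromMatches.nextSetBit(0); doc >= 0;
                doc = fromMatches.nextSetBit(doc + 1)) {
            keys.set(fromJoinKeys[doc]);
        }
        return keys;
    }

    /** Keep only the "to" documents whose join key is present in the bitset. */
    public static BitSet filter(int[] toJoinKeys, BitSet keys) {
        BitSet result = new BitSet(toJoinKeys.length);
        for (int doc = 0; doc < toJoinKeys.length; doc++) {
            if (keys.get(toJoinKeys[doc])) {
                result.set(doc);
            }
        }
        return result;
    }
}
```

In this sketch, nesting falls out naturally: the BitSet returned by filter for 
one core can be passed as fromMatches to collectKeys for the next join, so 
joins can be chained to any depth.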




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
