[jira] [Commented] (SOLR-6568) Join Discovery Contrib

Otis Gospodnetic (JIRA) Wed, 02 Mar 2016 18:46:42 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177046#comment-15177046
 ]


Otis Gospodnetic commented on SOLR-6568:
----------------------------------------

Hey [~joel.bernstein], has this been superseded by other joins?

> Join Discovery Contrib
> ----------------------
>
>                 Key: SOLR-6568
>                 URL: https://issues.apache.org/jira/browse/SOLR-6568
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Minor
>             Fix For: 5.0
>
>
> This contribution was commissioned by the *NCBI* (National Center for 
> Biotechnology Information). 
> The Join Discovery Contrib is a set of Solr plugins that support large scale 
> joins and "join facets" between Solr cores. 
> There are two different Join implementations included in this contribution. 
> Both implementations are designed to work with integer join keys. It is very 
> common in large BioInformatic and Genomic databases to use integer primary 
> and foreign keys. Integer keys allow Bioinformatic and Genomic search engines 
> and discovery tools to perform complex operations on large data sets very 
> efficiently. 
> The Join Discovery Contrib provides features that will be applicable to 
> anyone working with the freely available databases from the NCBI and likely a 
> large number of other BioInformatic and Genomic databases. These features are 
> not specific though to Bioinformatics and Genomics, they can be used in any 
> datasets where integer keys are used to define the primary and foreign keys.
> What is included in this contrib:
> 1) A new JoinComponent. This component is used instead of the standard 
> QueryComponent. It facilitates very large scale relational joins between two 
> Solr indexes (cores). The join algorithm used in this component is known as a 
> *parallel partitioned merge join*. This is an algorithm which partitions the 
> results from both sides of the join and then sorts and merges the partitions 
> in parallel. 
>  Below are some of it's features:
> * Sub-second performance on very large joins. The parallel join algorithm is 
> capable of sub-second performance on joins with tens of millions of records 
> on both sides of the join.
> * The JoinComponent returns "tuples" with fields from both sides of the join. 
> The initial release returns the primary keys from both sides of the join and 
> the join key. 
> * The tuples also include, and are ranked by, a combined score from both 
> sides of the join.
> * Special purpose memory-mapped on-disk indexes to support \*:\* joins. This 
> makes it possible to join an entire index with a sub-set of another index 
> with sub-second performance. 
> * Support for very fast one-to-one, one-to-many and many-to-many joins. Fast 
> many-to-many joins make it possible to join between indexes on multi-value 
> fields. 
> 2) A new JoinFacetComponent. This component provides facets for both indexes 
> involved in the join. 
> 3) The BitSetJoinQParserPlugin. A very fast parallel filter join based on 
> bitsets that supports infinite levels of nesting. It can be used as a filter 
> query in combination with the JoinComponent or with the standard query 
> component. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (SOLR-6568) Join Discovery Contrib

Reply via email to