[
https://issues.apache.org/jira/browse/SOLR-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177046#comment-15177046
]
Otis Gospodnetic commented on SOLR-6568:
----------------------------------------
Hey [~joel.bernstein], has this been superseded by other joins?
> Join Discovery Contrib
> ----------------------
>
> Key: SOLR-6568
> URL: https://issues.apache.org/jira/browse/SOLR-6568
> Project: Solr
> Issue Type: New Feature
> Reporter: Joel Bernstein
> Assignee: Joel Bernstein
> Priority: Minor
> Fix For: 5.0
>
>
> This contribution was commissioned by the *NCBI* (National Center for
> Biotechnology Information).
> The Join Discovery Contrib is a set of Solr plugins that support large scale
> joins and "join facets" between Solr cores.
> There are two different Join implementations included in this contribution.
> Both implementations are designed to work with integer join keys. It is very
> common in large bioinformatic and genomic databases to use integer primary
> and foreign keys. Integer keys allow bioinformatic and genomic search engines
> and discovery tools to perform complex operations on large data sets very
> efficiently.
> The Join Discovery Contrib provides features that will be applicable to
> anyone working with the freely available databases from the NCBI and likely a
> large number of other bioinformatic and genomic databases. These features are
> not specific to bioinformatics and genomics, though; they can be used with any
> dataset where integer keys define the primary and foreign keys.
> What is included in this contrib:
> 1) A new JoinComponent. This component is used instead of the standard
> QueryComponent. It facilitates very large scale relational joins between two
> Solr indexes (cores). The join algorithm used in this component is known as a
> *parallel partitioned merge join*. This is an algorithm which partitions the
> results from both sides of the join and then sorts and merges the partitions
> in parallel.
> Below are some of its features:
> * Sub-second performance on very large joins. The parallel join algorithm is
> capable of sub-second performance on joins with tens of millions of records
> on both sides of the join.
> * The JoinComponent returns "tuples" with fields from both sides of the join.
> The initial release returns the primary keys from both sides of the join and
> the join key.
> * The tuples also include, and are ranked by, a combined score from both
> sides of the join.
> * Special purpose memory-mapped on-disk indexes to support \*:\* joins. This
> makes it possible to join an entire index with a sub-set of another index
> with sub-second performance.
> * Support for very fast one-to-one, one-to-many and many-to-many joins. Fast
> many-to-many joins make it possible to join between indexes on multi-value
> fields.
> 2) A new JoinFacetComponent. This component provides facets for both indexes
> involved in the join.
> 3) The BitSetJoinQParserPlugin. A very fast parallel filter join based on
> bitsets that supports arbitrarily deep nesting. It can be used as a filter
> query in combination with the JoinComponent or with the standard query
> component.
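
For anyone unfamiliar with the term, the "parallel partitioned merge join" mentioned under item 1 of the quoted description can be sketched roughly as below. This is only an illustration of the general technique, not the contrib's code: the class name, method names, and the modulo partitioning scheme are assumptions made for the example, and each side of the join is modeled simply as an int array where the array index stands in for the document id and the value is the integer join key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch; none of these names come from SOLR-6568.
final class PartitionedMergeJoinSketch {

    /**
     * Returns the {docId, key} rows of one side whose key falls into the
     * given partition, sorted by key. Partitioning by key modulo the
     * partition count is an assumption of this sketch. Rescanning the whole
     * array per partition keeps the code short but is not efficient.
     */
    static int[][] sortedPartition(int[] keys, int part, int partitions) {
        return IntStream.range(0, keys.length)
                .filter(doc -> Math.floorMod(keys[doc], partitions) == part)
                .mapToObj(doc -> new int[] {doc, keys[doc]})
                .sorted(Comparator.comparingInt((int[] row) -> row[1]))
                .toArray(int[][]::new);
    }

    /** Merge step: walks two key-sorted partitions and emits {leftDoc, rightDoc, key}. */
    static List<int[]> merge(int[][] left, int[][] right) {
        List<int[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.length && j < right.length) {
            int lk = left[i][1], rk = right[j][1];
            if (lk < rk) {
                i++;
            } else if (lk > rk) {
                j++;
            } else {
                // Find the run of equal keys on the right, then pair it with
                // every left row carrying the same key (many-to-many case).
                int jEnd = j;
                while (jEnd < right.length && right[jEnd][1] == lk) jEnd++;
                for (; i < left.length && left[i][1] == lk; i++) {
                    for (int r = j; r < jEnd; r++) {
                        out.add(new int[] {left[i][0], right[r][0], lk});
                    }
                }
                j = jEnd;
            }
        }
        return out;
    }

    /** Partitions both sides, then sorts and merges each partition in parallel. */
    static List<int[]> join(int[] leftKeys, int[] rightKeys, int partitions) {
        return IntStream.range(0, partitions)
                .parallel()
                .mapToObj(p -> merge(sortedPartition(leftKeys, p, partitions),
                                     sortedPartition(rightKeys, p, partitions)))
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }
}
```

Because equal keys always land in the same partition, each partition can be sorted and merged independently of the others, which is what makes the parallel step safe. Calling `PartitionedMergeJoinSketch.join(leftKeys, rightKeys, 4)` returns one `{leftDoc, rightDoc, key}` tuple per matching pair.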