[jira] [Comment Edited] (LUCENE-5291) Faster Query-Time Join

Erik Groeneveld (JIRA) Sat, 19 Oct 2013 09:54:37 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798879#comment-13798879
 ]


Erik Groeneveld edited comment on LUCENE-5291 at 10/19/13 4:53 PM:
-------------------------------------------------------------------

This patch does not change anything in Lucene itself. It's just three classes 
that use Lucene.  They live in src/org/meresco/lucene. If adopted, they could 
be moved to org/apache/lucene.

I used "diff -urN" instead of svn diff, since the code is in git, not in 
subversion.

The split between KeyCollector and CachingKeyCollector is not essential.  It 
only shows how simple the idea is, and how caching complicates things.

Intended usage.

EDIT

Indexing:
Create NumericDocValues for the fields you want to join on.  We translate URIs 
to ords using DirectoryTaxonomyWriter, but that's just one way of doing it. Any 
scheme works as long as the numbers are small and monotonically increasing.
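
A minimal sketch of one such scheme, in plain Java without any Lucene 
dependency. KeyAssigner and keyFor are hypothetical names, not part of the 
patch and not how DirectoryTaxonomyWriter works internally; the sketch only 
shows what "small and monotonically increasing" means for the keys.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: gives each distinct URI a small integer key. */
final class KeyAssigner {
    private final Map<String, Integer> ords = new HashMap<>();

    /** Returns the key already assigned to uri, or assigns the next unused one. */
    int keyFor(String uri) {
        Integer ord = ords.get(uri);
        if (ord == null) {
            ord = ords.size();   // keys are handed out as 0, 1, 2, ...
            ords.put(uri, ord);
        }
        return ord;
    }

    /** Number of distinct URIs seen so far. */
    int size() {
        return ords.size();
    }
}
```

The returned value is what you would store as the numeric doc value of the 
join field in each index.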

Searching:
You first use CachingKeyCollector to collect keys from one index. Then you use 
CachingKeyCollector.getFilter() to filter keys in another index.  I went to 
some lengths to add documentation to the code, so I hope it is clear how it 
works.
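
The two steps can be sketched in plain Java as follows. This is not the code 
from the patch: KeyJoinSketch, collectKeys and filterByKeys are hypothetical 
names, the int arrays stand in for the per-document NumericDocValues of each 
index, and the use of a bit set to hold the collected keys is an assumption 
made for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class KeyJoinSketch {
    /** Collect step: mark the key of every matching document of the first index. */
    static BitSet collectKeys(int[] keyPerDoc, int[] matchingDocIds) {
        BitSet keys = new BitSet();
        for (int docId : matchingDocIds) {
            keys.set(keyPerDoc[docId]);
        }
        return keys;
    }

    /** Filter step: keep the documents of the second index whose key was collected. */
    static List<Integer> filterByKeys(int[] keyPerDoc, BitSet keys) {
        List<Integer> hits = new ArrayList<>();
        for (int docId = 0; docId < keyPerDoc.length; docId++) {
            if (keys.get(keyPerDoc[docId])) {
                hits.add(docId);
            }
        }
        return hits;
    }
}
```

Under this assumption, small keys keep the bit set compact, and single-valued 
keys mean one lookup per document.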


was (Author: [email protected]):
This patch does not change anything in Lucene itself. It's just three classes 
that use Lucene.  They live in src/org/meresco/lucene. If adopted, they could 
be moved to org/apache/lucene.

I used "diff -urN" instead of svn diff, since the code is in git, not in 
subversion.

The split between KeyCollector and CachingKeyCollector is not essential.  It 
only shows how simple the idea is, and how caching complicates things.

Intended usage.

You first use CachingKeyCollector to collect keys from one index. Then you use 
CachingKeyCollector.getFilter() to filter keys in another index.  I went to 
some lengths to add documentation to the code, so I hope it is clear how it 
works.

> Faster Query-Time Join
> ----------------------
>
>                 Key: LUCENE-5291
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5291
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.5
>            Reporter: Erik Groeneveld
>            Priority: Minor
>              Labels: join, query
>         Attachments: LUCENE-5291.patch
>
>
> The current implementation of query-time join could be complemented with a 
> much faster one, provided some choices can be made about what to join on.
> Since join is really a database concept, we found it quite natural to 
> restrict the keys to single-valued integers.
> We found that if the key fields are single-valued integers, the speed of 
> join can be improved 50-fold. Proper caching speeds it up again by about 20 
> times.
> I'd like to contribute our code if you agree that it is a useful 
> contribution.  That probably depends on what you think of the choices we made 
> about the keys, so that needs to be discussed first?



--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
