[
https://issues.apache.org/jira/browse/LUCENE-5291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798879#comment-13798879
]
Erik Groeneveld edited comment on LUCENE-5291 at 10/19/13 4:53 PM:
-------------------------------------------------------------------
This patch does not patch anything in Lucene. It's just three classes that use
Lucene. They live in src/org/meresco/lucene. If adopted, they could be moved
to org/apache/lucene.
I used "diff -urN" instead of svn diff, since the code is in git, not in
subversion.
The split between KeyCollector and CachingKeyCollector is not essential. It
only shows how simple the idea is, and how caching complicates things.
Intended usage.
EDIT
Indexing:
Create NumericDocValues for the fields you want to join on. We translate URIs
to ords using DirectoryTaxonomyWriter, but that's just one way of doing it. Any
scheme works as long as the numbers are small and monotonically increasing.
Searching:
You first use CachingKeyCollector to collect keys from one index. Then you use
CachingKeyCollector.getFilter() to filter keys in another index. I went to
some lengths to add documentation to the code, so I hope it is clear how it
works.
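
As a rough illustration of the two steps above (not the code in the patch),
the idea can be sketched in plain Java without Lucene. All names here
(KeyJoinSketch, OrdAssigner, collectKeys, filterByKeys) are hypothetical, and
the int arrays merely stand in for single-valued numeric doc values:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; none of these names come from the patch or Lucene.
final class KeyJoinSketch {

    // Stand-in for the URI-to-ord translation: every new URI gets the next
    // integer, so keys stay small and monotonically increasing.
    static final class OrdAssigner {
        private final Map<String, Integer> ords = new HashMap<>();

        int ordFor(String uri) {
            Integer existing = ords.get(uri);
            if (existing != null) {
                return existing;
            }
            int next = ords.size();
            ords.put(uri, next);
            return next;
        }
    }

    // Step 1: collect the keys of the documents that matched in index A.
    // keysA[doc] plays the role of a single-valued numeric doc value.
    static BitSet collectKeys(int[] keysA, int[] matchingDocsA) {
        BitSet keys = new BitSet();
        for (int doc : matchingDocsA) {
            keys.set(keysA[doc]);
        }
        return keys;
    }

    // Step 2: keep only the documents in index B whose key was collected.
    static List<Integer> filterByKeys(int[] keysB, BitSet keys) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < keysB.length; doc++) {
            if (keys.get(keysB[doc])) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        OrdAssigner ords = new OrdAssigner();
        int a = ords.ordFor("urn:a");
        int b = ords.ordFor("urn:b");
        int c = ords.ordFor("urn:c");

        int[] keysA = {a, b, c};     // key per document in index A
        int[] keysB = {c, a, a, b};  // key per document in index B

        // Pretend a query on index A matched documents 0 and 2.
        BitSet keys = collectKeys(keysA, new int[] {0, 2});
        System.out.println(filterByKeys(keysB, keys)); // prints [0, 1, 2]
    }
}
```

Because the keys are small integers, the collected set fits in a bit set and
the filter step is one bit lookup per document. That is presumably why the
restriction to integer, single-valued keys pays off, though this sketch does
not model caching or per-segment readers.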
was (Author: [email protected]):
This patch does not patch anything in Lucene. It's just three classes that use
Lucene. They live in src/org/meresco/lucene. If adopted, they could be moved
to org/apache/lucene.
I used "diff -urN" instead of svn diff, since the code is in git, not in
subversion.
The split between KeyCollector and CachingKeyCollector is not essential. It
only shows how simple the idea is, and how caching complicates things.
Intended usage.
You first use CachingKeyCollector to collect keys from one index. Then you use
CachingKeyCollector.getFilter() to filter keys in another index. I went to
some lengths to add documentation to the code, so I hope it is clear how it
works.
> Faster Query-Time Join
> ----------------------
>
> Key: LUCENE-5291
> URL: https://issues.apache.org/jira/browse/LUCENE-5291
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index, core/search
> Affects Versions: 4.5
> Reporter: Erik Groeneveld
> Priority: Minor
> Labels: join, query
> Attachments: LUCENE-5291.patch
>
>
> The current implementation of query-time join could be complemented with a
> much faster one, provided some choices can be made about what to join on.
> Since join is really a database concept, we found it quite natural to
> restrict the keys to be integers and be single valued.
> We found that when the keys are integers and the key fields are single
> valued, join can be made about 50 times faster. Proper caching speeds it up
> by roughly another factor of 20.
> I'd like to contribute our code if you agree that it is a useful
> contribution. That probably depends on what you think of the choices we made
> about the keys, so that needs to be discussed first?