[https://issues.apache.org/jira/browse/SOLR-11391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16177410#comment-16177410]
Varun Thacker edited comment on SOLR-11391 at 9/23/17 8:32 AM:
---------------------------------------------------------------
Very quick benchmarks on my local machine, running against master.
I indexed 3M documents. The cardinality of VIN_s/VIN_i is 1.5M (each unique VIN
appears in 2 documents).
{code:title=3M matches|borderStyle=solid}
{!join from=VIN_s to=VIN_s cache=false}*:* : 2 seconds
{!graph from=VIN_s to=VIN_s cache=false}*:* : 4 seconds
{code}
{code:title=2M matches|borderStyle=solid}
{!join from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 2000000] : 1.8 seconds
{!graph from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 2000000] : 4 seconds
{code}
{code:title=1M matches|borderStyle=solid}
{!join from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 1000000] : 1.4 seconds
{!graph from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 1000000] : 2.9 seconds
{code}
{code:title=500k matches|borderStyle=solid}
{!join from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 500000] : 1 second
{!graph from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 500000] : 1.2 seconds
{code}
{code:title=250k matches|borderStyle=solid}
{!join from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 250000] : 0.9 seconds
{!graph from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 250000] : 0.6 seconds
{code}
{code:title=10k matches|borderStyle=solid}
{!join from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 10000] : 0.7 seconds
{!graph from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 10000] : 0.03 seconds
{code}
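So we might want to keep a threshold on the number of documents matched by the inner query and pick the algorithm accordingly: in these runs {!graph} was faster at 250k matches and below, while {!join} was faster at 500k matches and above.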
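A minimal sketch of the threshold idea, assuming the decision is made purely on the inner-query match count. The `JoinStrategyChooser` class, the `Strategy` enum and the `THRESHOLD` value are all hypothetical; the timings are the ones measured above, and the numbers only place the crossover somewhere between 250k and 500k matches on one machine, so 300k is an illustrative pick rather than a tuned value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JoinStrategyChooser {

    // The two approaches compared in the benchmark.
    enum Strategy { JOIN, GRAPH }

    // Hypothetical cutoff on the number of documents matched by the inner query.
    // The measurements only show the crossover lies between 250k and 500k matches.
    static final long THRESHOLD = 300_000L;

    // Use the graph-style collector for small match sets, the current join otherwise.
    static Strategy choose(long innerMatches, long threshold) {
        return innerMatches <= threshold ? Strategy.GRAPH : Strategy.JOIN;
    }

    // Timings in seconds copied from the comment, as {join, graph}, keyed by match count.
    static Map<Long, double[]> measured() {
        Map<Long, double[]> m = new LinkedHashMap<>();
        m.put(3_000_000L, new double[] {2.0, 4.0});
        m.put(2_000_000L, new double[] {1.8, 4.0});
        m.put(1_000_000L, new double[] {1.4, 2.9});
        m.put(500_000L, new double[] {1.0, 1.2});
        m.put(250_000L, new double[] {0.9, 0.6});
        m.put(10_000L, new double[] {0.7, 0.03});
        return m;
    }

    // The strategy that was actually faster for one measurement.
    static Strategy fastest(double[] joinAndGraph) {
        return joinAndGraph[1] < joinAndGraph[0] ? Strategy.GRAPH : Strategy.JOIN;
    }

    public static void main(String[] args) {
        // Compare the measured winner with what the threshold rule would pick.
        for (Map.Entry<Long, double[]> e : measured().entrySet()) {
            System.out.printf("%d matches: fastest=%s, chosen=%s%n",
                    e.getKey(), fastest(e.getValue()), choose(e.getKey(), THRESHOLD));
        }
    }
}
```

With any cutoff from 250k up to (but excluding) 500k, the rule agrees with the measured winner on every row; a real implementation would need more data points (and likely other factors such as the cardinality of the "to" field) to choose the value.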
> JoinQParser for non point fields should use the GraphTermsCollector
> --------------------------------------------------------------------
>
> Key: SOLR-11391
> URL: https://issues.apache.org/jira/browse/SOLR-11391
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Varun Thacker
> Attachments: SOLR-11391.patch
>
>
> The Join Query Parser uses the GraphPointsCollector for point fields.
> For non-point fields, if we use the GraphTermsCollector instead of the current
> algorithm, I am seeing a sizable performance gain.
> I'm going to attach a quick patch that I cooked up, after making sure TestJoin
> and TestCloudJSONFacetJoinDomain pass.
> More tests, benchmarking and code cleanup to follow.
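Whichever collector is used, the self-join in the benchmarks (from=VIN_s to=VIN_s) has to produce the same result: take the distinct VIN_s values of the documents matched by the inner query, then return every document whose VIN_s is one of those values. The toy in-memory model below illustrates only that "collect the from-terms, then match them on the to field" shape; the `Doc` class and the `join`/`buildIndex` helpers are hypothetical, and this is not the GraphTermsCollector code and says nothing about its performance.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class TermsJoinModel {

    // Hypothetical stand-in for an indexed document: a string key (like VIN_s)
    // and an int copy of it (like VIN_i).
    static final class Doc {
        final int id;
        final String vinS;
        final int vinI;

        Doc(int id, String vinS, int vinI) {
            this.id = id;
            this.vinS = vinS;
            this.vinI = vinI;
        }
    }

    // Self-join on the key field (from = to = vinS).
    // Phase 1: collect the distinct key terms of documents matching the inner query.
    // Phase 2: return every document whose key term is in that set.
    static List<Doc> join(List<Doc> index, Predicate<Doc> innerQuery) {
        Set<String> fromTerms = new HashSet<>();
        for (Doc d : index) {
            if (innerQuery.test(d)) {
                fromTerms.add(d.vinS);
            }
        }
        List<Doc> result = new ArrayList<>();
        for (Doc d : index) {
            if (fromTerms.contains(d.vinS)) {
                result.add(d);
            }
        }
        return result;
    }

    // Toy index shaped like the benchmark data: every VIN appears in 2 documents.
    static List<Doc> buildIndex(int uniqueVins) {
        List<Doc> docs = new ArrayList<>();
        int id = 0;
        for (int vin = 0; vin < uniqueVins; vin++) {
            docs.add(new Doc(id++, "VIN" + vin, vin));
            docs.add(new Doc(id++, "VIN" + vin, vin));
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Doc> index = buildIndex(1_000); // 2,000 documents
        // The inner query matches only the even-id copy of VINs 0..99 (100 docs);
        // the join then pulls in the second document of each of those VINs.
        List<Doc> joined = join(index, d -> d.id % 2 == 0 && d.vinI <= 99);
        System.out.println(joined.size()); // prints 200
    }
}
```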