[ https://issues.apache.org/jira/browse/SOLR-11391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16177410#comment-16177410 ]

Varun Thacker edited comment on SOLR-11391 at 9/23/17 8:32 AM:
---------------------------------------------------------------

Some very quick benchmarks on my local machine, running master:

I indexed 3M documents. The cardinality of VIN_s/VIN_i is 1.5M (each unique VIN appears in 2 documents).

{code:title=3M matches|borderStyle=solid}
{!join from=VIN_s to=VIN_s cache=false}*:* : 2 seconds
{!graph from=VIN_s to=VIN_s cache=false}*:* : 4 seconds
{code}

{code:title=2M matches|borderStyle=solid}
{!join from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 2000000] : 1.8 seconds
{!graph from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 2000000] : 4 seconds
{code}

{code:title=1M matches|borderStyle=solid}
{!join from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 1000000] : 1.4 seconds
{!graph from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 1000000] : 2.9 seconds
{code}

{code:title=500k matches|borderStyle=solid}
{!join from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 500000] : 1 second
{!graph from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 500000] : 1.2 seconds
{code}

{code:title=250k matches|borderStyle=solid}
{!join from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 250000] : 0.9 seconds
{!graph from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 250000] : 0.6 seconds
{code}


{code:title=10k matches|borderStyle=solid}
{!join from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 10000] : 0.7 seconds
{!graph from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 10000] : 0.03 seconds
{code}
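To make the crossover easier to see, here is a small Java sketch that only encodes the timings above and prints the join/graph ratio per match count. It assumes the unit-less 1.2 and .03 readings are also seconds; it is arithmetic on the reported numbers, not a new measurement, and the class and method names are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Encodes the timings reported above (seconds, cache=false) and prints join/graph ratios. */
public class JoinVsGraphTimings {
    // match count -> {join seconds, graph seconds}; TreeMap keeps match counts in ascending order
    static final Map<Integer, double[]> TIMINGS = new TreeMap<>();
    static {
        TIMINGS.put(10_000, new double[] {0.7, 0.03});
        TIMINGS.put(250_000, new double[] {0.9, 0.6});
        TIMINGS.put(500_000, new double[] {1.0, 1.2});
        TIMINGS.put(1_000_000, new double[] {1.4, 2.9});
        TIMINGS.put(2_000_000, new double[] {1.8, 4.0});
        TIMINGS.put(3_000_000, new double[] {2.0, 4.0});
    }

    /** Largest measured match count at which graph was still faster than join, or -1 if never. */
    static int lastCountWhereGraphWins() {
        int last = -1;
        for (Map.Entry<Integer, double[]> e : TIMINGS.entrySet()) {
            double join = e.getValue()[0], graph = e.getValue()[1];
            if (graph < join) {
                last = e.getKey();
            }
        }
        return last;
    }

    public static void main(String[] args) {
        for (Map.Entry<Integer, double[]> e : TIMINGS.entrySet()) {
            double join = e.getValue()[0], graph = e.getValue()[1];
            // A ratio above 1.0 means the graph query was faster for that match count.
            System.out.printf("%,9d matches: join %.2fs, graph %.2fs, join/graph = %.2f%n",
                    e.getKey(), join, graph, join / graph);
        }
        System.out.println("graph last wins at " + lastCountWhereGraphWins() + " matches");
    }
}
```

On these numbers the graph query stops winning somewhere between the 250k and 500k data points; the exact crossover is not pinned down by this data.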

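So we might want to keep a threshold and use the appropriate algorithm: in these runs the graph query was faster at 250k matches and below, while join was faster from 500k matches up.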
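A minimal sketch of what such a threshold could look like, assuming the decision is made on the number of documents matched by the inner (from) query. Everything here is hypothetical: the class, the enum, and the 300k cut-over (chosen only because it sits between the 250k and 500k data points on this 3M-document index) are illustrative and are not taken from the patch.

```java
/**
 * Hypothetical sketch: choose a join strategy from the number of documents matched
 * by the inner ("from") query. Names and the threshold value are illustrative only.
 */
public class JoinStrategySelector {
    enum Strategy { GRAPH_TERMS_COLLECTOR, EXISTING_JOIN }

    // Assumed placeholder, not a tuned value: the benchmarks above put the crossover
    // somewhere between 250k and 500k matches on a 3M-document index.
    static final long DEFAULT_THRESHOLD = 300_000L;

    static Strategy choose(long fromMatchCount, long threshold) {
        // Small inner result sets: the graph-style terms collection was faster above.
        // Large inner result sets: the existing join algorithm was faster above.
        return fromMatchCount <= threshold ? Strategy.GRAPH_TERMS_COLLECTOR : Strategy.EXISTING_JOIN;
    }

    static Strategy choose(long fromMatchCount) {
        return choose(fromMatchCount, DEFAULT_THRESHOLD);
    }

    public static void main(String[] args) {
        for (long n : new long[] {10_000, 250_000, 500_000, 3_000_000}) {
            System.out.println(n + " matches -> " + choose(n));
        }
    }
}
```

Whether the threshold should be an absolute count or a fraction of the index size is left open, since all numbers here come from a single 3M-document index; counting the inner matches first also has a cost that a real implementation would need to account for.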


> JoinQParser for non point fields should use the GraphTermsCollector 
> --------------------------------------------------------------------
>
>                 Key: SOLR-11391
>                 URL: https://issues.apache.org/jira/browse/SOLR-11391
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Varun Thacker
>         Attachments: SOLR-11391.patch
>
>
> The Join Query Parser uses the GraphPointsCollector for point fields.
> For non-point fields, I am seeing sizeable performance gains when using the
> GraphTermsCollector instead of the current algorithm.
> I'm going to attach a quick patch I cooked up; I made sure TestJoin and
> TestCloudJSONFacetJoinDomain pass.
> More tests, benchmarking and code cleanup to follow.
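The terms-collection idea in the description can be illustrated with a simplified, in-memory Java sketch (Java 16+ for the record): first collect the distinct from-field values of the documents matching the inner query, then return every document whose to-field value is in that set. Doc, join() and the in-memory list are hypothetical stand-ins for an index; the sketch only shows the two-phase shape and makes no claim about the internals of Solr's GraphTermsCollector.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Simplified, in-memory illustration of a two-phase, terms-based join (hypothetical names). */
public class TermsJoinSketch {
    record Doc(int id, String from, String to) {}

    static List<Integer> join(List<Doc> index, Predicate<Doc> innerQuery) {
        // Phase 1: collect the distinct "from" terms of every document matching the inner query.
        Set<String> collectedTerms = new HashSet<>();
        for (Doc d : index) {
            if (innerQuery.test(d)) {
                collectedTerms.add(d.from());
            }
        }
        // Phase 2: return every document whose "to" term is one of the collected terms.
        List<Integer> result = new ArrayList<>();
        for (Doc d : index) {
            if (collectedTerms.contains(d.to())) {
                result.add(d.id());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Self-join on a VIN-like field (from == to), two documents per VIN as in the benchmark setup.
        List<Doc> index = List.of(
                new Doc(0, "VIN-A", "VIN-A"), new Doc(1, "VIN-A", "VIN-A"),
                new Doc(2, "VIN-B", "VIN-B"), new Doc(3, "VIN-B", "VIN-B"));
        System.out.println(join(index, d -> d.id() == 0)); // [0, 1]
    }
}
```

Phase 2 scales with the number of collected terms, which is consistent with the graph query being cheap for small inner result sets in the benchmarks above.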



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
