[ https://issues.apache.org/jira/browse/SOLR-11391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16177410#comment-16177410 ]
Varun Thacker edited comment on SOLR-11391 at 9/23/17 8:32 AM:
---------------------------------------------------------------

Very quick benchmarks on my local machine with master.

I indexed 3M documents. The cardinality of VIN_s/VIN_i is 1.5M (2 values for each unique VIN).

{code:title=3M matches|borderStyle=solid}
{!join from=VIN_s to=VIN_s cache=false}*:* : 2 seconds
{!graph from=VIN_s to=VIN_s cache=false}*:* : 4 seconds
{code}

{code:title=2M matches|borderStyle=solid}
{!join from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 2000000] : 1.8 seconds
{!graph from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 2000000] : 4 seconds
{code}

{code:title=1M matches|borderStyle=solid}
{!join from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 1000000] : 1.4 seconds
{!graph from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 1000000] : 2.9 seconds
{code}

{code:title=500k matches|borderStyle=solid}
{!join from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 500000] : 1 second
{!graph from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 500000] : 1.2 seconds
{code}

{code:title=250k matches|borderStyle=solid}
{!join from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 250000] : 0.9 seconds
{!graph from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 250000] : 0.6 seconds
{code}

{code:title=10k matches|borderStyle=solid}
{!join from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 10000] : 0.7 seconds
{!graph from=VIN_s to=VIN_s cache=false}VIN_i:[0 TO 10000] : 0.03 seconds
{code}

In these runs graph was faster at 250k matches and below, and join was faster at 500k matches and above. So we might want to keep a threshold on the number of matches and use the appropriate algorithm on each side of it.

> JoinQParser for non point fields should use the GraphTermsCollector
> --------------------------------------------------------------------
>
>                 Key: SOLR-11391
>                 URL: https://issues.apache.org/jira/browse/SOLR-11391
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Varun Thacker
>         Attachments: SOLR-11391.patch
>
>
> The Join Query Parser uses the GraphPointsCollector for point fields.
> For non-point fields, if we use the GraphTermsCollector instead of the current
> algorithm, I am seeing quite a bit of performance gain.
> I'm going to attach a quick patch which I cooked up, making sure TestJoin
> and TestCloudJSONFacetJoinDomain passed.
> More tests, benchmarking and code cleanup to follow
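
The threshold idea from the comment could look roughly like the sketch below. This is not code from SOLR-11391.patch: the class `JoinStrategyChooser`, the enum `Strategy`, the method `choose`, and the 250,000 cutoff are all hypothetical names and values chosen for illustration. The cutoff is simply the largest match count at which graph beat join in the runs above; the real crossover lies somewhere between 250k and 500k on that machine and would need proper benchmarking.

```java
/**
 * Hypothetical sketch: choose between the existing join algorithm and the
 * graph-based one from the number of documents matched by the "from" query.
 * None of these names exist in Solr; they are assumptions for illustration.
 */
final class JoinStrategyChooser {

    enum Strategy { JOIN, GRAPH }

    private JoinStrategyChooser() {}

    /**
     * @param fromMatches number of documents matched by the "from" side query
     * @param threshold   largest match count for which GRAPH is preferred (assumed tunable)
     */
    static Strategy choose(long fromMatches, long threshold) {
        if (fromMatches < 0 || threshold < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        // Small result sets favoured graph in the benchmark; large ones favoured join.
        return fromMatches <= threshold ? Strategy.GRAPH : Strategy.JOIN;
    }

    public static void main(String[] args) {
        long threshold = 250_000L; // assumption taken from the quick benchmark, not a tuned value
        long[] sizes = {10_000L, 250_000L, 500_000L, 3_000_000L};
        for (long n : sizes) {
            System.out.println(n + " matches -> " + choose(n, threshold));
        }
    }
}
```

Keeping the cutoff as a parameter rather than a constant reflects that the numbers above come from one machine and one index; a different term cardinality could move the crossover.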