Stuart Bertram created SOLR-9550:
------------------------------------

Summary: innerJoin can succeed with bad sorting
Key: SOLR-9550
URL: https://issues.apache.org/jira/browse/SOLR-9550
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: SolrCloud
Affects Versions: 6.1
Environment: CentOS 6.8, OpenJDK 1.8
Reporter: Stuart Bertram
The innerJoin streaming function requires both streams to be sorted on the fields used for the join. In some situations, if you make a mistake and use an incorrect sort order, innerJoin still succeeds and returns an incorrect result instead of failing.

Example:
* Collection "UserPosts" has columns: ID, ByUserID
* Collection "User" has columns: ID, Username, Registered, …
* Streaming query {{gatherNodes(User, gatherNodes(UserPosts, walk="42 69->ID", gather="ByUserID"), walk="node->ID", gather="ID")}} returns the IDs of the users who made posts 42 and 69, but we want the full user details.
* Streaming query {{innerJoin(sort(gatherNodes(User, gatherNodes(UserPosts, walk="42 69->ID", gather="ByUserID"), walk="node->ID", gather="ID"), by="ID asc"), search(User,qt="/export",q="*:*",fl="ID, Username, Registered, …", sort="ID asc"), on="node=ID")}}

(Note the {{sort(…, by="ID")}}: because we're gathering the ID field, I sorted by ID instead of using {{sort(…, by="node")}}. The gathered nodes return tuples with the gathered ID in the "node" field, so "node" is the correct sort key.)

(Note: This example is simplified. There may be a better way to perform this specific query, but the concept and the underlying issue remain.)

Expected result: Solr throws a (useful) exception saying that the sort order does not match the join (the first stream is sorted by ID, but the join is *node*=ID), as it does when the sort() call is omitted.

Actual result: Solr treats the streams as correctly sorted and returns each node from the first stream joined with a single set of values from the second stream (every row is joined to the *same* row). As a result, the returned ID and node values do not match, even though they are the two fields compared in the join.

This is an easy mistake to make: because I was gathering IDs, I instinctively sorted by ID, when I should have sorted by node.
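The sorting requirement above is the usual precondition of a sort-merge join: the merge walks both inputs in key order, so it is only correct when each input is sorted on its own side of the join key. The Python sketch below is not Solr's InnerJoinStream code. SortedStream, inner_join and the sample tuples are hypothetical, and a "stream" is modelled as a list of tuples plus the field it declares as its ascending sort. It performs the check this report expected, rejecting a left stream declared as sorted by "ID" when the join is node=ID.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SortedStream:
    """Hypothetical stand-in for a stream: tuples plus the field they claim to be sorted by (ascending)."""
    tuples: List[Dict[str, str]]
    sort_field: str


def inner_join(left: SortedStream, right: SortedStream,
               left_key: str, right_key: str) -> List[Dict[str, str]]:
    """Sort-merge inner join on left[left_key] == right[right_key].

    Before merging, check that each input declares a sort on its own join key.
    This mirrors the validation the report expected, not Solr's actual code.
    """
    for side, stream, key in (("left", left, left_key), ("right", right, right_key)):
        if stream.sort_field != key:
            raise ValueError(
                f"{side} stream is sorted by {stream.sort_field!r} but the join uses {key!r}"
            )

    out: List[Dict[str, str]] = []
    lt, rt = left.tuples, right.tuples
    i = j = 0
    while i < len(lt) and j < len(rt):
        a, b = lt[i][left_key], rt[j][right_key]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Pair this left tuple with every right tuple in the run of equal keys.
            k = j
            while k < len(rt) and rt[k][right_key] == a:
                out.append({**lt[i], **rt[k]})
                k += 1
            # Advance only the left side, so a duplicate left key can reuse the same run.
            i += 1
    return out


# Hypothetical data: the "User" stream sorted by ID, and gathered nodes keyed by "node".
users = SortedStream(
    [{"ID": "u1", "Username": "alice"}, {"ID": "u2", "Username": "bob"},
     {"ID": "u3", "Username": "carol"}],
    "ID",
)
gathered_wrong = SortedStream([{"node": "u1"}, {"node": "u3"}], "ID")   # the mistake
gathered_right = SortedStream([{"node": "u1"}, {"node": "u3"}], "node")

try:
    inner_join(gathered_wrong, users, "node", "ID")
except ValueError as err:
    print(err)  # left stream is sorted by 'ID' but the join uses 'node'

for row in inner_join(gathered_right, users, "node", "ID"):
    print(row)  # every row has node == ID
```

The check compares only the declared sort field with the join key. It does not scan the tuples to confirm they really are in that order, and keys are compared as plain strings, so it is a sketch of the idea rather than a complete safeguard.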
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)